Substructure searching¶

Introduction¶

To set up a substructure search we need to import the ccdc.search module. Let us also import the ccdc.io module so that we can read in and write out molecules.

>>> import ccdc.search
>>> import ccdc.io

As a preamble let us set up a variable for a temporary directory.

>>> from ccdc.utilities import _test_output_dir
>>> tempdir = _test_output_dir()

Let us also get a testosterone molecule out of the CSD.

>>> entry_reader = ccdc.io.EntryReader('CSD')
>>> teston10 = entry_reader.molecule('TESTON10')
>>> testosterone = teston10.components[0]

Setting up a substructure¶

There are several ways to set up a ccdc.search.QuerySubstructure.

It can be created from a molecule.

>>> testosterone_substructure = ccdc.search.MoleculeSubstructure(testosterone)

It can also be created from a SMARTS string.

>>> hydroxyl_substructure = ccdc.search.SMARTSSubstructure("OH")
>>> ketone_substructure = ccdc.search.SMARTSSubstructure("[CD4][CD3](=[OD1])[CD4]")

There is a small extension to Daylight SMARTS to allow quadruple, delocalised and pi bonds to be represented, using the characters ‘_’, ‘"’ and ‘|’ respectively.

A substructure can also be read in from a ConQuest Connser file.

>>> filepath = 'monochloropyridine.con'

To achieve this we make use of the ccdc.search.ConnserSubstructure class.

>>> connser_substructure = ccdc.search.ConnserSubstructure(filepath)

Setting up and running a substructure search¶

The substructure instances created earlier can be used to set up substructure searches. Substructure searches can be used to search many different objects. By default a substructure search will search the CSD.

>>> substructure_search = ccdc.search.SubstructureSearch()
>>> sub_id = substructure_search.add_substructure(connser_substructure)
>>> hits = substructure_search.search()
>>> print(len(hits))  
942

Note that there may be multiple hits in one structure. The multiplicity of hits can be controlled by two optional arguments to the search method: max_hit_structures, which controls how many structures are returned, and max_hits_per_structure, which controls how many matches within a single structure are returned. By default these parameters are set to 100000 and 5, respectively. A value of None means no limit.

>>> testosterone_search = ccdc.search.SubstructureSearch()
>>> sub_id = testosterone_search.add_substructure(testosterone_substructure)
>>> hits = testosterone_search.search()
>>> len(hits)
12
>>> len(testosterone_search.search(max_hit_structures=4))
4
>>> len(testosterone_search.search(max_hit_structures=4, max_hits_per_structure=1))
4
>>> unique_hits = testosterone_search.search(max_hits_per_structure=1)
>>> len(unique_hits)
10

It is also possible to search the molecules in a (multi) molecule file.

>>> file_path = 'testosterone_hits.mol2'

This can be achieved using a ccdc.io.MoleculeReader instance.

>>> hydroxyl_search = ccdc.search.SubstructureSearch()
>>> sub_id = hydroxyl_search.add_substructure(hydroxyl_substructure)
>>> hydroxyl_hits = hydroxyl_search.search(ccdc.io.MoleculeReader(file_path))
>>> len(hydroxyl_hits)
20

Or, for convenience, by supplying the file path directly to the search function.

>>> hydroxyl_hits = hydroxyl_search.search(file_path)
>>> len(hydroxyl_hits)
26

The two preceding searches return different numbers of hits because an explicit ccdc.io.MoleculeReader will not find hits containing suppressed atoms, whereas a file path is opened implicitly as a ccdc.io.EntryReader, which will.

It is also possible to search an individual molecule.

>>> print(len( hydroxyl_search.search(testosterone) ))
1

Or a list of identifiers from the CSD:

>>> print(len( hydroxyl_search.search(['ABEBUF', 'AABHTZ', 'WOSKOE']) ))
3

Substructure search hits¶

Let us print out the hit identifiers of the original 12 hits identified by the testosterone_search search on the CSD.

>>> for hit in hits:
...     print(hit.identifier)
EFEJAD
EPITES
HOGRUT
ISTEST
ISTEST
TESBRP
TESTHG
TESTOM
TESTOM01
TESTON10
TESTON10
WOSKOE
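
The repeated identifiers in this listing are the multiple hits per structure mentioned earlier. As a minimal sketch, using only the standard library and the identifiers copied by hand from the output above (with real hits one would collect hit.identifier instead), we can tally how many hits fall in each structure:

```python
from collections import Counter

# Identifiers copied from the printed output above; no CSD access is needed.
hit_identifiers = [
    'EFEJAD', 'EPITES', 'HOGRUT', 'ISTEST', 'ISTEST', 'TESBRP',
    'TESTHG', 'TESTOM', 'TESTOM01', 'TESTON10', 'TESTON10', 'WOSKOE',
]

# Count the hits found in each structure.
hits_per_structure = Counter(hit_identifiers)

# Structures that were matched more than once.
multiple = sorted(refcode for refcode, n in hits_per_structure.items() if n > 1)

print(len(hit_identifiers))     # total number of hits
print(len(hits_per_structure))  # number of distinct structures
print(multiple)                 # structures with more than one hit
```

The number of distinct structures is what a search limited to one hit per structure would be expected to return.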

These can be superimposed using the matched atoms of the hits.

>>> mols = hits.superimpose()
>>> print(len(mols))
12

Warning

Notice that we can only superimpose the structures where all atoms have coordinates.

Let us write out the molecules from these hits as a multi-mol2 file.

>>> import os
>>> output_file = os.path.join(tempdir, 'testosterone_hits.mol2')
>>> with ccdc.io.MoleculeWriter(output_file) as writer:
...     for m in mols:
...         writer.write(m)

We can also write the result data into a ConQuest to Mercury interchange file, so that the results, together with their constraints and measurements, may be analysed in the data analysis module of Mercury.

>>> hits.write_c2m_file(os.path.join(tempdir, 'testosterone_hits.c2m'))

We can find the matched atoms for each hit, where the hit atoms have coordinates:

>>> print(hits[1].match_atoms())  
[Atom(C1), Atom(C2), Atom(C3), ...]

All forms of search hit support molecule, crystal and entry properties:

>>> entry = hits[1].entry
>>> print(entry.chemical_name == u'17\u03b1-Hydroxyandrost-4-en-3-one')
True

>>> crystal = hits[1].crystal
>>> print(crystal.spacegroup_symbol)
P212121

>>> molecule = hits[1].molecule
>>> print(len(molecule.atoms))
49

Note

The chemical name from the entry ccdc.entry.Entry.chemical_name is in Unicode format and \u03b1 is the encoding for the ‘GREEK SMALL LETTER ALPHA’ www.fileformat.info/info/unicode/char/3b1/index.htm. For more information on how to work with Unicode in Python please see docs.python.org/3.7/howto/unicode.html

Create a substructure from scratch¶

To create a substructure from scratch we will use a ccdc.search.QuerySubstructure populated with instances of ccdc.search.QueryAtom and ccdc.search.QueryBond.

Query Atoms¶

A ccdc.search.QueryAtom may be made to represent a single element, any of a set of elements or a wild card matching any element:

>>> c1 = ccdc.search.QueryAtom('C')
>>> n_or_o = ccdc.search.QueryAtom(['N', 'O'])
>>> any_atom = ccdc.search.QueryAtom()

Atom constraints¶

Constraints may be put on a ccdc.search.QueryAtom to ensure that certain properties of a matched atom are fulfilled. The complete list of constraints is:

acceptor: that the atom be a hydrogen bond acceptor
donor: that the atom be a hydrogen bond donor
aromatic: that the atom be in an aromatic ring
cyclic: that the atom be in a ring
formal_charge: that the atom have a certain formal charge
formal_valency: that the atom have a certain formal valency
cyclic_bonds: that the atom have a certain number of cyclic bonds
smallest_ring: that the smallest ring in which the atom lies is of a certain size
num_bonds: that the atom have a certain number of neighbours
num_hydrogens: that the atom have a certain number of hydrogens bonded to it
nimplicit_hydrogens: that the atom have a certain number of implicit hydrogens bonded to it
has_3d_coordinates: that the atom have 3D coordinates
unfused_unbridged_ring: that the atom be or not be in an unfused/unbridged ring

Additionally there is a method ccdc.search.QueryAtom.add_connected_element_count() which may be used to specify counts of neighbouring atoms of certain element types. For use when searching protein structures there is a method ccdc.search.QueryAtom.add_protein_atom_type_constraint(). This takes any number of string arguments drawn from ‘AMINO_ACID’, ‘LIGAND’, ‘WATER’, ‘METAL’, ‘UNKNOWN’, and can be used to constrain an atom match to one of these classes. When searching non-protein structures all atoms will be of ‘UNKNOWN’ type.

Constraint conditions¶

Those constraints which have a boolean value may be assigned True, False or None: if None, any constraint of that type will be removed; if True, the constraint will be matched only if the atom has the property; if False, the constraint will be matched only if the atom does not have the property.

Those constraints which take a numeric value may be assigned one of the following:

a single numeric value: the constraint will match only if the value of the constraint equals the value
a pair of numeric values: the constraint will match if the value of the constraint is in the inclusive range of the values
a pair of a string representation of an operator and a numeric value
- (‘==’, value)
- (‘>’, value)
- (‘<’, value)
- (‘<=’, value)
- (‘>=’, value)
- (‘!=’, value)
- (‘in’, list_of_values)

So, for example, we can add constraints such as:

>>> a = ccdc.search.QueryAtom('N')
>>> a.formal_charge = ('in', [1, 2])
>>> a.num_bonds = ('>', 2)
>>> print(a)
QueryAtom(N)[charge: one of 1, 2, number of connected atoms: greater than 2]
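
To make the three forms of numeric condition concrete, here is a minimal pure-Python sketch of how such a condition could be tested against a value. The satisfies() helper is hypothetical and is not part of ccdc.search; it only mirrors the rules described above and makes no claim about how the library evaluates constraints internally.

```python
import operator

# Hypothetical helper for illustration only; not part of ccdc.search.
_OPERATORS = {
    '==': operator.eq, '!=': operator.ne,
    '<': operator.lt, '<=': operator.le,
    '>': operator.gt, '>=': operator.ge,
}


def satisfies(value, condition):
    """Return True if value meets a condition written in one of the three forms."""
    # Form 1: a single numeric value, which must be matched exactly.
    if isinstance(condition, (int, float)):
        return value == condition
    first, second = condition
    # Form 3: an (operator, operand) pair such as ('>', 2) or ('in', [1, 2]).
    if isinstance(first, str):
        if first == 'in':
            return value in second
        return _OPERATORS[first](value, second)
    # Form 2: a (low, high) pair describing an inclusive range.
    return first <= value <= second


print(satisfies(1, ('in', [1, 2])))  # True: 1 is one of the listed values
print(satisfies(2, ('>', 2)))        # False: 2 is not greater than 2
print(satisfies(3, (2, 4)))          # True: 3 lies in the inclusive range
```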

Query Bonds¶

A ccdc.search.QueryBond represents a bond which must be matched in a substructure search. It may represent one or more specific bond types, or any bond type. It may have constraints applied to it, too. The constraints applicable to a ccdc.search.QueryBond are:

cyclic
bond_length
bond_polymeric
bond_smallest_ring
bond_unfused_unbridged_ring

The bond length constraint may take any of the forms given above; the rest are boolean valued.

Query Substructure¶

Instances of ccdc.search.QueryAtom and ccdc.search.QueryBond may be added to a ccdc.search.QuerySubstructure to form the subject of a substructure search. The following shows a query which may be used to search for abnormally short C=O bond lengths:

>>> c = ccdc.search.QueryAtom('C')
>>> o = ccdc.search.QueryAtom('O')
>>> b = ccdc.search.QueryBond('Double')
>>> b.bond_length = ('<', 0.94)
>>> short_CO_query = ccdc.search.QuerySubstructure()
>>> _ = short_CO_query.add_atom(c)
>>> _ = short_CO_query.add_atom(o)
>>> _ = short_CO_query.add_bond(b, c, o)

And this can be used to search for carboxylic acids:

>>> cooh_substructure = ccdc.search.QuerySubstructure()
>>> c = cooh_substructure.add_atom('C')
>>> o1 = cooh_substructure.add_atom('O')
>>> o2 = cooh_substructure.add_atom('O')
>>> b1 = cooh_substructure.add_bond('Double', c, o1)
>>> b2 = cooh_substructure.add_bond('Single', c, o2)
>>> o1.num_hydrogens = 0
>>> o2.num_hydrogens = 1

With any of these substructures, a test may be made that a particular atom matches an atom of the substructure, within the context of the whole substructure. This is equivalent to having a constraint on a ccdc.search.QueryAtom of the entire substructure. For example, using the ketone_substructure defined above, and the molecule ABINUU,

>>> abinuu = entry_reader.molecule('ABINUU')
>>> print(ketone_substructure.match_atom(abinuu.atom('C8')))
True
>>> print(ketone_substructure.match_atom(abinuu.atom('C15')))
False

Note that by default this match will use the first atom of the substructure, which will be the faster case. If required an arbitrary atom of the query substructure may be used:

>>> print(ketone_substructure.match_atom(abinuu.atom('C15'), ketone_substructure.atoms[1]))
True

Substructure searching with geometric measurements¶

It is possible to define geometric objects which may be used as part of a geometric measurement, a constraint or another geometric object. These are defined by:

ccdc.search.SubstructureSearch.add_centroid(): the centroid of a set of at least two points
ccdc.search.SubstructureSearch.add_dummy_point(): a point projected along a vector between two points
ccdc.search.SubstructureSearch.add_group(): a set of atoms
ccdc.search.SubstructureSearch.add_vector(): a vector subtended by two points
ccdc.search.SubstructureSearch.add_plane(): a plane defined by a set of at least three points

Each of these methods takes a name, by which the geometric object is known, and a sequence of point definitions. The dummy point definition takes an additional argument, placed before the two point definitions, which gives the distance along the vector at which to place the point.

A point can be defined as either a pair, (substructure_index, atom_index), or by a named centroid or dummy point. For example:

>>> guanidino = ccdc.search.SMARTSSubstructure('[NHX3][CH0X3](=[NH2X3+])[NH2X3]')
>>> carboxylate = ccdc.search.SMARTSSubstructure('[C][C]([O])[O]')
>>> searcher = ccdc.search.SubstructureSearch()
>>> sub1 = searcher.add_substructure(guanidino)
>>> sub2 = searcher.add_substructure(carboxylate)
>>> searcher.add_plane('PLANE1', (sub1, 1), (sub1, 2), (sub1, 3))
>>> searcher.add_plane('PLANE2', (sub2, 0), (sub2, 1), (sub2, 2))
>>> searcher.add_centroid('CENT1', (sub1, 2), (sub1, 3))
>>> searcher.add_centroid('CENT2', (sub2, 2), (sub2, 3))

Geometric measurements can be added to a ccdc.search.SubstructureSearch using its measurement methods, for example a distance between the two centroids defined above:

>>> searcher.add_distance_measurement('DIST1', 'CENT1', 'CENT2')

Each of the measurement methods above (except for the constant value measurement) has a similarly named constraint method, so for example one can require that the angle between the planes defined above should be between 0 and 20 degrees:

>>> searcher.add_plane_angle_constraint('P1_P2', 'PLANE1', 'PLANE2', (0, 20))
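
For readers who want to see what these quantities are, here is a minimal pure-Python sketch of a centroid-to-centroid distance and an angle between two planes. It assumes each plane is defined by exactly three points and folds the plane angle into the range 0 to 90 degrees; both are simplifications made for this sketch, not statements about how the library computes them. All coordinates are made up.

```python
import math


def centroid(points):
    """Arithmetic mean of a sequence of (x, y, z) points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def subtract(a, b):
    return tuple(a[i] - b[i] for i in range(3))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def plane_normal(p0, p1, p2):
    """Normal of the plane through three points (not normalised)."""
    return cross(subtract(p1, p0), subtract(p2, p0))


def plane_angle(plane_a, plane_b):
    """Angle in degrees between two three-point planes, folded into 0-90."""
    na, nb = plane_normal(*plane_a), plane_normal(*plane_b)
    cos_angle = abs(dot(na, nb)) / math.sqrt(dot(na, na) * dot(nb, nb))
    return math.degrees(math.acos(min(1.0, cos_angle)))


# Made-up coordinates standing in for matched atoms.
cent1 = centroid([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
cent2 = centroid([(1.0, 3.0, 0.0), (1.0, 5.0, 0.0)])
distance = math.dist(cent1, cent2)

flat = ((0, 0, 0), (1, 0, 0), (0, 1, 0))
tilted = ((0, 0, 0), (1, 0, 0), (0, 1, 1))
tilt = plane_angle(flat, tilted)

print(distance)
print(round(tilt, 6))
```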

Then we can find some hits in the CSD.

>>> hits = searcher.search(max_hit_structures=5, max_hits_per_structure=1)

With these hits we can inspect the measurements and constraints:

>>> for h in hits:
...     print('DIST1: %.2f' % h.measurements['DIST1'])
...     print('P1_P2: %.2f' % h.constraints['P1_P2'])
DIST1: 6.47
P1_P2: 7.95

One can also determine how the constraints and measurements were defined:

>>> print('%s - %s' % hits[0].measurement_objects('DIST1'))
CENT1 - CENT2
>>> print(hits[0].centroid_objects('CENT1'))
(Atom(N5), Atom(N6))

and similarly for constraints.

All these measurement functions require a name by which the measurement will be accessed. The geometric measurements require an appropriate set of point definitions as defined above for geometric objects. The atom property measurement allows aspects of a matched atom to be measured. These are:

AtomicNumber
VdwRadius
CovalentRadius
TotalCoordinationNumber

A constant value measurement allows constants to be part of a calculated measurement. The unary and binary transform measurements allow arithmetic to be performed on measured values. Unary operators are:

ABS
LOG
LOG10
EXP
COS
SIN
TAN
ACOS
ASIN
ATAN
FLOOR
ROUND
SQRT
NEG

Binary operators are:

MIN
MAX
ADD
SUBTRACT
MULTIPLY
DIVIDE
POW
RSIN
RCOS

Arguments to the arithmetic operators are the name of a measurement or the name of a constraint.
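
As an illustration of how transforms compose, the sketch below evaluates a nested expression over named values in plain Python. It covers only a subset of the operator names, with meanings assumed from their names; the evaluate() function, the tuple notation and the numeric values are all hypothetical and are not the library's API.

```python
import math
import operator

# Assumed meanings for a subset of the operator names listed above.
UNARY = {'ABS': abs, 'NEG': operator.neg, 'SQRT': math.sqrt}
BINARY = {
    'ADD': operator.add, 'SUBTRACT': operator.sub,
    'MULTIPLY': operator.mul, 'DIVIDE': operator.truediv,
    'MIN': min, 'MAX': max, 'POW': pow,
}


def evaluate(expression, values):
    """Evaluate a nested (operator, argument, ...) tuple against named values."""
    if isinstance(expression, str):
        return values[expression]        # the name of a measurement or constraint
    if isinstance(expression, (int, float)):
        return expression                # a constant value
    name, *args = expression
    evaluated = [evaluate(arg, values) for arg in args]
    table = UNARY if len(evaluated) == 1 else BINARY
    return table[name](*evaluated)


# Hypothetical measured values for a single hit.
values = {'ANG1': 124.5, 'ANG2': 115.5}

# |ANG2 - ANG1|: a binary transform wrapped in a unary transform.
asymmetry = evaluate(('ABS', ('SUBTRACT', 'ANG2', 'ANG1')), values)
print(asymmetry)
```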

For each of the measurements above (except the constant value) there is a corresponding constraint which may be applied to the substructure matches.

Say, for example, that we were interested in understanding the intra-molecular geometry of an aromatic methoxy group. In particular, we would like to know how the preference of the methoxy group to lie in the plane of the aromatic ring affects the Ph-C-O angle.

Let us first create the substructure of interest using a ccdc.search.SMARTSSubstructure. The substructure can then be used to set up a ccdc.search.SubstructureSearch.

>>> ar_methoxy_sub = ccdc.search.SMARTSSubstructure('[CH3:1][O:2][c:3]1[cH:4]ccc[cH:5]1')
>>> ar_methoxy_search = ccdc.search.SubstructureSearch()
>>> ar_methoxy_sub_id = ar_methoxy_search.add_substructure(ar_methoxy_sub)

We can now add the measurements of interest using the indices of the atoms of interest in the SMARTS pattern.

Figure: the aromatic methoxy query.

>>> ar_methoxy_search.add_angle_measurement('ANG1',
...     ar_methoxy_sub_id, ar_methoxy_sub.label_to_atom_index(2),
...     ar_methoxy_sub_id, ar_methoxy_sub.label_to_atom_index(3),
...     ar_methoxy_sub_id, ar_methoxy_sub.label_to_atom_index(4))
>>> ar_methoxy_search.add_angle_measurement('ANG2',
...     ar_methoxy_sub_id, ar_methoxy_sub.label_to_atom_index(2),
...     ar_methoxy_sub_id, ar_methoxy_sub.label_to_atom_index(3),
...     ar_methoxy_sub_id, ar_methoxy_sub.label_to_atom_index(5))
>>> ar_methoxy_search.add_torsion_angle_measurement('TOR1',
...     ar_methoxy_sub_id, ar_methoxy_sub.label_to_atom_index(1),
...     ar_methoxy_sub_id, ar_methoxy_sub.label_to_atom_index(2),
...     ar_methoxy_sub_id, ar_methoxy_sub.label_to_atom_index(3),
...     ar_methoxy_sub_id, ar_methoxy_sub.label_to_atom_index(4))

Note

Here we are making use of the ccdc.search.SMARTSSubstructure.label_to_atom_index() function to convert the reaction SMARTS labels into zero based substructure atom indices.

>>> ar_methoxy_sub.label_to_atom_index(1)
0
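
The angle and torsion measurements requested above are standard geometric quantities. The following pure-Python sketch shows one common way to compute them from Cartesian coordinates. It is an illustration with made-up coordinates; the sign convention of the torsion has not been checked against the library, so only the magnitude is printed.

```python
import math


def subtract(a, b):
    return tuple(a[i] - b[i] for i in range(3))


def dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def angle(a, b, c):
    """Angle a-b-c in degrees, with b at the apex."""
    u, v = subtract(a, b), subtract(c, b)
    cos_angle = dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def torsion(a, b, c, d):
    """Torsion a-b-c-d in degrees, using the atan2 form of the dihedral angle."""
    b1, b2, b3 = subtract(b, a), subtract(c, b), subtract(d, c)
    y = math.sqrt(dot(b2, b2)) * dot(b1, cross(b2, b3))
    x = dot(cross(b1, b2), cross(b2, b3))
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: a right angle and a perpendicular torsion.
right_angle = angle((1, 0, 0), (0, 0, 0), (0, 1, 0))
perpendicular = torsion((1, 1, 0), (1, 0, 0), (0, 0, 0), (0, 0, 1))

print(round(right_angle, 6))
print(round(abs(perpendicular), 6))
```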

Now we are ready to carry out the search. Because this aromatic methoxy substructure is quite common in the CSD, we will limit our maximum number of hits to 200. Further, to avoid bias by picking multiple observations from the same structure we will limit the number of hits per structure to 1.

>>> ar_methoxy_hits = ar_methoxy_search.search(max_hit_structures=200, max_hits_per_structure=1)
>>> len(ar_methoxy_hits)
200

To get the data out of the list of ccdc.search.SubstructureHit instances we can make use of a list comprehension and Python’s built-in zip function.

>>> measurements = [ (h.measurements['ANG1'],
...                   h.measurements['ANG2'],
...                   abs(h.measurements['TOR1']))
...                 for h in ar_methoxy_hits ]
>>> ang1, ang2, abstor1 = zip(*measurements)
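
The zip(*...) idiom turns a list of per-hit tuples into one tuple per measurement. The sketch below shows this on three made-up triples standing in for real search results, and then separates in-plane from out-of-plane conformations; the 30 degree threshold is an arbitrary choice for the example.

```python
from statistics import mean

# Hypothetical (ANG1, ANG2, |TOR1|) triples; real values would come from the hits.
measurements = [
    (124.0, 116.0, 2.0),
    (120.0, 120.0, 88.0),
    (125.0, 115.0, 5.0),
]

# Transpose: one tuple per measurement instead of one tuple per hit.
ang1, ang2, abstor1 = zip(*measurements)

# Difference between the two angles for methoxy groups close to the ring plane.
in_plane_asymmetry = [a1 - a2 for a1, a2, tor in measurements if tor < 30.0]

print(ang1)
print(mean(ang1))
print(in_plane_asymmetry)
```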

Substructure searching with geometric constraints¶

It is also possible to add geometric constraints to substructure searches, using the constraint counterparts of the measurement methods.

Suppose, for example, that we wanted to understand the interaction geometry of an aromatic iodine and the nitrogen atom of a pyridine ring. Specifically, does the C-I···N angle tend towards 180° as the I···N distance becomes shorter?

First of all let us define the substructures to search for.

>>> ar_I_sub = ccdc.search.SMARTSSubstructure('Ic1ccccc1')  # I: index 0
>>> pyridine_sub = ccdc.search.SMARTSSubstructure('n1ccccc1')  # n: index 0

Using these substructures we can set up the search with a distance constraint between the iodine and the pyridine nitrogen atoms.

Figure: the aromatic iodine ··· pyridine query.

>>> halogen_bond_search = ccdc.search.SubstructureSearch()
>>> ar_I_sub_id = halogen_bond_search.add_substructure(ar_I_sub)
>>> pyridine_sub_id = halogen_bond_search.add_substructure(pyridine_sub)

Note that at this point we have added two substructures to the search and we can add a distance constraint between them. This requires us to specify both the substructure and atom identifiers of interest. Incidentally this is why the ccdc.search.SubstructureSearch.add_substructure() function returns the substructure identifier.

>>> halogen_bond_search.add_distance_constraint('DIST1',
...     ar_I_sub_id, 0,
...     pyridine_sub_id, 0,
...     (0.0, 3.4),  # distance constraint range
...     'Intermolecular')

Note

We could have specified the distance with respect to the van der Waals radii of the atoms by setting the vdw_corrected parameter of the ccdc.search.SubstructureSearch.add_distance_constraint() function to True.
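
To give a feel for what a radius-corrected distance looks like, here is a small sketch. It assumes that the correction is the interatomic distance minus the sum of the two van der Waals radii, and it uses Bondi radii typed in by hand; both the definition and the values actually used by the library should be checked against its reference documentation.

```python
# Bondi van der Waals radii in angstroms, typed in by hand for this sketch.
VDW_RADIUS = {'N': 1.55, 'I': 1.98}


def vdw_corrected_distance(distance, element_a, element_b):
    """Distance minus the sum of the two radii (assumed definition)."""
    return distance - (VDW_RADIUS[element_a] + VDW_RADIUS[element_b])


# The upper limit of the distance constraint used above, 3.4 angstroms.
corrected = vdw_corrected_distance(3.4, 'I', 'N')
print(round(corrected, 2))
```

Under these assumptions the 3.4 Å limit comes out slightly negative, that is, a little shorter than the sum of the two radii.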

Rather than just measure the C-I···N angle we can add it as an angular constraint, ensuring that it is greater than 120° for a match.

>>> halogen_bond_search.add_angle_constraint('ANG1',
...     ar_I_sub_id, 1,  # the carbon that the I is attached to
...     ar_I_sub_id, 0,
...     pyridine_sub_id, 0,
...     (120.0, 180.0))  # the angle constraint range

We can now carry out the search.

>>> halogen_bond_hits = halogen_bond_search.search(max_hits_per_structure=1)
>>> len(halogen_bond_hits)  
197

To get the data out of the list of ccdc.search.SubstructureHit instances we can make use of a list comprehension and Python’s built-in zip function.

>>> dist1_ang1 = [(h.constraints['DIST1'], h.constraints['ANG1'])
...               for h in halogen_bond_hits ]
>>> dist1, ang1 = zip(*dist1_ang1)

We can use matplotlib to check for any geometric preferences of the aromatic iodine ··· pyridine halogen bond.

>>> import matplotlib.pyplot as plt
>>> plt.clf()  # Start from a clean figure
>>> plt.scatter(dist1, ang1)  
<matplotlib.collections.PathCollection object at ...>
>>> plt.title('Aromatic iodine - pyridine halogen bond geometry')  
<matplotlib.text.Text object at ...>
>>> plt.xlabel('DIST1')  
<matplotlib.text.Text object at ...>
>>> plt.ylabel('ANG1')  
<matplotlib.text.Text object at ...>
>>> plt.show() 

Figure: halogen bond distance against angle. Plotting the angle against the distance reveals a weak negative correlation, i.e., as the contact distance becomes shorter the angle tends towards 180°.
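
A correlation of this kind can also be quantified rather than judged by eye. The sketch below computes a Pearson correlation coefficient in plain Python; the five distance and angle pairs are invented for illustration and are not search results.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Invented I...N distances (angstroms) and C-I...N angles (degrees).
example_dist = (2.8, 2.9, 3.0, 3.2, 3.4)
example_ang = (178.0, 175.0, 172.0, 168.0, 160.0)

r = pearson(example_dist, example_ang)
print('r = %.2f' % r)  # negative: shorter contacts go with larger angles
```

With real data, the tuples produced by zip above would be passed in place of the invented values.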

If we would like to perform further geometrical calculations on the hits of a substructure search, we can get access to the matched substructures. These will be instances of ccdc.molecule.Molecule, one for each substructure of the query, containing the atoms matched by the query and the bonds between them. The structures will take account of any symmetry operators involved in the construction of the match.

>>> h = halogen_bond_hits[0]
>>> print(h.match_atoms()) 
[Atom(I1), Atom(C12), ... Atom(C5), Atom(C4)]
>>> print(', '.join("'%s'" % symmop for symmop in h.match_symmetry_operators())) 
'x,y,z', 'x,y,z', ... '1/2-x,-y,-1/2+z', '1/2-x,-y,-1/2+z'
>>> sub_matches = h.match_substructures()
>>> print(sub_matches[0].atoms) 
[Atom(I1), Atom(C12), ..., Atom(C8), Atom(C13)]
>>> print(sub_matches[1].atoms) 
[Atom(N1), Atom(C3), ..., Atom(C5), Atom(C4)]

Both ccdc.search.SubstructureSearch.SearchHit.match_atoms() and ccdc.search.SubstructureSearch.SearchHit.match_substructures() will return atoms in the order in which they were specified by the substructures added to the SubstructureSearch, in the former case as a single tuple of atoms, in the latter separated into molecules corresponding to the substructures.

The difference between ccdc.search.SubstructureSearch.SearchHit.match_atoms() and ccdc.search.SubstructureSearch.SearchHit.match_substructures() is that in the former case, the atoms may all be found in ccdc.search.SubstructureSearch.SearchHit.molecule, and so will not have any symmetry operators applied to them. This is useful to be able to determine which atoms have participated in the match. In some cases where a symmetry operator has had to be applied, an atom can appear more than once in the returned tuple of atoms. In the latter case separate molecules are returned, hence the atoms are not directly comparable. Symmetry operators are applied, so the atoms are in exactly the same positions as were found by the search. Hence, geometrical operations on the substructures are consonant with the measurements and constraints of the search.
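
To illustrate what applying a symmetry operator means, here is a minimal sketch that applies an operator string of the form shown above to fractional coordinates. The apply_symmetry_operator() function is hypothetical and handles only simple terms (a signed x, y or z plus an optional fractional translation); the library does this work itself when it builds the matched substructures.

```python
import re
from fractions import Fraction


def apply_symmetry_operator(symmop, fractional_coordinates):
    """Apply an operator such as '1/2-x,-y,-1/2+z' to fractional (x, y, z)."""
    axes = dict(zip('xyz', fractional_coordinates))
    transformed = []
    for component in symmop.replace(' ', '').lower().split(','):
        value = Fraction(0)
        # Split into signed terms, e.g. '1/2-x' -> ['1/2', '-x'].
        for term in re.findall(r'[+-]?[^+-]+', component):
            sign = -1 if term.startswith('-') else 1
            body = term.lstrip('+-')
            if body in axes:
                value += sign * axes[body]       # a coordinate term
            else:
                value += sign * Fraction(body)   # a translation such as 1/2
        transformed.append(value)
    return tuple(transformed)


# A made-up atom position in fractional coordinates.
position = (Fraction(1, 4), Fraction(1, 10), Fraction(3, 4))

unchanged = apply_symmetry_operator('x,y,z', position)
moved = apply_symmetry_operator('1/2-x,-y,-1/2+z', position)
print(moved)
```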

Interactive Hit Processing¶

There is a new mechanism which allows the hits of a substructure search to be processed as they are found by the underlying ccdc.search.Search.search() method instead of collecting all the hits, then processing them later. This may reduce the memory requirements of the search if the interesting aspects of the hit can be extracted without storing the whole hit. In other circumstances it may provide a mechanism for early termination of the search if hits with the right data may be determined before the search has processed the entire database.

Firstly let us import the right modules and set up a search for carboxylate-carboxylate contacts:

>>> import time
>>> from ccdc.search import SMARTSSubstructure, SubstructureSearch
>>> searcher = SubstructureSearch()
>>> searcher.add_substructure(SMARTSSubstructure('C(=O)OH'))
0
>>> searcher.add_substructure(SMARTSSubstructure('C(=O)OH'))
1
>>> searcher.add_distance_constraint('DIST1', 0, 1, 1, 3, (0, 2.0), 'Intermolecular')
>>> searcher.add_distance_constraint('DIST2', 0, 3, 1, 1, (0, 2.0), 'Intermolecular')
>>> searcher.add_angle_measurement('ANG1', 0, 1, 1, 3, 1, 2)
>>> searcher.add_angle_measurement('ANG2', 0, 2, 0, 3, 1, 1)

Normally we would call ccdc.search.Search.search(), wait until all hits had been found, then process them to extract the information we sought. Now we can process the hits as they are found, allowing much greater memory efficiency in cases where we do not need all of the hit data.

Let us set up a processor which will record the closest contacts found, by summing the distance constraints. The two methods we will need to override are __init__, to provide space to record data, and add_hit(), to process the hits as they are found. For this example, we will also terminate after a few seconds for brevity.

>>> class ClosestContacts(SubstructureSearch.HitProcessor):
...     '''Record the closest 5 contacts, and terminate after a short time.'''
...     def __init__(self, max_hits=5, timeout=20.):
...         self.max_hits = max_hits
...         self.hits = []
...         self.refcode = None
...         self.start_time = time.time()
...         self.timeout = timeout
...         self.count = 0
...     def add_hit(self, hit):
...         score = hit.constraints['DIST1'] + hit.constraints['DIST2']
...         if len(self.hits) == self.max_hits:
...             self.hits.sort()
...             if score < self.hits[-1][0]:
...                 self.hits.pop()
...                 self.hits.append((score, hit))
...         else:
...             self.hits.append((score, hit))
...         if time.time() - self.start_time > self.timeout:
...             self.cancel()
...         if hit.identifier != self.refcode:
...             self.refcode = hit.identifier
...             self.count += 1

Now we can use it to search:

>>> closest = ClosestContacts()
>>> closest.search(searcher)

After 20 seconds the search will be terminated, and we can inspect the hits:

>>> hits = closest.hits
>>> hits.sort()
>>> print(' '.join('%.2f' % h[0] for h in hits)) 
2.71 2.82 2.98 3.06 3.08
>>> print(' '.join(h[1].identifier for h in hits)) 
ABUPES ABUNOA ADIPAC ABUPES ACSALA
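
The bookkeeping in add_hit() keeps the best few hits by re-sorting a short list. An alternative for larger limits is a bounded heap. The sketch below is a stand-alone illustration of that swapped-in technique using the standard heapq module; the SmallestN class is hypothetical, and the scores and labels are placeholders rather than search results.

```python
import heapq
import itertools


class SmallestN:
    """Keep the n lowest-scoring items seen so far (hypothetical helper)."""

    def __init__(self, n):
        self.n = n
        self._heap = []                     # entries are (-score, tie_breaker, item)
        self._counter = itertools.count()   # avoids comparing items when scores tie

    def add(self, score, item):
        entry = (-score, next(self._counter), item)
        if len(self._heap) < self.n:
            heapq.heappush(self._heap, entry)
        elif -score > self._heap[0][0]:     # better than the worst score kept
            heapq.heapreplace(self._heap, entry)

    def sorted_items(self):
        """Return (score, item) pairs, lowest score first."""
        pairs = [(-neg_score, item) for neg_score, _, item in self._heap]
        return sorted(pairs, key=lambda pair: pair[0])


# Placeholder scores standing in for DIST1 + DIST2 of successive hits.
keeper = SmallestN(5)
for index, score in enumerate([3.08, 2.71, 3.5, 2.82, 3.06, 4.0, 2.98]):
    keeper.add(score, 'hit%d' % index)

best = keeper.sorted_items()
print(' '.join('%.2f' % score for score, _ in best))
```

Inside a hit processor, add_hit() would call keeper.add(score, hit) in place of the sort, pop and append steps.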
