Search API

Introduction

The ccdc.search module provides classes for searching crystal structure databases such as the CSD.

The main classes of the ccdc.search module are:

ccdc.search.TextNumericSearch
ccdc.search.SubstructureSearch
ccdc.search.SimilaritySearch
ccdc.search.ReducedCellSearch
ccdc.search.CombinedSearch

These all inherit from the base class ccdc.search.Search. The base ccdc.search.Search contains nested classes defining basic search hits and settings:

ccdc.search.Search.SearchHit
ccdc.search.Search.Settings

The base class ccdc.search.Search also contains the ccdc.search.Search.search() method, which is used to search the CSD.

All the searches except ccdc.search.TextNumericSearch also support searching of the following additional data sources:

a Python list of identifiers
a molecule file path
a ccdc.io reader
an individual ccdc.molecule.Molecule
an individual ccdc.crystal.Crystal
a list of molecules, crystals or entries

The ccdc.search.TextNumericSearch can only sensibly be applied to a crystal structure database, which is the CSD by default or a ccdc.io.EntryReader opened on a database file.
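As a minimal sketch of searching several of these data sources with one search object, the helper below passes each source through the database argument of search(), following the signature search(database=None, max_hit_structures=None, max_hits_per_structure=None) documented further down this page. The helper name count_hits_per_source and the dictionary of named sources are hypothetical and exist only for illustration.

```python
def count_hits_per_source(searcher, sources, max_hit_structures=None):
    """Run one search object against several data sources.

    searcher: any object with a search(database=..., max_hit_structures=...)
              method, for example a SubstructureSearch or SimilaritySearch.
    sources:  dict mapping a descriptive name to a data source (a list of
              identifiers, a file path, a reader, a molecule, a crystal, ...).
    Returns a dict mapping each name to the number of hits found.
    """
    counts = {}
    for name, source in sources.items():
        # The data source is handed over via the 'database' argument.
        hits = searcher.search(database=source,
                               max_hit_structures=max_hit_structures)
        counts[name] = len(hits)
    return counts
```

With the API installed, this might be called as count_hits_per_source(substructure_search, {'aspirin': mol, 'csd': csd_reader}), using the objects created in the examples that follow.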

The ccdc.search.Search.search() method returns a list of ccdc.search.Search.SearchHit instances. Some of the searches make use of more specific search hit classes, namely:

ccdc.search.TextNumericSearch.TextNumericHit
ccdc.search.SubstructureSearch.SubstructureHit
ccdc.search.SimilaritySearch.SimilarityHit

Most of the searches return simple Python lists of search hits. However, a search carried out using a ccdc.search.SubstructureSearch returns a ccdc.search.SubstructureSearch.SubstructureHitList, which contains a ccdc.search.SubstructureSearch.SubstructureHitList.superimpose() function for superimposing all the hits on the first instance in the list.

To illustrate some of the searches let us first get an aspirin molecule.

>>> from ccdc.io import EntryReader
>>> csd_reader = EntryReader('CSD')
>>> mol = csd_reader.molecule('ACSALA')

Text numeric searching.

>>> from ccdc.search import TextNumericSearch
>>> text_numeric_search = TextNumericSearch()
>>> text_numeric_search.add_compound_name('aspirin')
>>> hits = text_numeric_search.search()
>>> len(hits)
106

Substructure searching.

>>> from ccdc.search import MoleculeSubstructure, SubstructureSearch
>>> substructure = MoleculeSubstructure(mol)
>>> substructure_search = SubstructureSearch()
>>> _ = substructure_search.add_substructure(substructure)
>>> hits = substructure_search.search()
>>> len(hits)
70

Similarity searching.

>>> from ccdc.search import SimilaritySearch
>>> similarity_search = SimilaritySearch(mol)
>>> hits = similarity_search.search()
>>> len(hits)
117

Reduced cell searching.

>>> from ccdc.search import ReducedCellSearch
>>> crystal = csd_reader.crystal('ACSALA')
>>> query = ReducedCellSearch.CrystalQuery(crystal)
>>> reduced_cell_searcher = ReducedCellSearch(query)
>>> hits = reduced_cell_searcher.search()
>>> len(hits)
19

Combined searches.

>>> from ccdc.search import CombinedSearch
>>> combined_search = CombinedSearch(similarity_search & -text_numeric_search)
>>> hits = combined_search.search()
>>> len(hits)
33
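The expression similarity_search & -text_numeric_search reads as "hits of the similarity search that are not hits of the text numeric search". The sketch below mirrors that logic on plain sets of identifiers. It only illustrates the set semantics and is not how CombinedSearch is implemented; the function name and the identifiers in the usage note are made up.

```python
def and_not(first_hits, second_hits):
    """Identifiers found by the first search but not by the second.

    This is the set-level reading of 'first & -second': intersecting with
    the complement of a set is the same as taking the set difference.
    """
    return sorted(set(first_hits) - set(second_hits))
```

For example, and_not(['REF_A', 'REF_B', 'REF_C'], ['REF_B', 'REF_D']) keeps 'REF_A' and 'REF_C', because 'REF_B' is excluded by the negated search and 'REF_D' was never a hit of the first search.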

API

Classes for defining substructures

class ccdc.search.QueryAtom(atomic_symbol='', _substructure_atom=None)[source]¶

Atom used to define a substructure search.

A QueryAtom can be used to represent a single atom type or a set of atom types. A QueryAtom can also have additional constraints imposed on it, for example that it should be aromatic.

Let us create a query atom representing an oxygen atom.

>>> query_atom = QueryAtom('O')
>>> print(query_atom)
QueryAtom(O)

Suppose that we wanted the query atom to be either a carbon or a nitrogen atom.

>>> query_atom = QueryAtom(['C', 'N'])
>>> print(query_atom)
QueryAtom(C, N)

It is possible to add further constraints on a QueryAtom. For example, we can insist that it should be aromatic.

>>> query_atom.aromatic = True
>>> print(query_atom.aromatic)
AtomAromaticConstraint: 1
>>> print(query_atom)
QueryAtom(C, N)[atom aromaticity: equal to 1]

See Query Atoms for further details.

property acceptor¶

Constraint specifying whether or not the QueryAtom is an acceptor.

>>> a = QueryAtom(['C', 'N'])
>>> a.acceptor = True
>>> print(a)
QueryAtom(C, N)[AtomAcceptorTypeConstraint]

add_connected_element_count(atomic_symbols, count)[source]¶

Set the number of connected elements constraint.

Constraint to define the number of times the QueryAtom should be connected to atoms with elements defined in the atomic_symbols list.

Parameters:

atomic_symbols – atomic symbol or list of atomic symbols.
count – see Constraint conditions for details.

>>> a = QueryAtom(['C', 'N'])
>>> a.add_connected_element_count(['F', 'Cl'], 2)
>>> print(a)
QueryAtom(C, N)[count connected elements equal to 2 from [F,Cl]]
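The constraint values used in the examples on this page take a few recurring shapes: a bare value (printed as "equal to"), an (operator, value) pair such as ('!=', 4), ('>', 3) or ('<=', 3), an ('in', [...]) pair (printed as "one of"), and a (low, high) pair (printed as "in range"). The sketch below is a plain-Python reading of those shapes, not the library's implementation. The helper name satisfies is hypothetical, only the operators that appear on this page are handled, and treating the range as inclusive at both ends is an assumption.

```python
import operator

# Only the comparison operators that appear in the examples on this page.
_OPERATORS = {
    '!=': operator.ne,
    '>': operator.gt,
    '<=': operator.le,
}


def satisfies(value, condition):
    """Return True if value meets a condition written in one of the shapes above."""
    if isinstance(condition, tuple) and len(condition) == 2:
        first, second = condition
        if first == 'in':
            # ('in', [-1, 1]) : value must be one of the listed values.
            return value in second
        if isinstance(first, str):
            # ('>', 3) : compare value against a single number.
            return _OPERATORS[first](value, second)
        # (5, 6) : value must lie in the range, assumed inclusive.
        return first <= value <= second
    # A bare value such as 2 or True means "equal to".
    return value == condition
```

Read this way, a count of 2 in add_connected_element_count(['F', 'Cl'], 2) means exactly two connected atoms from the given elements.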

add_protein_atom_type_constraint(*types)[source]¶

Add a constraint that an atom be in one of the protein atom types.

This is of use only when searching a protein structure.

Parameters:: *types – one or more of ‘AMINO_ACID’, ‘LIGAND’, ‘COFACTOR’, ‘WATER’, ‘METAL’, ‘NUCLEOTIDE’, ‘UNKNOWN’. Any case-insensitive, unique prefix may be used.

>>> a = QueryAtom('Zn')
>>> a.add_protein_atom_type_constraint('Ligand', 'Metal')
>>> print(a)
QueryAtom(Zn)[protein substructure type : one of 1, 3]
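The phrase "any case-insensitive, unique prefix" can be illustrated with a small stand-alone sketch. The function resolve_protein_atom_type is hypothetical and is not part of ccdc.search; it only shows one way such a prefix could be resolved against the type names listed above.

```python
PROTEIN_ATOM_TYPES = (
    'AMINO_ACID', 'LIGAND', 'COFACTOR', 'WATER',
    'METAL', 'NUCLEOTIDE', 'UNKNOWN',
)


def resolve_protein_atom_type(prefix):
    """Map a case-insensitive prefix to the single type name it identifies."""
    wanted = prefix.upper()
    candidates = [name for name in PROTEIN_ATOM_TYPES if name.startswith(wanted)]
    if len(candidates) != 1:
        # Either nothing matched or the prefix is ambiguous (e.g. an empty string).
        raise ValueError('%r does not identify exactly one type: %s'
                         % (prefix, candidates))
    return candidates[0]
```

Under this reading, 'Ligand', 'lig' and 'L' all resolve to 'LIGAND', while an empty string is rejected because it matches every type.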

property aromatic¶

Constraint specifying whether or not the QueryAtom is aromatic.

>>> a = QueryAtom(['C', 'N'])
>>> a.aromatic = True
>>> print(a)
QueryAtom(C, N)[atom aromaticity: equal to 1]

property chirality¶

Constraint specifying the chirality around an atom.

The return value will either be None or a tuple of 4 QueryAtoms in clockwise order.

>>> s = SMARTSSubstructure("FC(I)O[C@](S)(P)H")
>>> s.atoms[1].chirality is None
True
>>> s.atoms[4].chirality
(QueryAtom(O)[atom aromaticity: equal to 0], QueryAtom(H), QueryAtom(P)[atom aromaticity: equal to 0], QueryAtom(S)[atom aromaticity: equal to 0])

property cyclic¶

Constraint specifying whether or not the QueryAtom is part of a cycle.

>>> a = QueryAtom(['C', 'N'])
>>> a.cyclic = True
>>> print(a)
QueryAtom(C, N)[atom cyclicity: equal to 1]

property cyclic_bonds¶

Constraint specifying the number of cyclic bonds of the QueryAtom.

>>> a = QueryAtom(['C', 'N'])
>>> a.cyclic_bonds = ('!=', 4)
>>> print(a)
QueryAtom(C, N)[number of cyclic bonds:not equal to 4]

property donor¶

Constraint specifying whether or not the QueryAtom is a donor.

>>> a = QueryAtom(['C', 'N'])
>>> a.donor = True
>>> print(a)
QueryAtom(C, N)[AtomDonorTypeConstraint]

property formal_charge¶

Constraint specifying the formal charge on the QueryAtom.

>>> a = QueryAtom(['C', 'N'])
>>> a.formal_charge = ('in', [-1, 1])
>>> print(a)
QueryAtom(C, N)[charge: one of -1, 1]

property formal_valency¶

Constraint specifying the formal valency of the QueryAtom.

>>> a = QueryAtom(['C', 'N'])
>>> a.formal_valency = ('>', 3)
>>> print(a)
QueryAtom(C, N)[atom valency: greater than 3]

property has_3d_coordinates¶

Constraint specifying that the atom has 3d coordinates.

>>> a = QueryAtom(['C', 'N'])
>>> a.has_3d_coordinates = True
>>> print(a)
QueryAtom(C, N)[atom must have 3D site]

property index¶

Index of this atom in a substructure.

>>> atom = QueryAtom(['C', 'N'])
>>> print(atom.index)
None
>>> substructure = QuerySubstructure()
>>> _ = substructure.add_atom(atom)
>>> print(atom.index)
0

property label_match¶

Constraint specifying that the atom label must match a regular expression.

>>> a = QueryAtom(['C'])
>>> a.label_match = '^C12$'
>>> print(a)
QueryAtom(C)[atom label must match regular expression with pattern: ^C12$]
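To see what a pattern such as '^C12$' selects, the sketch below applies it to a list of atom labels with Python's re module. Whether the library uses the same regular expression dialect and search semantics is an assumption; the helper name and the labels in the usage note are illustrative only.

```python
import re


def labels_matching(pattern, labels):
    """Return the labels in which the regular expression finds a match."""
    compiled = re.compile(pattern)
    return [label for label in labels if compiled.search(label)]
```

With the anchors in place, labels_matching('^C12$', ['C1', 'C12', 'C120', 'N12']) keeps only 'C12'. Without them, the pattern 'C12' would also accept 'C120' under these semantics.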

property nimplicit_hydrogens¶

Constraint specifying a count of implicit hydrogens.

>>> a = QueryAtom(['C', 'N'])
>>> a.nimplicit_hydrogens = 0
>>> print(a)
QueryAtom(C, N)[implicit hydrogen count: equal to 0]

property num_bonds¶

Constraint specifying the number of bonds the QueryAtom may have.

>>> a = QueryAtom(['C', 'N'])
>>> a.num_bonds = ('<=', 3)
>>> print(a)
QueryAtom(C, N)[number of connected atoms: less than or equal to 3]

property num_hydrogens¶

Constraint specifying the number of hydrogens the QueryAtom may have.

>>> a = QueryAtom(['C', 'N'])
>>> a.num_hydrogens = 1
>>> print(a)
QueryAtom(C, N)[hydrogen count, including deuterium: equal to 1]

property smallest_ring¶

Constraint specifying the size of the smallest ring the QueryAtom forms part of.

>>> a = QueryAtom(['C', 'N'])
>>> a.smallest_ring = (5, 6)
>>> print(a)
QueryAtom(C, N)[atom smallest ring: in range 5 to 6]

property unfused_unbridged_ring¶

Constraint specifying whether or not the QueryAtom is part of an unfused and unbridged ring.

>>> a = QueryAtom(['C', 'N'])
>>> a.unfused_unbridged_ring = True
>>> print(a)
QueryAtom(C, N)[atom unfused/unbridged ring: equal to 1]

class ccdc.search.QueryBond(bond_type=None, _substructure_bond=None)[source]¶

Bond used to define a substructure search.

A QueryBond can be used to represent a single bond type or a set of bond types. A QueryBond can also have additional constraints imposed on it, for example that it should be cyclic.

Let us create a QueryBond that will match any bond type.

>>> query_bond = QueryBond()
>>> print(query_bond)  
QueryBond(Unknown, Single, Double, Triple,
          Quadruple, Aromatic, Delocalised, Pi)

To create a more specific QueryBond we need to specify some bond types.

>>> from ccdc.molecule import Bond
>>> single_bond = Bond.BondType('Single')
>>> double_bond = Bond.BondType('Double')
>>> query_bond = QueryBond(single_bond)
>>> print(query_bond)
QueryBond(Single)
>>> query_bond = QueryBond([single_bond, double_bond])
>>> print(query_bond)  
QueryBond(Single, Double)

Finally, let us set a constraint for the bond to be cyclic.

>>> query_bond.cyclic = True
>>> print(query_bond)
QueryBond(Single, Double)[bond cyclicity: equal to 1]

>>> print(query_bond.cyclic)
BondCyclicityConstraint: 1

property atoms¶

A list of the two QueryAtoms of the bond, if it is in a substructure, or None.

>>> s = QuerySubstructure()
>>> c = s.add_atom(QueryAtom('C'))
>>> n = s.add_atom(QueryAtom('N'))
>>> b = QueryBond(['Single', 'Double'])
>>> _ = s.add_bond(b, c, n)
>>> print(b)
QueryBond(Single, Double)
>>> print('%s, %s' % (b.atoms[0], b.atoms[1]))
QueryAtom(C), QueryAtom(N)

property bond_length¶

Constraint specifying the length of the bond.

>>> b = QueryBond('Single')
>>> c1 = QueryAtom('C')
>>> c2 = QueryAtom('C')
>>> s = QuerySubstructure()
>>> _ = s.add_atom(c1)
>>> _ = s.add_atom(c2)
>>> _ = s.add_bond(b, c1, c2)
>>> b.bond_length = ('>', 1.6)
>>> print(b)
QueryBond(Single)[bond length: greater than 1.6]

property bond_polymeric¶

Constraint specifying whether or not the QueryBond is polymeric.

>>> b = QueryBond('Single')
>>> b.bond_polymeric = True
>>> print(b)
QueryBond(Single)[bond polymeric: equal to 1]

property bond_smallest_ring¶

Constraint specifying the smallest ring the bond should be a part of.

>>> b = QueryBond('Aromatic')
>>> b.bond_smallest_ring = 5
>>> print(b)
QueryBond(Aromatic)[bond smallest ring: equal to 5]

property bond_unfused_unbridged_ring¶

Constraint specifying whether or not the QueryBond is part of an unfused and unbridged ring.

>>> b = QueryBond('Single')
>>> b.bond_unfused_unbridged_ring = True
>>> print(b)
QueryBond(Single)[bond unfused/unbridged ring: equal to 1]

property cyclic¶

Constraint specifying whether or not the QueryBond is part of a cycle.

>>> b = QueryBond('Single')
>>> b.cyclic = True
>>> print(b)
QueryBond(Single)[bond cyclicity: equal to 1]

property stereochemistry¶

Constraint specifying the stereochemistry around a double bond.

The return value will either be None or a tuple of 2 QueryAtoms and one of ‘cis’ or ‘trans’.

>>> s = SMARTSSubstructure(r"I/C=C\F")
>>> s.bonds[1].stereochemistry
(QueryAtom(I), QueryAtom(F), 'cis')

class ccdc.search.QuerySubstructure(_substructure=None)[source]¶

Class to define and run substructure searches.

As an example let us set up a QuerySubstructure for a carbonyl (C=O).

>>> from ccdc.molecule import Bond
>>> double_bond = Bond.BondType('Double')
>>> substructure_query = QuerySubstructure()
>>> query_atom1 = substructure_query.add_atom('C')
>>> query_atom2 = substructure_query.add_atom('O')
>>> query_bond = substructure_query.add_bond(double_bond, query_atom1, query_atom2)

add_atom(atom)[source]¶

Add an atom to the substructure.

Parameters:: atom – may be a QueryAtom separately constructed, an atom of a molecule, or an atomic symbol.
Returns:: QueryAtom

>>> q = QuerySubstructure()
>>> a = q.add_atom(QueryAtom(['N', 'O']))
>>> print(a)
QueryAtom(N, O)

add_bond(bond, atom1=None, atom2=None)[source]¶

Add a bond to the substructure.

Parameters:

bond – may be a QueryBond, a ccdc.molecule.Bond.BondType, a ccdc.molecule.Bond, a string or an int.
atom1 – QueryAtom or None for any atom
atom2 – QueryAtom or None for any atom

Returns:

QueryBond

Raises:

TypeError if an improper bond argument is supplied

>>> s = QuerySubstructure()
>>> c = s.add_atom(QueryAtom('C'))
>>> o1 = s.add_atom(QueryAtom('O'))
>>> o2 = s.add_atom(QueryAtom('O'))
>>> h = s.add_atom(QueryAtom('H'))
>>> _ = s.add_bond(QueryBond('Double'), c, o1)
>>> _ = s.add_bond(QueryBond('Single'), c, o2)
>>> _ = s.add_bond(QueryBond('Single'), o2, h)

property atoms¶

The query atoms in the substructure.

>>> q = QuerySubstructure()
>>> _ = q.add_atom(QueryAtom('C'))
>>> _ = q.add_atom(QueryAtom(['O', 'N']))
>>> atoms = q.atoms
>>> print('%s, %s' % (atoms[0], atoms[1]))
QueryAtom(C), QueryAtom(N, O)

property bonds¶

The bonds in the substructure.

>>> s = QuerySubstructure()
>>> b = s.add_bond('Single', QueryAtom('C'), QueryAtom('F'))
>>> bonds = s.bonds
>>> print(bonds[0])
QueryBond(Single)

clear()[source]¶: Restart the query.

match_atom(atom, query_atom=None)[source]¶

Whether or not the given atom matches the query_atom in the given context.

Parameters:

atom – a ccdc.molecule.Atom instance.
query_atom – a ccdc.search.QueryAtom instance or None. If None, the first atom of the substructure will be used.

Returns:

bool

>>> s = QuerySubstructure()
>>> _ = s.add_bond('Single', QueryAtom('Cl'), QueryAtom('C'))
>>> mol = EntryReader('csd').molecule('AABHTZ')
>>> s.match_atom(mol.atom('Cl1'))
True
>>> s.match_atom(mol.atom('C1'))
False
>>> s.match_atom(mol.atom('C1'), s.atoms[1])
True

match_molecule(molecule)[source]¶

Whether or not the query matches the specified molecule.

Parameters:: molecule – a ccdc.molecule.Molecule instance.
Returns:: bool

>>> s = QuerySubstructure()
>>> _ = s.add_bond('Double', QueryAtom('C'), QueryAtom('O'))
>>> mol = EntryReader('csd').molecule('AABHTZ')
>>> s.match_molecule(mol)
True

nmatch_molecule(molecule)[source]¶

Returns number of query matches within the specified molecule.

Parameters:: molecule – a ccdc.molecule.Molecule instance.
Returns:: integer

>>> s = QuerySubstructure()
>>> _ = s.add_bond('Single', QueryAtom('Cl'), QueryAtom('C'))
>>> mol = EntryReader('csd').molecule('AABHTZ')
>>> s.nmatch_molecule(mol)
2

write_xml(file_name)[source]¶

Write an XML representation of the substructure. Deprecated.

Parameters:: file_name – path to XML file

class ccdc.search.SMARTSSubstructure(smarts)[source]¶

Make a substructure from a SMARTS string.

Let us create a ketone SMARTSSubstructure as an example.

>>> smarts_query = SMARTSSubstructure("[CD4][CD3](=[OD1])[CD4]")
>>> print(smarts_query.smarts)
[CD4][CD3](=[OD1])[CD4]

There is a minor extension to Daylight SMARTS to allow the representation of quadruple, delocalised and pi bonds, using the characters ‘_’, ‘”’ and ‘|’ respectively.

There is a second minor extension to allow easy access to the indices of the atoms.

>>> query = SMARTSSubstructure("[#6:0]([#7]-H)[#8:1][#6:2]")
>>> print(query.label_to_atom_index(0))
0
>>> print(query.label_to_atom_index(1))
3
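The label-to-index mapping in this example can be reproduced by counting atoms from left to right: the labelled carbon is atom 0, the nitrogen is atom 1, the hydrogen is atom 2, and the labelled oxygen is atom 3. The toy tokenizer below makes that counting explicit. It is not the library's SMARTS parser; it assumes atoms are indexed in the order they are written and only recognises bracketed atoms and bare element symbols starting with an upper-case letter. The name label_indices is hypothetical.

```python
import re

# A bracketed atom such as [#6:0], or a bare element symbol such as H or Cl.
_ATOM = re.compile(r'\[([^\]]*)\]|[A-Z][a-z]?')
# A trailing ':<digits>' inside the brackets is the label.
_LABEL = re.compile(r':(\d+)$')


def label_indices(smarts):
    """Map each label in a labelled SMARTS string to the index of its atom."""
    mapping = {}
    for index, match in enumerate(_ATOM.finditer(smarts)):
        body = match.group(1)
        if body is None:
            # Bare atom: it counts towards the index but cannot carry a label.
            continue
        label = _LABEL.search(body)
        if label:
            mapping[int(label.group(1))] = index
    return mapping
```

Applied to "[#6:0]([#7]-H)[#8:1][#6:2]", this counting gives index 0 for label 0 and index 3 for label 1, in agreement with the output above.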

label_to_atom_index(label)[source]¶: Translate a SMARTS label into the appropriate substructure atom index

property smarts¶: The SMARTS string.

class ccdc.search.MoleculeSubstructure(mol, match_stereochemistry=False)[source]¶

Make a substructure query from an entire molecule.

Can be used to search for exact matches of a molecule when appropriate num_bonds or add_connected_element_count constraints are set on the QueryAtoms. Furthermore, if hydrogen atoms have been removed from the molecule used to initialise the MoleculeSubstructure, it can be used to find hits that match the heavy atoms as a substructure.

Parameters:

mol – ccdc.molecule.Molecule
match_stereochemistry – Should the substructure constrain target stereochemistry to match the input molecule’s stereochemistry?

Raises:

TypeError if the passed in molecule has multiple components since multi-component molecule substructure searches are not supported. The components should be added as separate substructures.

>>> mol = EntryReader('csd').molecule('AABHTZ')
>>> sub = MoleculeSubstructure(mol)

class ccdc.search.ConnserSubstructure(file_name, _conn=None)[source]¶

Read a ConQuest query language file.

static from_string(text)[source]¶: Create a substructure from a textual representation of a Connser file.

interaction_library_contact_atoms()[source]¶

Provide the list of indices of the substructure atoms that are (optionally) defined in the ConnSer query for generating the data in the CCDC interaction library.

The indices refer to positions in the list of atoms of the associated substructure.

See ccdc.interactions for more information on the interaction library.

Search classes

class ccdc.search.Search(settings=None)[source]¶

Common base class for searches.

class SearchHit(identifier, _database=None, _entry=None, _crystal=None, _molecule=None, _binary_database=None)[source]¶

Base class for search hits.

Provides access to molecules, crystals and entries.

property crystal¶: The crystal corresponding to a search hit.

property entry¶: The entry corresponding to a search hit.

property identifier¶: The string identifier of the hit.

property molecule¶: The molecule corresponding to a search hit.

class Settings(_settings=None)[source]¶

Base class for search settings.

property has_3d_coordinates¶: Constrain hits to have 3d coordinates.

property max_hit_structures¶: The number of structures which may be returned from a search.

property max_r_factor¶

Constrain the hits to have an R-factor less than this.

The R-factor will be expressed as a percentage.

property must_have_elements¶

Elements which must be present in a hit.

The elements will be presented as a list of atomic symbols.

>>> settings = Search.Settings()
>>> settings.must_have_elements = ['C', 'N', 'O', 'S']
>>> print(settings.must_have_elements)
[C (6), N (7), O (8), S (16)]

property must_not_have_elements¶

Elements which must not be present in a hit.

The elements will be presented as a list of symbols.

>>> settings = Search.Settings()
>>> settings.must_not_have_elements = ['S', 'P', 'K']
>>> print(settings.must_not_have_elements)
[P (15), S (16), K (19)]

property no_disorder¶

Constrain hits to have no disorder.

The value will be False (no filtering), ‘Non-hydrogen’ (filter structures with heavy atom disorder) or ‘All’ (filter structures with any disordered atoms).

property no_errors¶: Constrain the hits to have no suppressed errors.

property no_ions¶: Constrain the hits not to have a residue with a formal charge. The hits may include zwitterions.

property no_metals¶: Constrain the hits not to have a metal atom.

property no_powder¶: Constrain hits not to be powder studies.

property not_polymeric¶: Constrain the hits not to be polymeric structures.

property only_organic¶: Constrain hits to be organic compounds.

property only_organometallic¶: Constrain hits to be only organometallic compounds.

test(argument)[source]¶

Test that the argument satisfies the requirements of the settings instance.

Parameters:: argument – a ccdc.entry.Entry, ccdc.crystal.Crystal or ccdc.molecule.Molecule instance.
Returns:: bool

>>> entry = EntryReader('csd').entry('AABHTZ')
>>> settings = Search.Settings()
>>> settings.test(entry)
True
>>> settings.only_organometallic = True
>>> settings.test(entry)
False
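Because test() returns a bool for a single entry, crystal or molecule, it can be used to split an existing collection into accepted and rejected items. The helper below is a small sketch along those lines; its name is hypothetical, and it relies only on the test() method documented above.

```python
def partition_by_settings(settings, items):
    """Split items into those that pass settings.test() and those that do not.

    settings: an object with a test(argument) method returning a bool,
              for example a Search.Settings instance.
    items:    an iterable of entries, crystals or molecules.
    Returns a (accepted, rejected) pair of lists, preserving input order.
    """
    accepted, rejected = [], []
    for item in items:
        if settings.test(item):
            accepted.append(item)
        else:
            rejected.append(item)
    return accepted, rejected
```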

search(database=None, max_hit_structures=None, max_hits_per_structure=None)[source]¶: Perform a search.

class ccdc.search.SimilaritySearch(mol=None, threshold=0.7, coefficient='tanimoto', settings=None)[source]¶

Class to define and run similarity searches.

class Settings(threshold=0.7, coefficient='tanimoto', _settings=None)[source]¶

property coefficient¶: This should be either ‘dice’ or ‘tanimoto’, the default.

property sort_order¶

The order in which hits will be sorted.

This should be either ‘alphabetic’ or ‘value’, the default.

property threshold¶

The similarity threshold to apply.

This is a value between 0.0 and 1.0.
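For orientation, the two coefficients have simple textbook definitions on sets of fingerprint features: Tanimoto is the size of the intersection divided by the size of the union, and Dice is twice the size of the intersection divided by the sum of the two set sizes. The sketch below implements those definitions on plain Python sets. How the library derives its fingerprints is not described on this page, so the feature sets, the function names, and the choice to return 0.0 for two empty sets are all assumptions made for illustration.

```python
def tanimoto(features_a, features_b):
    """Tanimoto coefficient: |A & B| / |A | B|."""
    a, b = set(features_a), set(features_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def dice(features_a, features_b):
    """Dice coefficient: 2 * |A & B| / (|A| + |B|)."""
    a, b = set(features_a), set(features_b)
    total = len(a) + len(b)
    return 2.0 * len(a & b) / total if total else 0.0


def passes_threshold(similarity, threshold=0.7):
    """A hit is kept when its similarity reaches the threshold (assumed inclusive)."""
    return similarity >= threshold
```

For two sets that share two of their four features each, Tanimoto gives 2/6 and Dice gives 4/8, so Dice is the more generous of the two for the same pair and neither value would pass the default threshold of 0.7.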

class SimilarityHit(similarity, identifier, _database=None, _entry=None, _crystal=None, _molecule=None, _binary_database=None)[source]¶

A search hit recording the similarity measure.

The SimilarityHit instance will give access to the identifier of the hit, the value of the similarity to the query molecule, the entry, crystal or molecule of the hit.

property coefficient¶: Which coefficient to use when determining similarity.

static from_xml(xml)[source]¶

Create a SimilaritySearch from an XML representation.

Parameters:: xml – XML string

static from_xml_file(file_name)[source]¶

Create a SimilaritySearch from an XML file.

Parameters:: file_name – path to XML file
Raises:: IOError when the file does not exist

property molecule¶: The query molecule.

read_xml(xml)[source]¶

Read a query from an XML representation.

Parameters:: xml – XML string

read_xml_file(file_name)[source]¶

Read an XML file into the similarity searcher.

Parameters:: file_name – path to XML file
Raises:: IOError if the file cannot be read

search_molecule(mol)[source]¶

Search a molecule.

This can be used to determine a similarity coefficient against the given molecule.

Parameters:: mol – ccdc.molecule.Molecule
Returns:: SimilaritySearch.SimilarityHit

>>> csd = EntryReader('csd')
>>> paracetamol = csd.molecule('HXACAN')
>>> searcher = SimilaritySearch(paracetamol)
>>> hit = searcher.search_molecule(csd.molecule('IBPRAC'))
>>> print(round(hit.similarity, 3))
0.161

property threshold¶: The similarity threshold to use.

class ccdc.search.TextNumericSearch(settings=None)[source]¶

Class to define and run text/numeric searches in a crystal structure database.

It is possible to add one or more criteria for the query to match.

>>> text_numeric_query = TextNumericSearch()
>>> text_numeric_query.add_compound_name('aspirin')
>>> text_numeric_query.add_citation(year=[2011, 2013])
>>> for hit in text_numeric_query.search(max_hit_structures=3):
...     print(hit.identifier)
...
ACSALA19
ACSALA20
ACSALA21

A human-readable representation of the queries may be obtained:

>>> print(', '.join(q for q in text_numeric_query.queries))
Compound name aspirin anywhere , Journal year in range 2011-2013

class TextNumericHit(identifier, _db)[source]¶: Hit from a TextNumericSearch.

class TextNumericSearchSettings(_settings=None)[source]¶: No settings apart from those provided by the base class required.

add_all_identifiers(refcode, mode='anywhere', ignore_non_alpha_num=False)[source]¶

Search for an identifier, including previous identifiers.

>>> from ccdc.search import TextNumericSearch
>>> query = TextNumericSearch()
>>> query.add_all_identifiers('DABHUJ')
>>> hits = query.search()
>>> print(hits[0].identifier)
ACPRET03
>>> print(hits[0].entry.previous_identifier)
DABHUJ

add_all_text(txt, mode='anywhere', ignore_non_alpha_num=False)[source]¶: Search for text anywhere in the entry.

add_analogue(analogue, mode='anywhere', ignore_non_alpha_num=False)[source]¶: Search for an analogue.

add_author(author, mode='anywhere', ignore_non_alpha_num=False)[source]¶: Search for an author.

add_bioactivity(activity, mode='anywhere', ignore_non_alpha_num=False)[source]¶: Search for a particular bio-activity.

add_ccdc_number(value)[source]¶

Search for a particular CCDC deposition number or a range of CCDC deposition numbers.

>>> from ccdc.search import TextNumericSearch
>>> searcher = TextNumericSearch()
>>> searcher.add_ccdc_number(241370)
>>> hits = searcher.search()
>>> len(hits)
1
>>> entry = hits[0].entry
>>> print('%s %s' % (entry.identifier, entry.ccdc_number))
ABEBUF 241370
>>> searcher.clear()
>>> searcher.add_ccdc_number((241368, 241372))
>>> hits = searcher.search()
>>> print(len(hits))
3
>>> for hit in hits:
...     print('%s %s' % (hit.identifier, hit.entry.ccdc_number))
...
ABEBUF 241370
BIBZIW 241371
BIMGEK 241372

add_citation(author='', journal='', volume=None, year=None, first_page=None, ignore_non_alpha_num=False, _coden=None)[source]¶

Search for a citation.

Note: the journal parameter requires the CSD to be present in order to translate the journal name to a coden identifier. If the CSD is not present, but an alternative database is, use the alternative database’s journals dict to look up a coden identifier and specify the _coden parameter in this function.

add_color(color, mode='anywhere', ignore_non_alpha_num=False)[source]¶: Search for a particular colour.

add_compound_name(compound_name, mode='anywhere', ignore_non_alpha_num=False)[source]¶

Search for a compound name.

The search checks the content both of ccdc.entry.Entry.chemical_name and ccdc.entry.Entry.synonyms.

To illustrate this let us have a look at the CSD entry ABABEM.

>>> from ccdc.io import EntryReader
>>> entry_reader = EntryReader('CSD')
>>> ababem = entry_reader.entry('ABABEM')
>>> print(ababem.chemical_name)
Tetrahydro[1,3,4]thiadiazolo[3,4-a]pyridazine-1,3-dione
>>> print(ababem.synonyms[0])
8-Thia-1,6-diazabicyclo[4.3.0]nonane-7,9-dione

The text azabicyclo[4.3.0]nonane is only found in the synonym. Let us search for it using a compound name search.

>>> from ccdc.search import TextNumericSearch
>>> query = TextNumericSearch()
>>> query.add_compound_name('azabicyclo[4.3.0]nonane')
>>> hits = query.search()

Finally let us assert that we have found ABABEM.

>>> assert(u'ABABEM' in [h.identifier for h in hits])

add_disorder(disorder, mode='anywhere', ignore_non_alpha_num=False)[source]¶: Search for a disorder comment.

add_doi(doi, mode='anywhere', ignore_non_alpha_num=False)[source]¶: Search for a DOI.

add_habit(habit, mode='anywhere', ignore_non_alpha_num=False)[source]¶: Search for a particular habit.

add_heat_capacity_notes(heat_capacity_notes, mode='anywhere', ignore_non_alpha_num=False)[source]¶: Search for heat capacity notes.

add_heat_of_fusion_notes(heat_of_fusion_notes, mode='anywhere', ignore_non_alpha_num=False)[source]¶: Search for heat of fusion notes.

add_identifier(refcode, mode='anywhere', ignore_non_alpha_num=False)[source]¶: Search for a refcode.

add_peptide_sequence(peptide_sequence, mode='anywhere', ignore_non_alpha_num=False)[source]¶: Search for a peptide sequence.

add_phase_transition(phase_transition, mode='anywhere', ignore_non_alpha_num=False)[source]¶: Search for a phase transition.

add_polymorph(polymorph, mode='anywhere', ignore_non_alpha_num=False)[source]¶: Search for polymorph information.

add_pore_analysis_max_pore_diameter(value)[source]¶

Search for pore analysis calculated maximum pore diameter.

See ccdc.descriptors.CrystalDescriptors.PoreAnalyser.max_pore_diameter

add_pore_analysis_num_percolated_dimensions(value)[source]¶

Search for pore analysis calculated number of percolated dimensions.

See ccdc.descriptors.CrystalDescriptors.PoreAnalyser.num_percolated_dimensions

add_pore_analysis_pore_limiting_diameter(value)[source]¶

Search for pore analysis calculated pore limiting diameter.

See ccdc.descriptors.CrystalDescriptors.PoreAnalyser.pore_limiting_diameter

add_pore_analysis_total_geometric_volume(value)[source]¶

Search for pore analysis calculated total geometric volume.

See ccdc.descriptors.CrystalDescriptors.PoreAnalyser.total_geometric_volume

add_pore_analysis_total_surface_area(value)[source]¶

Search for pore analysis calculated total surface area.

See ccdc.descriptors.CrystalDescriptors.PoreAnalyser.total_surface_area

add_predicted_semiconductor_dynamic_disorder(value)[source]¶

Search for predicted semiconductor dynamic disorder.

See ccdc.entry.SemiconductorPredictedProperties.dynamic_disorder

add_predicted_semiconductor_hole_reorganization_energy(value)[source]¶

Search for predicted semiconductor hole reorganization energy.

See ccdc.entry.SemiconductorPredictedProperties.hole_reorganization_energy

add_predicted_semiconductor_homo_lumo_gap(value)[source]¶

Search for predicted semiconductor HOMO-LUMO gap.

See ccdc.entry.SemiconductorPredictedProperties.homo_lumo_gap

add_predicted_semiconductor_singlet_state_1_energy(value)[source]¶

Search for predicted semiconductor singlet state 1 energy.

See ccdc.entry.SemiconductorPredictedProperties.singlet_state_1_energy

add_predicted_semiconductor_singlet_state_1_oscillator_strength(value)[source]¶

Search for predicted semiconductor singlet state 1 oscillator strength.

See ccdc.entry.SemiconductorPredictedProperties.singlet_state_1_oscillator_strength

add_predicted_semiconductor_singlet_state_2_energy(value)[source]¶

Search for predicted semiconductor singlet state 2 energy.

See ccdc.entry.SemiconductorPredictedProperties.singlet_state_2_energy

add_predicted_semiconductor_singlet_state_2_oscillator_strength(value)[source]¶

Search for predicted semiconductor singlet state 2 oscillator strength.

See ccdc.entry.SemiconductorPredictedProperties.singlet_state_2_oscillator_strength

add_predicted_semiconductor_transfer_integral(value)[source]¶

Search for predicted semiconductor transfer integral.

See ccdc.entry.SemiconductorPredictedProperties.transfer_integral

add_predicted_semiconductor_triplet_state_1_energy(value)[source]¶

Search for predicted semiconductor triplet state 1 energy.

See ccdc.entry.SemiconductorPredictedProperties.triplet_state_1_energy

add_predicted_semiconductor_triplet_state_2_energy(value)[source]¶

Search for predicted semiconductor triplet state 2 energy.

See ccdc.entry.SemiconductorPredictedProperties.triplet_state_2_energy

add_publication_title(publication_title, mode='anywhere', ignore_non_alpha_num=False)[source]¶: Search for publication title.

add_refinement_goodness_of_fit(value)[source]¶

Search for refinement goodness of fit.

See ccdc.entry.ExperimentalInfo.refinement_goodness_of_fit

add_refinement_max_shift(value)[source]¶

Search for refinement max shift.

See ccdc.entry.ExperimentalInfo.refinement_max_shift

add_refinement_number_of_constraints(value)[source]¶

Search for refinement number of constraints.

See ccdc.entry.ExperimentalInfo.refinement_number_of_constraints

add_refinement_number_of_parameters(value)[source]¶

Search for refinement number of parameters.

See ccdc.entry.ExperimentalInfo.refinement_number_of_parameters

add_refinement_number_of_restraints(value)[source]¶

Search for refinement number of restraints.

See ccdc.entry.ExperimentalInfo.refinement_number_of_restraints

add_refinement_residual_electron_density_max(value)[source]¶

Search for refinement residual electron density max.

See ccdc.entry.ExperimentalInfo.refinement_residual_electron_density_max

add_refinement_residual_electron_density_min(value)[source]¶

Search for refinement residual electron density min.

See ccdc.entry.ExperimentalInfo.refinement_residual_electron_density_min

add_refinement_weighted_r_factor(value)[source]¶

Search for refinement weighted R-factor.

See ccdc.entry.ExperimentalInfo.refinement_weighted_r_factor

add_reflection_max_theta(value)[source]¶

Search for reflection max theta.

See ccdc.entry.ExperimentalInfo.reflection_max_theta

add_solubility_notes(solubility_notes, mode='anywhere', ignore_non_alpha_num=False)[source]¶: Search for solubility notes.

add_solvent(solvent, mode='anywhere', ignore_non_alpha_num=False)[source]¶: Search for a solvent.

add_source(source, mode='anywhere', ignore_non_alpha_num=False)[source]¶

Search for a source.

>>> from ccdc.search import TextNumericSearch
>>> searcher = TextNumericSearch()
>>> searcher.add_source('toad')
>>> hits = searcher.search(max_hit_structures=5)
>>> for h in hits:
...     print('%-8s: %s' % (h.identifier, h.entry.source))
...
CUXYAV  : Ch'an Su (dried venom of Chinese toad)
EWAWUW  : isolated from the eggs of toad Bufo bufo gargarizans
EWAXAD  : isolated from the eggs of toad Bufo bufo gargarizans
FIFDUT  : dried venom of Chinese toad Ch'an Su
FIFFAB  : dried venom of Chinese toad Ch'an Su

add_spacegroup_symbol(spacegroup_symbol, mode='anywhere', ignore_non_alpha_num=False)[source]¶: Search for a spacegroup symbol or any alias of that symbol.

add_synonym(synonym, mode='anywhere', ignore_non_alpha_num=False)[source]¶: Search for a synonym.

clear()[source]¶: Restart a search.

static from_xml(xml)[source]¶

Create a TextNumericSearch from XML.

Parameters:: xml – XML string

static from_xml_file(file_name)[source]¶

Create a TextNumericSearch from an XML file.

Parameters:: file_name – path to XML file
Raises:: IOError when the file does not exist

is_journal_valid(journal)[source]¶

Check the validity of a specified journal name in the CSD.

This requires the CSD to be present.

Parameters:: journal – str, journal name

property journals¶

A dictionary mapping journal name to CCDC code number for journals in the CSD.

This requires the CSD to be present.

property queries¶

The current set of queries for this search.

>>> tns = TextNumericSearch()
>>> tns.add_all_text('ibuprofen')
>>> tns.add_author('Haisa')
>>> print('; '.join(str(q).strip() for q in tns.queries))
All text ibuprofen anywhere; Author Haisa anywhere

read_xml(xml)[source]¶

Read a query from XML.

Parameters:: xml – XML string

read_xml_file(file_name)[source]¶

Read a text numeric search from an XML file.

Parameters:: file_name – path to XML file
Raises:: IOError if the file cannot be read

class ccdc.search.SubstructureSearch(settings=None)[source]¶

Query crystal structures for interactions.

class HitProcessor[source]¶

Override this class to provide your own add_hit() method.

This class allows a search to process hits as they are found by the search class, rather than waiting until all hits are found before allowing access to them, a procedure which may well run out of memory for very general searches.

add_hit(hit)[source]¶: Override this to provide your own hit processing.

cancel()[source]¶: Cancels the search.

search(searcher, database=None)[source]¶

Searches the database with the substructure search.

Parameters:

searcher – a ccdc.search.SubstructureSearch instance.
database – a ccdc.io.EntryReader instance. If not specified the CSD will be searched.

For each hit found, ccdc.search.SubstructureSearch.HitProcessor.add_hit() will be called with a ccdc.search.SubstructureSearch.SubstructureHit instance.
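
The streaming pattern described above can be sketched without the CSD installed. Everything in this block is a hypothetical stand-in: `StandInHitProcessor` only mimics the documented `add_hit()` and `cancel()` interface, `feed()` is a made-up driver that replaces the real `search()`, and plain dicts with invented identifiers replace real hits. In real use one would presumably subclass ccdc.search.SubstructureSearch.HitProcessor instead.

```python
class StandInHitProcessor:
    """Hypothetical stand-in for SubstructureSearch.HitProcessor (not the real class)."""

    def __init__(self):
        self.cancelled = False

    def add_hit(self, hit):
        # Subclasses override this to handle one hit at a time.
        raise NotImplementedError

    def cancel(self):
        # Ask the driver to stop delivering hits.
        self.cancelled = True

    def feed(self, hits):
        # Made-up driver: the real search() would pull hits from a database.
        for hit in hits:
            if self.cancelled:
                break
            self.add_hit(hit)


class FirstNIdentifiers(StandInHitProcessor):
    """Keep only the identifiers of the first `limit` hits, then cancel."""

    def __init__(self, limit):
        super().__init__()
        self.limit = limit
        self.identifiers = []

    def add_hit(self, hit):
        self.identifiers.append(hit["identifier"])
        if len(self.identifiers) >= self.limit:
            self.cancel()


processor = FirstNIdentifiers(limit=2)
# A generator is passed in to show that hits are consumed one by one
# rather than being collected into a list first.
processor.feed({"identifier": name} for name in ["HIT_1", "HIT_2", "HIT_3"])
print(processor.identifiers)
```

Because each hit is handled as it arrives, memory use depends on what `add_hit()` keeps rather than on the total number of hits.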

class Settings(max_hit_structures=None, max_hits_per_structure=None)[source]¶

Settings appropriate to a substructure search.

property match_enantiomers¶

Enantiomer matching behavior

The value will be one of ‘NEVER’ meaning enantiomers are never checked, ‘SPACEGROUP_DEPENDENT’ meaning enantiomers are checked if the crystal’s spacegroup implies the presence of enantiomers, or ‘ALWAYS’ meaning enantiomers are always checked.

property max_hits_per_structure¶: Maximum number of hits per structure.

class SubstructureHit(identifier, match=None, search_structure=None, query=None, _database=None, _entry=None, _crystal=None, _molecule=None, _binary_database=None)[source]¶

A hit from a substructure search.

centroid_atoms(name)[source]¶: The atoms from which the centroid is derived.

centroid_objects(name)[source]¶: The geometric object names and atoms from which the centroid was defined.

constraint_atoms(name)[source]¶

The atoms from which the constraint was defined.

Parameters:: name – the name of the constraint.
Returns:: a tuple of ccdc.molecule.Atom instances.

The atoms will be returned in an arbitrary order. All atoms involved in defining the constraint will be returned.

constraint_objects(constraint)[source]¶: A tuple of object names and atoms from which the constraint was defined.

dummy_point_atoms(name)[source]¶: The atoms from which the dummy point was defined.

dummy_point_objects(name)[source]¶: The geometric object names and atoms from which the dummy point was defined.

group_atoms(name)[source]¶: The atoms from which the group was defined.

group_objects(name)[source]¶: The geometric object names and atoms from which the group was defined.

measurement_atoms(name)[source]¶

The atoms involved in a measurement.

Parameters:: name – the name of the measurement.
Returns:: a tuple of ccdc.molecule.Atom instances.

The atoms will be returned in an arbitrary order. All atoms involved in the measurement will be present, so for example a centroid-centroid distance measurement will produce the atoms of both centroids.

measurement_objects(measurement)[source]¶

A tuple of object names and atoms from which the measurement was taken.

Parameters:: measurement – the string name of the measurement.
Returns:: a tuple of geometric object names or atoms.

plane_atoms(name)[source]¶: The atoms from which the plane was defined.

plane_objects(name)[source]¶: The geometric object names and atoms from which the plane was defined.

vector_atoms(name)[source]¶: The atoms from which the vector was defined.

vector_objects(name)[source]¶: The geometric object names and atoms from which the vector was defined.

class SubstructureHitList(iterable=(), /)[source]¶

List of hits from a ccdc.search.SubstructureSearch

superimpose()[source]¶

Superimpose all matched molecules on their query atoms.

Only the first substructure is used for the superimposition.

write_c2m_file(file_name)[source]¶

Write a ConQuest to Mercury interchange file.

This file allows substructure search results to be read into the data analysis package of Mercury.

Parameters:: file_name – file to which the data will be written.

add_angle_constraint(name, *args)[source]¶

Add an angle constraint.

Parameters:

name – by which the constraint will be accessed.
*args – three instances either of a pair (substructure_index, atom_index) or of names of geometric objects.
range – as for ccdc.search.SubstructureSearch.add_distance_constraint()

>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_centroid('CENT1', (0, 0), (0, 1), (0, 2))
>>> query.add_centroid('CENT2', (1, 0), (1, 1), (1, 2))
>>> query.add_angle_constraint('ANG1', (0, 0), (1, 1), (1, 0), ('>=', 120))

add_angle_measurement(name, *args)[source]¶

Add an angle measurement.

>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_centroid('CENT1', (0, 0), (0, 1), (0, 2))
>>> query.add_centroid('CENT2', (1, 0), (1, 1), (1, 2))
>>> query.add_angle_measurement('ANG1', (0, 0), (1, 1), (1, 0))

add_atom_property_constraint(name, *args, **kw)[source]¶

Add an atom property constraint.

>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('[*H1]'))
>>> query.add_atom_property_constraint('ATOM1', (0, 0), ('in', [7, 8]), which='AtomicNumber')

add_atom_property_measurement(name, *args, **kw)[source]¶

Add an atom property measurement.

Parameters:

name – the name by which this measurement will be accessed.
*args – a pair, (substructure_index, atom_index) specifying the atom to measure.
which – one of TotalCoordinationNumber, AtomicNumber, VdwRadius, CovalentRadius

>>> query = SubstructureSearch()
>>> substructure = QuerySubstructure()
>>> _ = substructure.add_atom(['C', 'N'])
>>> _ = query.add_substructure(substructure)
>>> query.add_atom_property_measurement('ATOM1', (0, 0), which='AtomicNumber')

add_binary_transform_constraint(name, which, *args)[source]¶

Add a binary arithmetical calculation constraint.

>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_vector('VEC1', (0, 1), (1, 2))
>>> query.add_vector('VEC2', (0, 2), (1, 1))
>>> query.add_vector_angle_measurement('ANG1', 'VEC1', 'VEC2')
>>> query.add_constant_value_measurement('D2R', 3.14159/180)
>>> query.add_binary_transform_constraint('IN_RADIANS', 'MUL', 'ANG1', 'D2R', (-1, 1))

add_binary_transform_measurement(name, which, arg1, arg2)[source]¶

Add a binary mathematical operation.

Parameters:

name – the name by which this value will be accessed.
which – one of ‘MAX’, ‘MIN’, ‘ADD’, ‘SUBTRACT’, ‘MULTIPLY’, ‘DIVIDE’, ‘POW’, ‘RSIN’, ‘RCOS’.
arg1, arg2 – the names of the measurements to be used as arguments to the operator.

>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_vector('VEC1', (0, 1), (1, 2))
>>> query.add_vector('VEC2', (0, 2), (1, 1))
>>> query.add_vector_angle_measurement('ANG1', 'VEC1', 'VEC2')
>>> query.add_constant_value_measurement('D2R', 3.14159/180)
>>> query.add_binary_transform_measurement('IN_RADIANS', 'MUL', 'ANG1', 'D2R')
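
The arithmetic behind a degrees-to-radians conversion can be checked in plain Python. This sketch does not call the ccdc API; it assumes the angle measurement is reported in degrees, which this section does not state, and the constant names are illustrative only.

```python
import math

DEG_TO_RAD = math.pi / 180.0   # multiply an angle in degrees by this to get radians
RAD_TO_DEG = 180.0 / math.pi   # multiply an angle in radians by this to get degrees

angle_deg = 60.0
angle_rad = angle_deg * DEG_TO_RAD          # same result as math.radians(angle_deg)
round_trip = angle_rad * RAD_TO_DEG         # back to degrees

print(angle_rad, round_trip)
```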

add_centroid(name, *args)[source]¶

Adds a centroid to the substructure search.

Parameters:

name – the name by which the centroid will be accessed.
*args – the points or geometric objects from which to define the centroid.

Each arg may be either a pair (substructure_index, atom_index) or the name of a geometric object. There must be at least two such arguments.

>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_centroid('CENT1', (0, 0), (0, 1), (0, 2))
>>> query.add_centroid('CENT2', (1, 0), (1, 1), (1, 2))
>>> query.add_centroid('CENT3', 'CENT1', 'CENT2')
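
As a geometric illustration of what a centroid represents, the following plain-Python sketch takes the unweighted mean of a set of points. It does not use the ccdc API, the coordinates are invented, and whether the library weights a centroid of centroids by atom count is not stated here, so the unweighted mean is an assumption.

```python
def centroid(points):
    """Unweighted mean position of a sequence of (x, y, z) points."""
    points = list(points)
    if len(points) < 2:
        raise ValueError("at least two points are required")
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


# Invented coordinates standing in for the matched atoms of two substructures.
cent1 = centroid([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 3.0, 0.0)])
cent2 = centroid([(0.0, 0.0, 4.0), (2.0, 0.0, 4.0), (1.0, 3.0, 4.0)])
# A centroid may itself be defined from other geometric objects.
cent3 = centroid([cent1, cent2])
print(cent1, cent2, cent3)
```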

add_constant_value_measurement(name, value)[source]¶

Add a constant value.

Parameters:

name – the name by which this constant will be accessed.
value – a float.

>>> query = SubstructureSearch()
>>> substructure = QuerySubstructure()
>>> _ = substructure.add_atom(['C', 'N'])
>>> _ = query.add_substructure(substructure)
>>> query.add_constant_value_measurement('PI', 3.14159)

add_distance_constraint(name, *args, **kw)[source]¶

Add a distance constraint.

Parameters:

name – the name of this constraint.
*args – specifications of points, either as pairs (substructure_index, atom_index) or as names of geometric measurements.
range – a condition, either as a pair of floats, a pair (operator, value) where operator may be ‘==’, ‘>’, ‘<’, ‘>=’, ‘<=’ or ‘!=’, or a pair (‘in’, list(values)).
intermolecular – whether or not the distance should be within a unit cell molecule or between a unit cell molecule and a packing shell molecule.
vdw_corrected – whether the distance range should be relative to the Van der Waals radii of the atoms involved.

>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_distance_constraint('DIST1', (0, 1), (1, 1), (-5, 0), vdw_corrected=True, type='any')
>>> query.add_distance_constraint('DIST2', (0, 2), (1, 2), ('<=', 3.0), vdw_corrected=True, type='any')
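
The three forms of the range condition, and one possible reading of vdw_corrected, can be sketched in plain Python. This is not the library's implementation: `satisfies` and `vdw_corrected` are hypothetical helpers, inclusive bounds for a pair of floats are an assumption, "relative to the Van der Waals radii" is assumed to mean the distance minus the sum of the two radii, and the radii are Bondi values that may differ from those the library uses.

```python
import operator

_OPERATORS = {
    "==": operator.eq, ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le, "!=": operator.ne,
}


def satisfies(value, condition):
    """Evaluate a condition given as (low, high), (operator, value) or ('in', values)."""
    first, second = condition
    if first == "in":
        return value in second
    if isinstance(first, str):
        return _OPERATORS[first](value, second)
    low, high = sorted((first, second))
    return low <= value <= high  # assumption: both bounds are inclusive


# Illustrative Bondi van der Waals radii in angstroms (an assumption, see above).
VDW_RADII = {"H": 1.20, "C": 1.70, "N": 1.55, "O": 1.52}


def vdw_corrected(distance, element_a, element_b):
    """Distance minus the sum of the two van der Waals radii (assumed meaning)."""
    return distance - (VDW_RADII[element_a] + VDW_RADII[element_b])


corrected = vdw_corrected(2.9, "O", "N")  # 2.9 - (1.52 + 1.55), about -0.17
print(satisfies(corrected, (-5, 0)))
print(satisfies(2.9, ("<=", 3.0)))
print(satisfies(7, ("in", [7, 8])))
```

Under this reading, a range such as (-5, 0) on a corrected distance selects contacts no longer than the sum of the van der Waals radii.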

add_distance_measurement(name, *args)[source]¶

Add a distance measurement.

>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_centroid('CENT1', (0, 0), (0, 1), (0, 2))
>>> query.add_centroid('CENT2', (1, 0), (1, 1), (1, 2))
>>> query.add_distance_measurement('DIST1', (0, 0), 'CENT2')

add_dummy_point(name, distance, *args)[source]¶

Creates a dummy point along a vector.

Parameters:

name – the name by which this point will be accessed.
distance – the distance along the vector subtended by the two points.
*args – two points specified as (substructure_index, atom_index) or the name of another geometric object.

>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_centroid('CENT1', (0, 0), (0, 1), (0, 2))
>>> query.add_dummy_point('DUM1', 2.0, 'CENT1', (1, 1))
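
The geometry of a dummy point can be illustrated in plain Python. This sketch does not use the ccdc API; `dummy_point` is a hypothetical helper, and it assumes the distance is measured from the first point towards the second, which this section does not specify.

```python
import math


def dummy_point(distance, start, end):
    """Point lying `distance` from `start` along the direction start -> end."""
    direction = [e - s for s, e in zip(start, end)]
    length = math.sqrt(sum(c * c for c in direction))
    if length == 0.0:
        raise ValueError("the two points coincide, so no direction is defined")
    return tuple(s + distance * c / length for s, c in zip(start, direction))


# Invented coordinates: the dummy point lies 2.0 along a vector of length 5.0.
point = dummy_point(2.0, (0.0, 0.0, 0.0), (0.0, 0.0, 5.0))
print(point)
```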

add_group(name, *args)[source]¶

Creates a group of matched atoms.

Parameters:

name – the name by which this group will be accessed.
*args – pairs, (substructure_index, atom_index) defining the atoms of the group.

>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_group('GP1', (0, 0), (0, 1), (0, 2))

add_plane(name, *args)[source]¶

Add a plane.

Parameters:

name – the name by which the plane will be accessed.
*args – at least two point specifications in the form (substructure_index, atom_index) or the name of another geometric object.

>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_plane('PLANE1', (0, 0), (0, 1), (0, 2))
>>> query.add_plane('PLANE2', (1, 0), (1, 1), (1, 2))

add_plane_angle_constraint(name, *args)[source]¶

Add a plane angle constraint.

>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_plane('PLANE1', (0, 0), (0, 1), (0, 2))
>>> query.add_plane('PLANE2', (1, 0), (1, 1), (1, 2))
>>> query.add_plane_angle_constraint('PA1', 'PLANE1', 'PLANE2', (-10, 10))

add_plane_angle_measurement(name, *args)[source]¶

Add a plane angle measurement.

>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_plane('PLANE1', (0, 0), (0, 1), (0, 2))
>>> query.add_plane('PLANE2', (1, 0), (1, 1), (1, 2))
>>> query.add_plane_angle_measurement('PA1', 'PLANE1', 'PLANE2')

add_point_plane_distance_constraint(name, *args)[source]¶

Add a point plane distance constraint.

>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_centroid('CENT1', (0, 0), (0, 1), (0, 2))
>>> query.add_plane('PLANE2', (1, 0), (1, 1), (1, 2))
>>> query.add_point_plane_distance_constraint('PP1', 'CENT1', 'PLANE2', ('<', 5))

add_point_plane_distance_measurement(name, *args)[source]¶

Add point plane distance measurement.

>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_centroid('CENT1', (0, 0), (0, 1), (0, 2))
>>> query.add_plane('PLANE2', (1, 0), (1, 1), (1, 2))
>>> query.add_point_plane_distance_measurement('PP1', 'CENT1', 'PLANE2')
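
The quantity being measured can be illustrated in plain Python for the simplest case of a plane through exactly three points. This sketch does not use the ccdc API; `point_plane_distance` is a hypothetical helper, the coordinates are invented, and returning an unsigned distance is an assumption.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def point_plane_distance(point, p0, p1, p2):
    """Unsigned distance from `point` to the plane through p0, p1 and p2."""
    normal = _cross(_sub(p1, p0), _sub(p2, p0))
    length = math.sqrt(_dot(normal, normal))
    if length == 0.0:
        raise ValueError("the three plane points are collinear")
    # Project the vector p0 -> point onto the unit normal of the plane.
    return abs(_dot(normal, _sub(point, p0))) / length


# The plane z = 0 and a point three units above it.
distance = point_plane_distance((1.0, 1.0, 3.0),
                                (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(distance)
```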

add_substructure(substructure)[source]¶

Add a substructure.

Disconnected substructures may be accepted if the first substructure is contiguous at the start. Multiple substructures may be added as a result.

Parameters:: substructure – ccdc.search.QuerySubstructure.
Returns:: the index of the first substructure added.

add_torsion_angle_constraint(name, *args)[source]¶

Add a torsion angle constraint.

Parameters:

name – the name by which this constraint is accessed.
*args – as for ccdc.search.SubstructureSearch.add_distance_constraint()

>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_centroid('CENT1', (0, 0), (0, 1), (0, 2))
>>> query.add_centroid('CENT2', (1, 0), (1, 1), (1, 2))
>>> query.add_torsion_angle_constraint('ANG1', (0, 0), (0, 1), (1, 1), (1, 0), (120, 180))

add_torsion_angle_measurement(name, *args)[source]¶

Add a torsion angle measurement.

>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_centroid('CENT1', (0, 0), (0, 1), (0, 2))
>>> query.add_centroid('CENT2', (1, 0), (1, 1), (1, 2))
>>> query.add_torsion_angle_measurement('ANG1', (0, 0), (0, 1), (1, 1), (1, 0))
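
For readers unfamiliar with the quantity, a torsion angle over four points can be computed with the standard atan2 formula. This plain-Python sketch does not use the ccdc API; `torsion_angle` is a hypothetical helper, the coordinates are invented, and the sign convention of the library's measurement is not stated in this section, so only magnitudes are discussed.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_angle(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, in the range (-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane containing p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane containing p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# The outer points on the same side (cis), on opposite sides (trans) and perpendicular.
cis = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
perpendicular = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(abs(cis), abs(trans), abs(perpendicular))
```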

add_unary_transform_constraint(name, *args)[source]¶

Add an arithmetical calculation constraint.

>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_vector('VEC1', (0, 1), (1, 2))
>>> query.add_vector('VEC2', (0, 2), (1, 1))
>>> query.add_vector_angle_measurement('ANG1', 'VEC1', 'VEC2')
>>> query.add_unary_transform_constraint('ABS_ANGLE', 'ABS', 'ANG1', (0, 10))

add_unary_transform_measurement(name, which, arg)[source]¶

Add a mathematical operation.

Parameters:

name – name by which the result will be accessed.
which – one of ‘ABS’, ‘LOG’, ‘LOG10’, ‘EXP’, ‘COS’, ‘SIN’, ‘TAN’, ‘ACOS’, ‘ASIN’, ‘ATAN’, ‘FLOOR’, ‘ROUND’, ‘SQRT’, ‘NEG’.
arg – the name of the measurement or constraint to which to apply the function.

>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_vector('VEC1', (0, 1), (1, 2))
>>> query.add_vector('VEC2', (0, 2), (1, 1))
>>> query.add_vector_angle_measurement('ANG1', 'VEC1', 'VEC2')
>>> query.add_unary_transform_measurement('ABS_ANGLE', 'ABS', 'ANG1')

add_vector(name, *args)[source]¶

Add a vector.

Parameters:

name – the name by which the vector will be accessed.
*args – two point specifications as (substructure_index, atom_index) or the name of another geometric object.

>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_centroid('CENT1', (0, 0), (0, 1), (0, 2))
>>> query.add_vector('VEC1', 'CENT1', (1, 2))

add_vector_angle_constraint(name, *args)[source]¶

Add a vector angle constraint.

>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_vector('VEC1', (0, 1), (1, 2))
>>> query.add_vector('VEC2', (0, 2), (1, 1))
>>> query.add_vector_angle_constraint('ANG1', 'VEC1', 'VEC2', (0, 60))

add_vector_angle_measurement(name, *args)[source]¶

Add a vector angle measurement.

>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_vector('VEC1', (0, 1), (1, 2))
>>> query.add_vector('VEC2', (0, 2), (1, 1))
>>> query.add_vector_angle_measurement('ANG1', 'VEC1', 'VEC2')
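
The angle between two vectors follows from their dot product. The following plain-Python sketch illustrates the geometry only; it does not use the ccdc API, `vector_angle` is a hypothetical helper, and reporting the result in degrees in the range 0 to 180 is an assumption.

```python
import math


def vector_angle(u, v):
    """Angle between vectors u and v in degrees, in the range [0, 180]."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norms == 0.0:
        raise ValueError("a zero-length vector has no direction")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / norms))
    return math.degrees(math.acos(cosine))


right_angle = vector_angle((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
diagonal = vector_angle((1.0, 0.0, 0.0), (1.0, 1.0, 0.0))
opposite = vector_angle((1.0, 0.0, 0.0), (-2.0, 0.0, 0.0))
print(right_angle, diagonal, opposite)
```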

add_vector_plane_angle_constraint(name, *args)[source]¶

Add a vector plane angle constraint.

>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_vector('VEC1', (0, 1), (1, 2))
>>> query.add_plane('PLANE2', (1, 0), (1, 1), (1, 2))
>>> query.add_vector_plane_angle_constraint('ANG1', 'VEC1', 'PLANE2', ('>', 90))

add_vector_plane_angle_measurement(name, *args)[source]¶

Add a vector plane angle measurement.

>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_vector('VEC1', (0, 1), (1, 2))
>>> query.add_plane('PLANE2', (1, 0), (1, 1), (1, 2))
>>> query.add_vector_plane_angle_measurement('ANG1', 'VEC1', 'PLANE2')

static from_xml(xml)[source]¶

Create a substructure search from XML. Deprecated.

Parameters:: xml – XML string

static from_xml_file(file_name)[source]¶

Create a substructure search from an XML file. Deprecated.

Parameters:: file_name – path to XML file
Raises:: IOError when the file does not exist

read_xml(xml)[source]¶

Read search query from XML. Deprecated.

Parameters:: xml – XML string

read_xml_file(file_name)[source]¶

Read search parameters from an XML file. Deprecated.

Parameters:: file_name – path to XML file
Raises:: IOError if the file cannot be read

class ccdc.search.ReducedCellSearch(query=None, settings=None)[source]¶

Provide reduced cell searches.

class CrystalQuery(crystal)[source]¶: Reduced cell query from a crystal.

class Query(lengths=None, angles=None, lattice_centring=None)[source]¶: Base query.

class Settings(_settings=None)[source]¶

Settings appropriate to a reduced cell search.

property absolute_angle_tolerance¶: The absolute angle tolerance.

property is_normalised¶: Whether the input cell is normalised.

property percent_length_tolerance¶: The cell length tolerance as a percentage of the longest cell dimension.

reset()[source]¶: Reset to default values.
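
One possible reading of the two tolerance settings is sketched below in plain Python. This is not the library's comparison algorithm: `cells_within_tolerance` is a hypothetical helper, the cell parameters and tolerance values are invented, and scaling the length tolerance by the longest dimension of the query cell (rather than of the cell being compared) is an assumption.

```python
def cells_within_tolerance(query_lengths, query_angles,
                           other_lengths, other_angles,
                           percent_length_tolerance, absolute_angle_tolerance):
    """Return True if two cells agree within the given tolerances."""
    # Length tolerance in angstroms, scaled by the longest query cell dimension.
    length_tolerance = percent_length_tolerance / 100.0 * max(query_lengths)
    lengths_ok = all(abs(a - b) <= length_tolerance
                     for a, b in zip(query_lengths, other_lengths))
    # Angle tolerance is applied directly, in degrees.
    angles_ok = all(abs(a - b) <= absolute_angle_tolerance
                    for a, b in zip(query_angles, other_angles))
    return lengths_ok and angles_ok


# With a 1.5 percent tolerance and a longest edge of 12.0, lengths may differ by 0.18.
close = cells_within_tolerance((10.0, 11.0, 12.0), (90.0, 95.5, 90.0),
                               (10.1, 11.0, 12.15), (90.0, 96.0, 90.0),
                               percent_length_tolerance=1.5,
                               absolute_angle_tolerance=2.0)
far = cells_within_tolerance((10.0, 11.0, 12.0), (90.0, 95.5, 90.0),
                             (10.3, 11.0, 12.0), (90.0, 95.5, 90.0),
                             percent_length_tolerance=1.5,
                             absolute_angle_tolerance=2.0)
print(close, far)
```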

class XMLFileQuery(file_name)[source]¶: Reduced cell query from a file name.

class XMLQuery(xml)[source]¶: Reduced cell query from an XML representation.

compare_cells(r0, r1)[source]¶

Compare two reduced cells.

Parameters:

r0 – the first reduced cell, an instance of ccdc.crystal.Crystal.ReducedCell
r1 – the second reduced cell similarly

Returns:

boolean

static from_xml(xml)[source]¶

Construct a reduced cell search from an XML representation.

Parameters:: xml – XML string

static from_xml_file(file_name)[source]¶

Construct a reduced cell search from an XML file.

Parameters:: file_name – path to XML file
Raises:: IOError when the file does not exist

read_xml(xml)[source]¶

Read XML into this ReducedCellSearch.

Parameters:: xml – XML string

read_xml_file(file_name)[source]¶

Read an XML file into this ReducedCellSearch.

Parameters:: file_name – path to XML file
Raises:: IOError if the file cannot be read

set_query(query)[source]¶: Set the query.

class ccdc.search.CombinedSearch(expression, settings=None)[source]¶

Boolean combinations of other searches.

TextNumericSearch, SubstructureSearch, SimilaritySearch and ReducedCellSearch can be combined using the operators & (and), | (or) and - (not) to provide a combined search.

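>>> from ccdc import io
>>> from ccdc.search import TextNumericSearch, SubstructureSearch, SMARTSSubstructure, ReducedCellSearch, CombinedSearch
>>> csd = io.EntryReader('csd')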
>>> tns = TextNumericSearch()
>>> tns.add_compound_name('Aspirin')
>>> sub_search = SubstructureSearch()
>>> _ = sub_search.add_substructure(SMARTSSubstructure('C(=O)OH'))
>>> rcs = ReducedCellSearch(ReducedCellSearch.CrystalQuery(csd.crystal('ACSALA')))
>>> combi_search = CombinedSearch(tns & (-rcs | -sub_search))
>>> hits = combi_search.search()
>>> print(len(hits))
91
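
The Boolean logic of such an expression can be modelled with Python sets of identifiers. This sketch covers only the logic, not how the library evaluates a combined search; the identifiers and hit sets are invented, and `negate` is a hypothetical helper that treats negation as the complement within the searched database.

```python
# Invented identifiers standing in for every entry in the searched database.
universe = {"A", "B", "C", "D", "E"}

# Invented hit sets for three component searches.
text_hits = {"A", "B", "C"}
cell_hits = {"A"}
substructure_hits = {"A", "B"}


def negate(hits):
    """Entries of the database that are not in `hits`."""
    return universe - hits


# Mirrors the shape of: tns & (-rcs | -sub_search)
combined = text_hits & (negate(cell_hits) | negate(substructure_hits))
print(sorted(combined))
```

Entry "A" is excluded because it matches both negated searches, while "B" and "C" each fail at least one of them.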

class CombinedHit(identifier, _database=None, _entry=None, _crystal=None, _molecule=None)[source]¶: A hit from a combined search.

class Settings[source]¶: Settings appropriate to a combined search.

static max_hit_structures(other, count)[source]¶

Limit the number of hits found by a combination search.

Parameters:

other – a combination of searches.
count – maximum number of hits to find.
