Search API
Introduction
The ccdc.search module provides various search classes.
The main classes of the ccdc.search module are:
- ccdc.search.TextNumericSearch
- ccdc.search.SubstructureSearch
- ccdc.search.SimilaritySearch
- ccdc.search.ReducedCellSearch
- ccdc.search.CombinedSearch
These all inherit from the base class ccdc.search.Search, which contains nested classes defining basic search hits and settings:
- ccdc.search.Search.SearchHit
- ccdc.search.Search.Settings
The base class ccdc.search.Search also contains the ccdc.search.Search.search() method, which is used to search the CSD.
All the searches except ccdc.search.TextNumericSearch also support searching of the following additional data sources:
- a Python list of identifiers
- a molecule file path
- a ccdc.io reader
- an individual ccdc.molecule.Molecule
- an individual ccdc.crystal.Crystal
- a list of molecules, crystals or entries
The ccdc.search.TextNumericSearch can only sensibly be applied to the CSD.
The ccdc.search.Search.search() method returns a list of ccdc.search.Search.SearchHit instances. Some of the searches use more specific search hit classes, namely:
- ccdc.search.TextNumericSearch.TextNumericHit
- ccdc.search.SubstructureSearch.SubstructureHit
- ccdc.search.SimilaritySearch.SimilarityHit
Most of the searches return simple Python lists of search hits. However, a search carried out using a ccdc.search.SubstructureSearch returns a ccdc.search.SubstructureSearch.SubstructureHitList, which provides a ccdc.search.SubstructureSearch.SubstructureHitList.superimpose() method for superimposing all the hits on the first hit in the list.
To illustrate some of the searches let us first get an aspirin molecule.
>>> from ccdc.io import EntryReader
>>> csd_reader = EntryReader('CSD')
>>> mol = csd_reader.molecule('ACSALA')
Text numeric searching.
>>> from ccdc.search import TextNumericSearch
>>> text_numeric_search = TextNumericSearch()
>>> text_numeric_search.add_compound_name('aspirin')
>>> hits = text_numeric_search.search()
>>> len(hits)
85
Substructure searching.
>>> from ccdc.search import MoleculeSubstructure, SubstructureSearch
>>> substructure = MoleculeSubstructure(mol)
>>> substructure_search = SubstructureSearch()
>>> _ = substructure_search.add_substructure(substructure)
>>> hits = substructure_search.search()
>>> len(hits)
51
Similarity searching.
>>> from ccdc.search import SimilaritySearch
>>> similarity_search = SimilaritySearch(mol)
>>> hits = similarity_search.search()
>>> len(hits)
93
Reduced cell searching.
>>> from ccdc.search import ReducedCellSearch
>>> crystal = csd_reader.crystal('ACSALA')
>>> query = ReducedCellSearch.CrystalQuery(crystal)
>>> reduced_cell_searcher = ReducedCellSearch(query)
>>> hits = reduced_cell_searcher.search()
>>> len(hits)
11
Combined searches.
>>> from ccdc.search import CombinedSearch
>>> combined_search = CombinedSearch(similarity_search & -text_numeric_search)
>>> hits = combined_search.search()
>>> len(hits)
30
See also
The descriptive documentation for the general philosophy of searching, substructure searching, similarity searching, text numeric searching, reduced cell searching and combined searches.
API
Classes for defining substructures
class ccdc.search.QueryAtom(atomic_symbol='', _substructure_atom=None)
Atom used to define a substructure search.
A QueryAtom can be used to represent a single atom type or a set of atom types. A QueryAtom can also have additional constraints imposed on it, for example that it should be aromatic.
Let us create a query atom representing an oxygen atom.
>>> query_atom = QueryAtom('O')
>>> print(query_atom)
QueryAtom(O)
Suppose that we wanted the query atom to be either a carbon or a nitrogen atom.
>>> query_atom = QueryAtom(['C', 'N'])
>>> print(query_atom)
QueryAtom(C, N)
It is possible to add further constraints on a QueryAtom. For example, we can insist that it should be aromatic.
>>> query_atom.aromatic = True
>>> print(query_atom.aromatic)
AtomAromaticConstraint: 1
>>> print(query_atom)
QueryAtom(C, N)[atom aromaticity: equal to 1]
See Query Atoms for further details.
acceptor
Constraint specifying whether or not the QueryAtom is an acceptor.
>>> a = QueryAtom(['C', 'N'])
>>> a.acceptor = True
>>> print(a)
QueryAtom(C, N)[AtomAcceptorTypeConstraint]
add_connected_element_count(atomic_symbols, count)
Set the number of connected elements constraint.
Constraint to define the number of times the QueryAtom should be connected to atoms with elements defined in the atomic_symbols list.
Parameters:
- atomic_symbols – atomic symbol or list of atomic symbols.
- count – see Constraint conditions for details.
>>> a = QueryAtom(['C', 'N'])
>>> a.add_connected_element_count(['F', 'Cl'], 2)
>>> print(a)
QueryAtom(C, N)[elements included: 0: F 1: Cl , equal to 2]
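The constraint above counts how many of a matched atom's neighbours carry one of the listed elements. The plain-Python sketch below shows only that counting logic. It works on a hypothetical adjacency-list molecule: the MOLECULE dictionary and the connected_element_count helper are assumptions made for illustration and are not part of ccdc.search.

```python
# Hypothetical molecule: each atom label maps to (element, neighbour labels).
# This is an illustrative data structure, not a ccdc.molecule.Molecule.
MOLECULE = {
    "C1": ("C", ["F1", "Cl1", "N1", "H1"]),
    "F1": ("F", ["C1"]),
    "Cl1": ("Cl", ["C1"]),
    "N1": ("N", ["C1"]),
    "H1": ("H", ["C1"]),
}


def connected_element_count(molecule, atom_label, atomic_symbols):
    """Count neighbours of atom_label whose element is in atomic_symbols."""
    if isinstance(atomic_symbols, str):
        atomic_symbols = [atomic_symbols]  # accept a single symbol as well
    wanted = set(atomic_symbols)
    _, neighbours = molecule[atom_label]
    return sum(1 for label in neighbours if molecule[label][0] in wanted)


# C1 is bonded to one F and one Cl, so an "equal to 2" constraint on
# ['F', 'Cl'] would be satisfied at this atom.
print(connected_element_count(MOLECULE, "C1", ["F", "Cl"]))  # 2
```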
add_protein_atom_type_constraint(*types)
Add a constraint that an atom be in one of the protein atom types.
This is only useful when searching a protein structure.
Parameters: *types – one or more of 'AMINO_ACID', 'LIGAND', 'COFACTOR', 'WATER', 'METAL', 'NUCLEOTIDE', 'UNKNOWN'. Any case-insensitive, unique prefix may be used.
>>> a = QueryAtom('Zn')
>>> a.add_protein_atom_type_constraint('Ligand', 'Metal')
>>> print(a)
QueryAtom(Zn)[protein substructure type : one of 1, 3]
aromatic
Constraint specifying whether or not the QueryAtom is aromatic.
>>> a = QueryAtom(['C', 'N'])
>>> a.aromatic = True
>>> print(a)
QueryAtom(C, N)[atom aromaticity: equal to 1]
cyclic
Constraint specifying whether or not the QueryAtom is part of a cycle.
>>> a = QueryAtom(['C', 'N'])
>>> a.cyclic = True
>>> print(a)
QueryAtom(C, N)[atom cyclicity: equal to 1]
cyclic_bonds
Constraint specifying the number of cyclic bonds of the QueryAtom.
>>> a = QueryAtom(['C', 'N'])
>>> a.cyclic_bonds = ('!=', 4)
>>> print(a)
QueryAtom(C, N)[number of cyclic bonds:not equal to 4]
donor
Constraint specifying whether or not the QueryAtom is a donor.
>>> a = QueryAtom(['C', 'N'])
>>> a.donor = True
>>> print(a)
QueryAtom(C, N)[AtomDonorTypeConstraint]
formal_charge
Constraint specifying the formal charge on the QueryAtom.
>>> a = QueryAtom(['C', 'N'])
>>> a.formal_charge = ('in', [-1, 1])
>>> print(a)
QueryAtom(C, N)[charge: one of -1, 1]
formal_valency
Constraint specifying the formal valency of the QueryAtom.
>>> a = QueryAtom(['C', 'N'])
>>> a.formal_valency = ('>', 3)
>>> print(a)
QueryAtom(C, N)[atom valency: greater than 3]
has_3d_coordinates
Constraint specifying that the atom has 3D coordinates.
>>> a = QueryAtom(['C', 'N'])
>>> a.has_3d_coordinates = True
>>> print(a)
QueryAtom(C, N)[atom must have 3D site]
index
Index of this atom in a substructure.
>>> atom = QueryAtom(['C', 'N'])
>>> print(atom.index)
None
>>> substructure = QuerySubstructure()
>>> _ = substructure.add_atom(atom)
>>> print(atom.index)
0
label_match
Constraint specifying that the atom label must match a regular expression.
>>> a = QueryAtom(['C'])
>>> a.label_match = '^C12$'
>>> print(a)
QueryAtom(C)[atom label must match regular expression with pattern: ^C12$]
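Because label_match takes a regular expression, the anchors ^ and $ matter. Without them, a pattern such as C12 would also match labels like C120. The sketch below uses Python's standard re module to show the effect on some hypothetical atom labels. This document does not say whether the CSD Python API applies the pattern with search or match semantics, but with both anchors present the result is the same either way.

```python
import re

# The anchored pattern from the label_match example above.
anchored = re.compile(r"^C12$")

# Hypothetical atom labels.
labels = ["C1", "C12", "C120", "C12A", "N12"]

matching = [label for label in labels if anchored.search(label)]
# Without anchors, any label containing "C12" matches.
unanchored = [label for label in labels if re.search("C12", label)]

print(matching)    # ['C12']
print(unanchored)  # ['C12', 'C120', 'C12A']
```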
nimplicit_hydrogens
Constraint specifying a count of implicit hydrogens.
>>> a = QueryAtom(['C', 'N'])
>>> a.nimplicit_hydrogens = 0
>>> print(a)
QueryAtom(C, N)[implicit hydrogen count: equal to 0]
num_bonds
Constraint specifying the number of bonds the QueryAtom may have.
>>> a = QueryAtom(['C', 'N'])
>>> a.num_bonds = ('<=', 3)
>>> print(a)
QueryAtom(C, N)[number of connected atoms: less than or equal to 3]
num_hydrogens
Constraint specifying the number of hydrogens the QueryAtom may have.
>>> a = QueryAtom(['C', 'N'])
>>> a.num_hydrogens = 1
>>> print(a)
QueryAtom(C, N)[hydrogen count, including deuterium: equal to 1]
smallest_ring
Constraint specifying the size of the smallest ring the QueryAtom forms part of.
>>> a = QueryAtom(['C', 'N'])
>>> a.smallest_ring = (5, 6)
>>> print(a)
QueryAtom(C, N)[atom cyclicity: in range 5 to 6]
unfused_unbridged_ring
Constraint specifying whether or not the QueryAtom is part of an unfused and unbridged ring.
>>> a = QueryAtom(['C', 'N'])
>>> a.unfused_unbridged_ring = True
>>> print(a)
QueryAtom(C, N)[atom unfused/unbridged ring: equal to 1]
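Several of the constraints above accept one of four kinds of value:

- a plain value
- an (operator, value) pair
- an ('in', values) pair
- a (low, high) pair

The sketch below is a hypothetical helper, not part of ccdc.search, showing one way such a condition could be evaluated against a measured value. It assumes that a (low, high) pair is an inclusive range, as the "in range 5 to 6" output of smallest_ring suggests.

```python
import operator

# Comparison operators accepted in the (operator, value) form.
_OPERATORS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}


def satisfies(value, condition):
    """Return True if value meets a condition in one of the forms used above.

    Assumed forms (illustrative, not the ccdc implementation):
      plain value, e.g. 0          -> equal to 0
      ('in', [a, b, ...])          -> one of the listed values
      (op, x), op in _OPERATORS    -> comparison against x
      (low, high) of two numbers   -> inclusive range low..high
    """
    if not isinstance(condition, tuple):
        return value == condition
    first, second = condition
    if first == "in":
        return value in second
    if first in _OPERATORS:
        return _OPERATORS[first](value, second)
    return first <= value <= second


print(satisfies(4, ("!=", 4)))          # False
print(satisfies(-1, ("in", [-1, 1])))   # True
print(satisfies(6, (5, 6)))             # True
```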
class ccdc.search.QueryBond(bond_type=None, _substructure_bond=None)
Bond used to define a substructure search.
A QueryBond can be used to represent a single bond type or a set of bond types. A QueryBond can also have additional constraints imposed on it, for example that it should be cyclic.
Let us create a QueryBond that will match any bond type.
>>> query_bond = QueryBond()
>>> print(query_bond) # doctest: +NORMALIZE_WHITESPACE
QueryBond(Unknown, Single, Double, Triple, Quadruple, Aromatic, Delocalised, Pi)
To create a more specific QueryBond, we need to specify some bond types.
>>> from ccdc.molecule import Bond
>>> single_bond = Bond.BondType('Single')
>>> double_bond = Bond.BondType('Double')
>>> query_bond = QueryBond(single_bond)
>>> print(query_bond)
QueryBond(Single)
>>> query_bond = QueryBond([single_bond, double_bond])
>>> print(query_bond) # doctest: +NORMALIZE_WHITESPACE
QueryBond(Single, Double)
Finally, let us set a constraint for the bond to be cyclic.
>>> query_bond.cyclic = True
>>> print(query_bond)
QueryBond(Single, Double)[bond cyclicity: equal to 1]
>>> print(query_bond.cyclic)
BondCyclicityConstraint: 1
atoms
A list of the two QueryAtoms of the bond if it is in a substructure, or None.
>>> s = QuerySubstructure()
>>> c = s.add_atom(QueryAtom('C'))
>>> n = s.add_atom(QueryAtom('N'))
>>> b = QueryBond(['Single', 'Double'])
>>> _ = s.add_bond(b, c, n)
>>> print(b)
QueryBond(Single, Double)
>>> print('%s, %s' % (b.atoms[0], b.atoms[1]))
QueryAtom(C), QueryAtom(N)
bond_length
Constraint specifying the length of the bond.
>>> b = QueryBond('Single')
>>> c1 = QueryAtom('C')
>>> c2 = QueryAtom('C')
>>> s = QuerySubstructure()
>>> _ = s.add_atom(c1)
>>> _ = s.add_atom(c2)
>>> _ = s.add_bond(b, c1, c2)
>>> b.bond_length = ('>', 1.6)
>>> print(b)
QueryBond(Single)[bond length: greater than 1.6]
bond_polymeric
Constraint specifying whether or not the QueryBond is polymeric.
>>> b = QueryBond('Single')
>>> b.bond_polymeric = True
>>> print(b)
QueryBond(Single)[bond polymeric: equal to 1]
bond_smallest_ring
Constraint specifying the size of the smallest ring the bond should be part of.
>>> b = QueryBond('Aromatic')
>>> b.bond_smallest_ring = 5
>>> print(b)
QueryBond(Aromatic)[bond smallest ring: equal to 5]
class ccdc.search.QuerySubstructure(_substructure=None)
Class to define and run substructure searches.
As an example, let us set up a QuerySubstructure for a carbonyl (C=O).
>>> from ccdc.molecule import Bond
>>> double_bond = Bond.BondType('Double')
>>> substructure_query = QuerySubstructure()
>>> query_atom1 = substructure_query.add_atom('C')
>>> query_atom2 = substructure_query.add_atom('O')
>>> query_bond = substructure_query.add_bond(double_bond, query_atom1, query_atom2)
add_atom(atom)
Add an atom to the substructure.
Parameters: atom – a separately constructed QueryAtom, an atom of a molecule, or an atomic symbol.
Returns: QueryAtom
>>> q = QuerySubstructure()
>>> a = q.add_atom(QueryAtom(['N', 'O']))
>>> print(a)
QueryAtom(N, O)
add_bond(bond, atom1=None, atom2=None)
Add a bond to the substructure.
Parameters:
- bond – may be a QueryBond, a ccdc.molecule.Bond.BondType, a ccdc.molecule.Bond, a string or an int.
- atom1 – QueryAtom, or None for any atom
- atom2 – QueryAtom, or None for any atom
Raises: TypeError if an improper bond argument is supplied
>>> s = QuerySubstructure()
>>> c = s.add_atom(QueryAtom('C'))
>>> o1 = s.add_atom(QueryAtom('O'))
>>> o2 = s.add_atom(QueryAtom('O'))
>>> h = s.add_atom(QueryAtom('H'))
>>> _ = s.add_bond(QueryBond('Double'), c, o1)
>>> _ = s.add_bond(QueryBond('Single'), c, o2)
>>> _ = s.add_bond(QueryBond('Single'), o2, h)
atoms
The query atoms in the substructure.
>>> q = QuerySubstructure()
>>> _ = q.add_atom(QueryAtom('C'))
>>> _ = q.add_atom(QueryAtom(['O', 'N']))
>>> atoms = q.atoms
>>> print('%s, %s' % (atoms[0], atoms[1]))
QueryAtom(C), QueryAtom(N, O)
bonds
The bonds in the substructure.
>>> s = QuerySubstructure()
>>> b = s.add_bond('Single', QueryAtom('C'), QueryAtom('F'))
>>> bonds = s.bonds
>>> print(bonds[0])
QueryBond(Single)
match_atom(atom, query_atom=None)
Whether or not the given atom matches the query_atom in the given context.
Parameters:
- atom – a ccdc.molecule.Atom instance.
- query_atom – a ccdc.search.QueryAtom instance or None. If None, the first atom of the substructure will be used; this latter case is noticeably faster.
Returns: bool
>>> s = QuerySubstructure()
>>> _ = s.add_bond('Single', QueryAtom('Cl'), QueryAtom('C'))
>>> mol = EntryReader('csd').molecule('AABHTZ')
>>> s.match_atom(mol.atom('Cl1'))
True
>>> s.match_atom(mol.atom('C1'))
False
>>> s.match_atom(mol.atom('C1'), s.atoms[1])
True
match_molecule(molecule)
Whether or not the query matches the specified molecule.
Parameters: molecule – a ccdc.molecule.Molecule instance.
Returns: bool
>>> s = QuerySubstructure()
>>> _ = s.add_bond('Double', QueryAtom('C'), QueryAtom('O'))
>>> mol = EntryReader('csd').molecule('AABHTZ')
>>> s.match_molecule(mol)
True
nmatch_molecule(molecule)
Return the number of query matches within the specified molecule.
Parameters: molecule – a ccdc.molecule.Molecule instance.
Returns: integer
>>> s = QuerySubstructure()
>>> _ = s.add_bond('Single', QueryAtom('Cl'), QueryAtom('C'))
>>> mol = EntryReader('csd').molecule('AABHTZ')
>>> s.nmatch_molecule(mol)
2
class ccdc.search.SMARTSSubstructure(smarts)
Make a substructure from a SMARTS string.
Let us create a ketone SMARTSSubstructure as an example.
>>> smarts_query = SMARTSSubstructure("[CD4][CD3](=[OD1])[CD4]")
>>> print(smarts_query.smarts)
[CD4][CD3](=[OD1])[CD4]
There is a minor extension to Daylight SMARTS to allow the representation of quadruple, delocalised and pi bonds, using the characters '_', '"' and '|' respectively.
A second minor extension allows easy access to the indices of the atoms: an atom may be given a label inside its brackets (for example [#6:0]), and label_to_atom_index() translates that label into the atom's index in the substructure.
>>> query = SMARTSSubstructure("[#6:0]([#7]-H)[#8:1][#6:2]")
>>> print(query.label_to_atom_index(0))
0
>>> print(query.label_to_atom_index(1))
3
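The label-to-index mapping follows the order in which atoms are written. The simplified sketch below is plain Python, not the ccdc implementation. It reproduces the mapping in the example above by splitting the SMARTS string into atoms and recording the position of each labelled bracket atom. It assumes that labels appear as a trailing :n inside brackets, and it recognises only a limited set of bare element symbols, so it is not a general SMARTS parser. The smarts_label_map name is hypothetical.

```python
import re

# One atom token: a bracket atom such as [#6:0], or a bare element symbol.
# 'Cl' and 'Br' come before the single letters so 'Cl' is not read as 'C'.
_ATOM_TOKEN = re.compile(r"\[[^\]]*\]|Cl|Br|[BCNOPSFIHcnops*]")
# A trailing ":n" inside the brackets, e.g. the ":1" in [#8:1].
_MAP_LABEL = re.compile(r":(\d+)\]$")


def smarts_label_map(smarts):
    """Map each SMARTS atom label to the index of its atom, in written order."""
    mapping = {}
    for atom_index, token in enumerate(_ATOM_TOKEN.finditer(smarts)):
        label = _MAP_LABEL.search(token.group())
        if label:
            mapping[int(label.group(1))] = atom_index
    return mapping


# Atoms in order: [#6:0]=0, [#7]=1, H=2, [#8:1]=3, [#6:2]=4.
print(smarts_label_map("[#6:0]([#7]-H)[#8:1][#6:2]"))  # {0: 0, 1: 3, 2: 4}
```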
label_to_atom_index(i)
Translate a SMARTS label into the appropriate substructure atom index.
smarts
The SMARTS string.
class ccdc.search.MoleculeSubstructure(mol)
Make a substructure query from an entire molecule.
It can be used to search for exact matches of a molecule. Furthermore, if hydrogen atoms have been removed from the molecule used to initialise the MoleculeSubstructure, it can be used to find hits that exactly match the heavy atoms.
Parameters: mol – ccdc.molecule.Molecule
Raises: TypeError if the passed-in molecule has multiple components, since multi-component molecule substructure searches are not supported. The components should be added as separate substructures.
>>> mol = EntryReader('csd').molecule('AABHTZ')
>>> sub = MoleculeSubstructure(mol)
class ccdc.search.ConnserSubstructure(file_name, _conn=None)
Read a Conquest query language file.
static from_string(text)
Create a substructure from a textual representation of a Connser file.
interaction_library_contact_atoms()
Provide the list of indices of atoms in the substructure (optionally) defined in the ConnSer query for generating the data in the CCDC interaction library.
The indices refer to the list of substructure atoms of the associated substructure.
See the ccdc.interactions module for more information on the interaction library.
Search classes
class ccdc.search.Search(settings=None)
Common base class for searches.
class SearchHit(identifier, _database=None, _entry=None, _crystal=None, _molecule=None, _binary_database=None)
Base class for search hits.
Provides access to molecules, crystals and entries.
crystal
The crystal corresponding to a search hit.
entry
The entry corresponding to a search hit.
identifier
The string identifier of the hit.
molecule
The molecule corresponding to a search hit.
class Settings(_settings=None)
Base class for search settings.
has_3d_coordinates
Constrain hits to have 3D coordinates.
max_hit_structures
The maximum number of structures that may be returned from a search.
max_r_factor
Constrain the hits to have an R-factor less than this.
The R-factor will be expressed as a percentage.
must_have_elements
Elements which must be present in a hit.
The elements will be presented as a list of atomic symbols.
>>> settings = Search.Settings()
>>> settings.must_have_elements = ['C', 'N', 'O', 'S']
>>> print(settings.must_have_elements)
[C (6), N (7), O (8), S (16)]
must_not_have_elements
Elements which must not be present in a hit.
The elements will be presented as a list of symbols.
>>> settings = Search.Settings()
>>> settings.must_not_have_elements = ['S', 'P', 'K']
>>> print(settings.must_not_have_elements)
[S (16), P (15), K (19)]
no_disorder
Constrain hits to have no disorder.
The value will be False (no filtering), 'Non-hydrogen' (filter structures with heavy atom disorder) or 'All' (filter structures with any disordered atoms).
no_errors
Constrain the hits to have no suppressed errors.
no_ions
Constrain the hits not to have a residue with a formal charge. The hits may include zwitterions.
no_metals
Constrain the hits not to have a metal atom.
no_powder
Constrain hits not to be powder studies.
not_polymeric
Constrain the hits not to be polymeric structures.
only_organic
Constrain hits to be organic compounds.
only_organometallic
Constrain hits to be only organometallic compounds.
test(argument)
Test that the argument satisfies the requirements of the settings instance.
Parameters: argument – a ccdc.entry.Entry, ccdc.crystal.Crystal or ccdc.molecule.Molecule instance.
Returns: bool
>>> entry = EntryReader('csd').entry('AABHTZ')
>>> settings = Search.Settings()
>>> settings.test(entry)
True
>>> settings.only_organometallic = True
>>> settings.test(entry)
False
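Conceptually, test() checks a structure against each configured filter in turn. The plain-Python sketch below imitates three of the filters described above: must_have_elements, must_not_have_elements and max_r_factor. It is not the ccdc implementation. The EntryRecord and FilterSettings classes and their fields are hypothetical stand-ins chosen for the illustration.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class EntryRecord:
    """Hypothetical, simplified stand-in for a CSD entry (not ccdc.entry.Entry)."""
    identifier: str
    elements: Set[str]
    r_factor: float  # expressed as a percentage, e.g. 4.1 means 4.1 %


@dataclass
class FilterSettings:
    """Illustrative subset of the filters described for Search.Settings."""
    must_have_elements: Set[str] = field(default_factory=set)
    must_not_have_elements: Set[str] = field(default_factory=set)
    max_r_factor: Optional[float] = None

    def test(self, entry: EntryRecord) -> bool:
        # Every required element must be present.
        if not self.must_have_elements <= entry.elements:
            return False
        # No forbidden element may be present.
        if self.must_not_have_elements & entry.elements:
            return False
        # The R-factor must be strictly less than the maximum, if one is set.
        if self.max_r_factor is not None and entry.r_factor >= self.max_r_factor:
            return False
        return True


entries = [
    EntryRecord("ENTRY1", {"C", "H", "N", "O"}, 4.1),
    EntryRecord("ENTRY2", {"C", "H", "O", "S"}, 3.0),
    EntryRecord("ENTRY3", {"C", "H", "N", "O"}, 7.5),
]
settings = FilterSettings(must_have_elements={"C", "N"},
                          must_not_have_elements={"S"},
                          max_r_factor=5.0)
selected = [e.identifier for e in entries if settings.test(e)]
print(selected)  # ['ENTRY1']
```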
class ccdc.search.SimilaritySearch(mol=None, threshold=0.7, coefficient='tanimoto', settings=None)
Class to define and run similarity searches.
class Settings(threshold=0.7, coefficient='tanimoto', _settings=None)
coefficient
The similarity coefficient to use. This should be either 'dice' or 'tanimoto' (the default).
sort_order
The order in which hits will be sorted.
This should be either 'alphabetic' or 'value' (the default).
threshold
The similarity threshold to apply.
This is a value between 0.0 and 1.0.
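Both coefficients compare fingerprints by the bits they share. Tanimoto is |A ∩ B| / (|A| + |B| - |A ∩ B|), and Dice is 2|A ∩ B| / (|A| + |B|). The sketch below computes both on hypothetical fingerprints, represented as Python sets of bit positions. It then applies a threshold and a 'value' sort order. This document does not describe the fingerprint the CSD software uses, and treating the threshold as inclusive is an assumption.

```python
def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    common = len(a & b)
    union = len(a) + len(b) - common
    return common / union if union else 0.0


def dice(a, b):
    """Dice coefficient: 2 * |A & B| / (|A| + |B|)."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


# Hypothetical fingerprints: sets of "on" bit positions.
query = {1, 4, 7, 9}
candidates = {"HIT1": {1, 4, 7, 9, 12}, "HIT2": {1, 4, 20, 21}, "HIT3": {2, 3}}

threshold = 0.7
scores = {name: tanimoto(query, fp) for name, fp in candidates.items()}

# Keep hits at or above the threshold and sort by value, highest first.
hit_names = sorted((n for n, s in scores.items() if s >= threshold),
                   key=lambda n: scores[n], reverse=True)
print(hit_names)  # ['HIT1']
```

For any pair of fingerprints, Dice equals 2T / (1 + T), where T is the Tanimoto value, so Dice is never smaller than Tanimoto. The same threshold therefore tends to admit more hits with coefficient='dice'.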
class SimilarityHit(similarity, identifier, _database=None, _entry=None, _crystal=None, _molecule=None, _binary_database=None)
A search hit recording the similarity measure.
The SimilarityHit instance gives access to the identifier of the hit, the value of its similarity to the query molecule, and the entry, crystal or molecule of the hit.
coefficient
Which coefficient to use when determining similarity.
static from_xml(xml)
Create a SimilaritySearch from an XML representation.
Parameters: xml – XML string
static from_xml_file(file_name)
Create a SimilaritySearch from an XML file.
Parameters: file_name – path to XML file
Raises: IOError when the file does not exist
molecule
The query molecule.
read_xml_file(file_name)
Read an XML file into the similarity searcher.
Parameters: file_name – path to XML file
Raises: IOError if the file cannot be read
search_molecule(mol)
Search a molecule.
This can be used to compute the similarity coefficient between the query and the given molecule.
Parameters: mol – ccdc.molecule.Molecule
Returns: SimilaritySearch.SimilarityHit
>>> csd = EntryReader('csd')
>>> paracetamol = csd.molecule('HXACAN')
>>> searcher = SimilaritySearch(paracetamol)
>>> hit = searcher.search_molecule(csd.molecule('IBPRAC'))
>>> print(round(hit.similarity, 3))
0.161
threshold
The similarity threshold to use.
class ccdc.search.TextNumericSearch(settings=None)
Class to define and run text/numeric searches in the CSD.
It is possible to add one or more criteria for the query to match.
>>> text_numeric_query = TextNumericSearch()
>>> text_numeric_query.add_compound_name('aspirin')
>>> text_numeric_query.add_citation(year=[2011, 2013])
>>> for hit in text_numeric_query.search(max_hit_structures=3):
...     print(hit.identifier)
...
ACSALA19
ACSALA20
ACSALA21
A human-readable representation of the queries may be obtained:
>>> print(', '.join(q for q in text_numeric_query.queries))
Compound name aspirin anywhere , Journal year in range 2011-2013
class TextNumericSearchSettings(_settings=None)
No settings are required apart from those provided by the base class.
add_all_identifiers(refcode, mode='anywhere', ignore_non_alpha_num=False)
Search for an identifier, including previous identifiers.
>>> from ccdc.search import TextNumericSearch
>>> query = TextNumericSearch()
>>> query.add_all_identifiers('DABHUJ')
>>> hits = query.search()
>>> print(hits[0].identifier)
ACPRET03
>>> print(hits[0].entry.previous_identifier)
DABHUJ
add_all_text(txt, mode='anywhere', ignore_non_alpha_num=False)
Search for text anywhere in the entry.
add_analogue(analogue, mode='anywhere', ignore_non_alpha_num=False)
Search for an analogue.
add_author
Search for an author.
add_bioactivity(activity, mode='anywhere', ignore_non_alpha_num=False)
Search for a particular bio-activity.
add_ccdc_number(value)
Search for a particular CCDC deposition number or a range of them. A range is given as a tuple (first, last).
>>> from ccdc.search import TextNumericSearch
>>> searcher = TextNumericSearch()
>>> searcher.add_ccdc_number(241370)
>>> hits = searcher.search()
>>> len(hits)
1
>>> entry = hits[0].entry
>>> print('%s %s' % (entry.identifier, entry.ccdc_number))
ABEBUF 241370
>>> searcher.clear()
>>> searcher.add_ccdc_number((241368, 241372))
>>> hits = searcher.search()
>>> print(len(hits))
3
>>> for hit in hits:
...     print('%s %s' % (hit.identifier, hit.entry.ccdc_number))
...
ABEBUF 241370
BIBZIW 241371
BIMGEK 241372
add_citation(author='', journal='', volume=None, year=None, first_page=None, ignore_non_alpha_num=False, _coden=None)
Search for a citation.
add_color(color, mode='anywhere', ignore_non_alpha_num=False)
Search for a particular colour.
add_compound_name(compound_name, mode='anywhere', ignore_non_alpha_num=False)
Search for a compound name.
The search checks the contents of both ccdc.entry.Entry.chemical_name and ccdc.entry.Entry.synonyms.
To illustrate this, let us have a look at the CSD entry ABABEM.
>>> from ccdc.io import EntryReader
>>> entry_reader = EntryReader('CSD')
>>> ababem = entry_reader.entry('ABABEM')
>>> print(ababem.chemical_name)
Tetrahydro[1,3,4]thiadiazolo[3,4-a]pyridazine-1,3-dione
>>> print(ababem.synonyms[0])
8-Thia-1,6-diazabicyclo[4.3.0]nonane-7,9-dione
The text azabicyclo[4.3.0]nonane is only found in the synonym. Let us search for it using a compound name search.
>>> from ccdc.search import TextNumericSearch
>>> query = TextNumericSearch()
>>> query.add_compound_name('azabicyclo[4.3.0]nonane')
>>> hits = query.search()
Finally, let us assert that we have found ABABEM.
>>> assert 'ABABEM' in [h.identifier for h in hits]
add_disorder(disorder, mode='anywhere', ignore_non_alpha_num=False)
Search for a disorder comment.
add_habit(habit, mode='anywhere', ignore_non_alpha_num=False)
Search for a particular habit.
add_peptide_sequence(peptide_sequence, mode='anywhere', ignore_non_alpha_num=False)
Search for a peptide sequence.
add_phase_transition(phase_transition, mode='anywhere', ignore_non_alpha_num=False)
Search for a phase transition.
add_polymorph(polymorph, mode='anywhere', ignore_non_alpha_num=False)
Search for polymorph information.
add_source(source, mode='anywhere', ignore_non_alpha_num=False)
Search for a source.
>>> from ccdc.search import TextNumericSearch
>>> searcher = TextNumericSearch()
>>> searcher.add_source('toad')
>>> hits = searcher.search(max_hit_structures=5)
>>> for h in hits:
...     print('%-8s: %s' % (h.identifier, h.entry.source))
...
CUXYAV  : Ch'an Su (dried venom of Chinese toad)
EWAWUW  : isolated from the eggs of toad Bufo bufo gargarizans
EWAXAD  : isolated from the eggs of toad Bufo bufo gargarizans
FIFDUT  : dried venom of Chinese toad Ch'an Su
FIFFAB  : dried venom of Chinese toad Ch'an Su
static from_xml_file(file_name)
Create a TextNumericSearch from an XML file.
Parameters: file_name – path to XML file
Raises: IOError when the file does not exist
is_journal_valid(journal)
Check the validity of a specified journal name.
Parameters: journal – str, journal name
journals
A dictionary mapping journal names to CCDC code numbers.
queries
The current set of queries for this search.
>>> tns = TextNumericSearch()
>>> tns.add_all_text('ibuprofen')
>>> tns.add_author('Haisa')
>>> print('; '.join(str(q).strip() for q in tns.queries))
All text ibuprofen anywhere; Author Haisa anywhere
class ccdc.search.SubstructureSearch(settings=None)
Query crystal structures for interactions.
class HitProcessor
Override this class to provide your own add_hit() method.
This class allows a search to process hits as they are found, rather than waiting until all hits have been found before allowing access to them, a procedure which may well run out of memory for very general searches.
search(searcher, database=None)
Search the database with the substructure search.
Parameters:
- searcher – a ccdc.search.SubstructureSearch instance.
- database – a ccdc.io.EntryReader instance. If not specified, the CSD will be searched.
For each hit found, ccdc.search.SubstructureSearch.HitProcessor.add_hit() will be called with a ccdc.search.SubstructureSearch.SubstructureHit instance.
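In the API, you override add_hit() in a subclass of ccdc.search.SubstructureSearch.HitProcessor, and search(searcher, database) then passes each hit to it. The plain-Python sketch below illustrates only the underlying callback pattern. It uses hypothetical class names, and plain strings stand in for SubstructureHit objects. The point it shows is that memory use stays bounded because each hit is handled as it arrives instead of being collected into a list.

```python
class HitProcessorSketch:
    """Hypothetical stand-in for a hit-processor base class (not ccdc's)."""

    def add_hit(self, hit):
        raise NotImplementedError("override add_hit() in a subclass")


class CountingProcessor(HitProcessorSketch):
    """Keeps a running count and the first few identifiers, not every hit."""

    def __init__(self, keep=3):
        self.count = 0
        self.keep = keep
        self.first_identifiers = []

    def add_hit(self, hit):
        self.count += 1
        if len(self.first_identifiers) < self.keep:
            self.first_identifiers.append(hit)


def run_search(hit_stream, processor):
    """Stand-in search loop: hand each hit over as soon as it is found."""
    for hit in hit_stream:
        processor.add_hit(hit)


# A generator yields simulated hits one at a time, so no full list of
# 100000 hits is ever built in memory.
processor = CountingProcessor()
run_search((f"HIT{i}" for i in range(100000)), processor)
print(processor.count, processor.first_identifiers)  # 100000 ['HIT0', 'HIT1', 'HIT2']
```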
class Settings(max_hit_structures=None, max_hits_per_structure=None)
Settings appropriate to a substructure search.
max_hits_per_structure
Maximum number of hits per structure.
class SubstructureHit(identifier, match=None, search_structure=None, query=None, _database=None, _entry=None, _crystal=None, _molecule=None, _binary_database=None)
A hit from a substructure search.
centroid_objects(name)
The geometric object names and atoms from which the centroid was defined.
constraint_atoms(name)
The atoms from which the constraint was defined.
Parameters: name – the name of the constraint.
Returns: a tuple of ccdc.molecule.Atom instances.
The atoms will be returned in an arbitrary order. All atoms involved in defining the constraint will be returned.
constraint_objects(constraint)
A tuple of object names and atoms from which the constraint was defined.
dummy_point_objects(name)
The geometric object names and atoms from which the dummy point was defined.
match_atoms(indices=False)
Return the atoms matched by the substructure.
Parameters: indices – whether to return atom indices instead of ccdc.molecule.Atom instances
Returns: list of ccdc.molecule.Atom instances or atom indices
The atoms returned will all be in the asymmetric unit, so measuring constraints and measurements directly from these atoms will not give the correct results if a symmetry-generated copy was involved in the match. See ccdc.search.SubstructureSearch.SubstructureHit.match_symmetry_operators() for a way to determine whether this is the case.
match_components()
Return the molecular components matched by the search.
Returns: list of ccdc.molecule.Molecule
match_substructures()
Return each substructure of the hit as a molecule with the bonds and atoms of the hit.
The symmetry operations of the hit will be applied to the molecules, so measurements and constraints will be appropriate to the hit.
Returns: tuple of ccdc.molecule.Molecule, one for each substructure of the hit, with the bonds and atoms of the hit
match_symmetry_operators()
The symmetry operators required to form the match.
Returns: a list of symmetry operators in the order of the matched atoms.
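In crystallography, symmetry operators are usually written in a form such as -x,1/2+y,1/2-z and act on fractional coordinates. The sketch below applies an operator in that notation to a fractional position. It assumes this string form; this document does not say whether match_symmetry_operators() returns exactly that form. The helper handles only unit coefficients on x, y and z plus constant offsets, and apply_symmetry_operator is a hypothetical name, not part of ccdc.

```python
import re
from fractions import Fraction

# One signed term: a number such as 1/2 or 0.25, or a coordinate x, y or z.
_TERM = re.compile(r"\s*([+-]?)\s*(\d+/\d+|\d*\.?\d+|[xyz])")


def apply_symmetry_operator(op, frac_xyz):
    """Apply an operator such as '-x,1/2+y,1/2-z' to fractional coordinates.

    Exact fractions are used so that results like 1/2 + 0.2 are not blurred
    by floating-point rounding.
    """
    coords = dict(zip("xyz", (Fraction(str(c)) for c in frac_xyz)))
    result = []
    for component in op.lower().split(","):
        total = Fraction(0)
        for sign, term in _TERM.findall(component):
            value = coords[term] if term in coords else Fraction(term)
            total += -value if sign == "-" else value
        result.append(total)
    return tuple(result)


moved = apply_symmetry_operator("-x,1/2+y,1/2-z", (0.1, 0.2, 0.3))
print([float(c) for c in moved])  # [-0.1, 0.7, 0.2]
```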
measurement_atoms(name)
The atoms involved in a measurement.
Parameters: name – the name of the measurement.
Returns: a tuple of ccdc.molecule.Atom instances.
The atoms will be returned in an arbitrary order. All atoms involved in the measurement will be present, so, for example, a centroid-centroid distance measurement will produce the atoms of both centroids.
class SubstructureHitList
List of hits from a ccdc.search.SubstructureSearch.
add_angle_constraint(name, *args)
Add an angle constraint.
Parameters:
- name – the name by which the constraint will be accessed.
- *args – three instances, each either a pair (substructure_index, atom_index) or the name of a geometric object.
- range – as for ccdc.search.SubstructureSearch.add_distance_constraint()
>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_centroid('CENT1', (0, 0), (0, 1), (0, 2))
>>> query.add_centroid('CENT2', (1, 0), (1, 1), (1, 2))
>>> query.add_angle_constraint('ANG1', (0, 0), (1, 1), (1, 0), ('>=', 120))
add_angle_measurement(name, *args)
Add an angle measurement.
>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_centroid('CENT1', (0, 0), (0, 1), (0, 2))
>>> query.add_centroid('CENT2', (1, 0), (1, 1), (1, 2))
>>> query.add_angle_measurement('ANG1', (0, 0), (1, 1), (1, 0))
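An angle defined by three points is the angle at the middle point between the vectors to the other two points. The sketch below computes that angle in degrees with the standard math module. Treating the second point as the vertex is an assumption about the API, and the angle_degrees helper is hypothetical. Under a condition such as ('>=', 120), a hit would be kept only if this value were at least 120.

```python
import math


def angle_degrees(p1, p2, p3):
    """Angle p1-p2-p3 in degrees, with p2 as the vertex."""
    v1 = [a - b for a, b in zip(p1, p2)]
    v2 = [a - b for a, b in zip(p3, p2)]
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to guard against rounding just outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


print(round(angle_degrees((1, 0, 0), (0, 0, 0), (0, 1, 0)), 6))  # 90.0
```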
add_atom_property_constraint(name, *args, **kw)
Add an atom property constraint.
>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('[*H1]'))
>>> query.add_atom_property_constraint('ATOM1', (0, 0), ('in', [7, 8]), which='AtomicNumber')
add_atom_property_measurement(name, *args, **kw)
Add an atom property measurement.
Parameters:
- name – the name by which this measurement will be accessed.
- *args – a pair, (substructure_index, atom_index), specifying the atom to measure.
- which – one of TotalCoordinationNumber, AtomicNumber, VdwRadius, CovalentRadius
>>> query = SubstructureSearch()
>>> substructure = QuerySubstructure()
>>> _ = substructure.add_atom(['C', 'N'])
>>> _ = query.add_substructure(substructure)
>>> query.add_atom_property_measurement('ATOM1', (0, 0), which='AtomicNumber')
add_binary_transform_constraint(name, which, *args)
Add a binary arithmetical calculation constraint.
>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_vector('VEC1', (0, 1), (1, 2))
>>> query.add_vector('VEC2', (0, 2), (1, 1))
>>> query.add_vector_angle_measurement('ANG1', 'VEC1', 'VEC2')
>>> query.add_constant_value_measurement('D2R', 3.14159/180)
>>> query.add_binary_transform_constraint('IN_RADIANS', 'MUL', 'ANG1', 'D2R', (-1, 1))
add_binary_transform_measurement(name, which, arg1, arg2)
Add a binary mathematical operation.
Parameters:
- name – the name by which this value will be accessed.
- which – one of 'MAX', 'MIN', 'ADD', 'SUBTRACT', 'MULTIPLY', 'DIVIDE', 'POW', 'RSIN', 'RCOS'.
- arg1, arg2 – the names of the measurements to be used as arguments to the operator.
>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_vector('VEC1', (0, 1), (1, 2))
>>> query.add_vector('VEC2', (0, 2), (1, 1))
>>> query.add_vector_angle_measurement('ANG1', 'VEC1', 'VEC2')
>>> query.add_constant_value_measurement('D2R', 3.14159/180)
>>> query.add_binary_transform_measurement('IN_RADIANS', 'MUL', 'ANG1', 'D2R')
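The operators listed above combine two values. The sketch below is a hypothetical dispatch table covering the operators whose meaning is unambiguous. It omits 'RSIN' and 'RCOS' because this document does not define them. In the API, arg1 and arg2 name measurements; here they are plain numbers. The example also shows that converting an angle from degrees to radians means multiplying by pi/180.

```python
import math
import operator

# Operators from the list above whose meaning is unambiguous; 'RSIN' and
# 'RCOS' are deliberately left out because they are not defined here.
BINARY_OPERATORS = {
    "MAX": max,
    "MIN": min,
    "ADD": operator.add,
    "SUBTRACT": operator.sub,
    "MULTIPLY": operator.mul,
    "DIVIDE": operator.truediv,
    "POW": operator.pow,
}


def binary_transform(which, arg1, arg2):
    """Combine two values with the named operator (hypothetical helper)."""
    return BINARY_OPERATORS[which](arg1, arg2)


# Degrees to radians: multiply by pi/180, so 90 degrees becomes pi/2.
angle_in_radians = binary_transform("MULTIPLY", 90.0, math.pi / 180)
print(angle_in_radians)
```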
add_centroid(name, *args)
Add a centroid to the substructure search.
Parameters:
- name – the name by which the centroid will be accessed.
- *args – the points or geometric objects from which to define the centroid. Each arg may be either a pair (substructure_index, atom_index) or the name of a geometric object. There must be at least two such arguments.
>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_centroid('CENT1', (0, 0), (0, 1), (0, 2))
>>> query.add_centroid('CENT2', (1, 0), (1, 1), (1, 2))
>>> query.add_centroid('CENT3', 'CENT1', 'CENT2')
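Assuming a centroid is the unweighted mean of its defining points, a centroid built from other centroids (CENT3 above) is simply the midpoint of those centroids. That coincides with the centroid of all the underlying atoms only when each group has the same number of atoms, as with the example's two three-atom groups. The sketch below illustrates the difference with made-up coordinates; `centroid` is a hypothetical helper, not the ccdc implementation.

```python
# Hypothetical sketch (not ccdc): a centroid as the unweighted mean of points.

def centroid(*points):
    """Mean of two or more 3D points given as (x, y, z) tuples."""
    if len(points) < 2:
        raise ValueError('a centroid needs at least two points')
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


group_a = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0)]  # three atoms
group_b = [(10.0, 0.0, 0.0), (10.0, 2.0, 0.0)]                  # two atoms

cent1 = centroid(*group_a)                # (1.0, 1.0, 0.0)
cent2 = centroid(*group_b)                # (10.0, 1.0, 0.0)
cent3 = centroid(cent1, cent2)            # (5.5, 1.0, 0.0): midpoint of the two
all_atoms = centroid(*group_a, *group_b)  # x is 4.6: differs because group sizes differ
```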
add_constant_value_measurement(name, value)
Add a constant value.
Parameters:
- name – the name by which this constant will be accessed.
- value – a float.
>>> query = SubstructureSearch()
>>> substructure = QuerySubstructure()
>>> _ = substructure.add_atom(['C', 'N'])
>>> _ = query.add_substructure(substructure)
>>> query.add_constant_value_measurement('PI', 3.14159)
add_distance_constraint(name, *args, **kw)
Add a distance constraint.
Parameters:
- name – the name of this constraint.
- *args – specifications of points, either as pairs (substructure_index, atom_index) or as names of geometric measurements.
- range – the condition, given as the final positional argument in the examples: either a pair of floats, a pair (operator, value) where operator is one of ‘==’, ‘>’, ‘<’, ‘>=’, ‘<=’, ‘!=’, or a pair (‘in’, list(values)).
- intermolecular – whether the distance should be measured within a unit-cell molecule or between a unit-cell molecule and a packing-shell molecule.
- vdw_corrected – whether the distance range is relative to the sum of the van der Waals radii of the atoms involved.
>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_distance_constraint('DIST1', (0, 1), (1, 1), (-5, 0), vdw_corrected=True, type='any')
>>> query.add_distance_constraint('DIST2', (0, 2), (1, 2), ('<=', 3.0), vdw_corrected=True, type='any')
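Under the usual convention, which this sketch assumes rather than takes from ccdc, a van der Waals-corrected distance is the raw interatomic distance minus the sum of the two atoms' van der Waals radii, so a range of (-5, 0) selects contacts no longer than that sum. DIST1 above pairs the carbonyl oxygen (0, 1) with an amine hydrogen (1, 1); the sketch uses Bondi's radii for O (1.52 Å) and H (1.20 Å) and illustrative coordinates. The radii ccdc uses may differ, and `vdw_corrected_distance` is a hypothetical helper.

```python
import math

# Hypothetical sketch (not ccdc): raw distance minus the sum of vdW radii.

def vdw_corrected_distance(p, q, radius_p, radius_q):
    """Distance between points p and q minus radius_p + radius_q (angstroms)."""
    return math.dist(p, q) - (radius_p + radius_q)


o_atom = (0.0, 0.0, 0.0)
h_atom = (2.5, 0.0, 0.0)
corrected = vdw_corrected_distance(o_atom, h_atom, 1.52, 1.20)  # about -0.22
is_short_contact = -5 <= corrected <= 0  # the (-5, 0) range used for DIST1
```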
add_distance_measurement(name, *args)
Add a distance measurement.
>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_centroid('CENT1', (0, 0), (0, 1), (0, 2))
>>> query.add_centroid('CENT2', (1, 0), (1, 1), (1, 2))
>>> query.add_distance_measurement('DIST1', (0, 0), 'CENT2')
add_dummy_point(name, distance, *args)
Create a dummy point along a vector.
Parameters:
- name – the name by which this point will be accessed.
- distance – the distance along the vector subtended by the two points.
- *args – two points, each specified as a pair (substructure_index, atom_index) or as the name of another geometric object.
>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_centroid('CENT1', (0, 0), (0, 1), (0, 2))
>>> query.add_dummy_point('DUM1', 2.0, 'CENT1', (1, 1))
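A dummy point along a vector can be pictured as the point reached by moving `distance` ångströms from the first point towards the second. The sketch below assumes exactly that convention (measured from the first point, along the unit vector); ccdc's own convention is not documented here, and `dummy_point` is a hypothetical helper.

```python
import math

# Hypothetical sketch (not ccdc): place a point `distance` units from start
# in the direction of end.

def dummy_point(distance, start, end):
    """Point at `distance` from start towards end (assumed convention)."""
    direction = [e - s for s, e in zip(start, end)]
    length = math.sqrt(sum(c * c for c in direction))
    if length == 0:
        raise ValueError('start and end coincide; the direction is undefined')
    return tuple(s + distance * c / length for s, c in zip(start, direction))


# 2.0 units from the origin towards (3, 4, 0), a vector of length 5.
point = dummy_point(2.0, (0.0, 0.0, 0.0), (3.0, 4.0, 0.0))  # (1.2, 1.6, 0.0)
```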
add_group(name, *args)
Create a group of matched atoms.
Parameters:
- name – the name by which this group will be accessed.
- *args – pairs (substructure_index, atom_index) defining the atoms of the group.
>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_group('GP1', (0, 0), (0, 1), (0, 2))
add_plane(name, *args)
Add a plane.
Parameters:
- name – the name by which the plane will be accessed.
- *args – at least two point specifications, each in the form (substructure_index, atom_index) or the name of another geometric object.
>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_plane('PLANE1', (0, 0), (0, 1), (0, 2))
>>> query.add_plane('PLANE2', (1, 0), (1, 1), (1, 2))
add_plane_angle_constraint(name, *args)
Add a plane angle constraint.
>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_plane('PLANE1', (0, 0), (0, 1), (0, 2))
>>> query.add_plane('PLANE2', (1, 0), (1, 1), (1, 2))
>>> query.add_plane_angle_constraint('PA1', 'PLANE1', 'PLANE2', (-10, 10))
add_plane_angle_measurement(name, *args)
Add a plane angle measurement.
>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_plane('PLANE1', (0, 0), (0, 1), (0, 2))
>>> query.add_plane('PLANE2', (1, 0), (1, 1), (1, 2))
>>> query.add_plane_angle_measurement('PA1', 'PLANE1', 'PLANE2')
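The angle between two planes is conventionally the angle between their normal vectors. The sketch below takes each plane through three points, computes its normal with a cross product, and folds the result into 0 to 90 degrees because a normal's sign is arbitrary. The folding, and the three-point plane (a fit over more points would normally use least squares), are assumptions about ccdc's behaviour; `plane_angle` is a hypothetical helper. Under that assumption, the (-10, 10) range in `add_plane_angle_constraint()` accepts planes within 10° of parallel.

```python
import math

# Hypothetical sketch (not ccdc): interplanar angle from plane normals.

def _normal(p0, p1, p2):
    """Normal of the plane through three points (cross product of two edges)."""
    u = [b - a for a, b in zip(p0, p1)]
    v = [b - a for a, b in zip(p0, p2)]
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def plane_angle(plane1, plane2):
    """Angle in degrees between two planes, folded into 0-90 (assumed)."""
    n1, n2 = _normal(*plane1), _normal(*plane2)
    norms = math.hypot(*n1) * math.hypot(*n2)
    if norms == 0:
        raise ValueError('collinear points do not define a plane')
    cos_angle = min(1.0, abs(sum(a * b for a, b in zip(n1, n2))) / norms)
    return math.degrees(math.acos(cos_angle))


xy_plane = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
tilted = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 1.0))
angle = plane_angle(xy_plane, tilted)  # about 45 degrees
```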
add_point_plane_distance_constraint(name, *args)
Add a point-plane distance constraint.
>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_centroid('CENT1', (0, 0), (0, 1), (0, 2))
>>> query.add_plane('PLANE2', (1, 0), (1, 1), (1, 2))
>>> query.add_point_plane_distance_constraint('PP1', 'CENT1', 'PLANE2', ('<', 5))
add_point_plane_distance_measurement(name, *args)
Add a point-plane distance measurement.
>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_centroid('CENT1', (0, 0), (0, 1), (0, 2))
>>> query.add_plane('PLANE2', (1, 0), (1, 1), (1, 2))
>>> query.add_point_plane_distance_measurement('PP1', 'CENT1', 'PLANE2')
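The perpendicular distance from a point p to a plane is |n·(p − p0)| / |n|, where p0 is any point on the plane and n is its normal. A plain-Python version follows, with the plane represented as a point plus a normal. Whether ccdc reports a signed or an unsigned distance is not stated here, so taking the absolute value is an assumption, and `point_plane_distance` is a hypothetical helper.

```python
import math

# Hypothetical sketch (not ccdc): unsigned point-to-plane distance.

def point_plane_distance(point, plane_point, normal):
    """Distance from point to the plane through plane_point with the given normal."""
    norm = math.sqrt(sum(c * c for c in normal))
    if norm == 0:
        raise ValueError('the normal vector must be non-zero')
    offset = [p - q for p, q in zip(point, plane_point)]
    return abs(sum(o * n for o, n in zip(offset, normal))) / norm


# A centroid at (1, 2, 3) above the xy plane; the normal need not be unit length.
d = point_plane_distance((1.0, 2.0, 3.0), (0.0, 0.0, 0.0), (0.0, 0.0, 2.0))  # 3.0
passes = d < 5  # the ('<', 5) condition used for PP1
```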
add_substructure(substructure)
Add a substructure.
Parameters: substructure – a ccdc.search.QuerySubstructure.
Returns: the index of the substructure.
add_torsion_angle_constraint(name, *args)
Add a torsion angle constraint.
Parameters:
- name – the name by which this constraint is accessed.
- *args – as for ccdc.search.SubstructureSearch.add_distance_constraint()
>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_centroid('CENT1', (0, 0), (0, 1), (0, 2))
>>> query.add_centroid('CENT2', (1, 0), (1, 1), (1, 2))
>>> query.add_torsion_angle_constraint('ANG1', (0, 0), (0, 1), (1, 1), (1, 0), (120, 180))
add_torsion_angle_measurement(name, *args)
Add a torsion angle measurement.
>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_centroid('CENT1', (0, 0), (0, 1), (0, 2))
>>> query.add_centroid('CENT2', (1, 0), (1, 1), (1, 2))
>>> query.add_torsion_angle_measurement('ANG1', (0, 0), (0, 1), (1, 1), (1, 0))
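A torsion angle A-B-C-D is signed, so a range such as (120, 180), if applied to the signed value, would match only one handedness; whether ccdc compares signed or absolute torsions is not stated here. For reference, the sketch below computes the textbook signed torsion with atan2, giving values from -180 to 180 degrees. `torsion` is a hypothetical helper, not the ccdc implementation.

```python
import math

# Textbook signed torsion angle, for illustration only (not ccdc).

def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def torsion(a, b, c, d):
    """Signed torsion A-B-C-D in degrees, computed with atan2."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


A, B, C = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
cis = torsion(A, B, C, (1.0, 0.0, 1.0))     # 0 degrees
trans = torsion(A, B, C, (-1.0, 0.0, 1.0))  # 180 degrees in magnitude
gauche = torsion(A, B, C, (0.0, 1.0, 1.0))  # 90 degrees
```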
add_unary_transform_constraint(name, *args)
Add a unary arithmetical calculation constraint.
>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_vector('VEC1', (0, 1), (1, 2))
>>> query.add_vector('VEC2', (0, 2), (1, 1))
>>> query.add_vector_angle_measurement('ANG1', 'VEC1', 'VEC2')
>>> query.add_unary_transform_constraint('ABS_ANGLE', 'ABS', 'ANG1', (0, 10))
add_unary_transform_measurement(name, which, arg)
Add a unary mathematical operation.
Parameters:
- name – the name by which the result will be accessed.
- which – one of ‘ABS’, ‘LOG’, ‘LOG10’, ‘EXP’, ‘COS’, ‘SIN’, ‘TAN’, ‘ACOS’, ‘ASIN’, ‘ATAN’, ‘FLOOR’, ‘ROUND’, ‘SQRT’, ‘NEG’.
- arg – the name of the measurement or constraint to which the function is applied.
>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_vector('VEC1', (0, 1), (1, 2))
>>> query.add_vector('VEC2', (0, 2), (1, 1))
>>> query.add_vector_angle_measurement('ANG1', 'VEC1', 'VEC2')
>>> query.add_unary_transform_measurement('ABS_ANGLE', 'ABS', 'ANG1')
add_vector(name, *args)
Add a vector.
Parameters:
- name – the name by which the vector will be accessed.
- *args – two point specifications, each as (substructure_index, atom_index) or the name of another geometric object.
>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_centroid('CENT1', (0, 0), (0, 1), (0, 2))
>>> query.add_vector('VEC1', 'CENT1', (1, 2))
add_vector_angle_constraint(name, *args)
Add a vector angle constraint.
>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_vector('VEC1', (0, 1), (1, 2))
>>> query.add_vector('VEC2', (0, 2), (1, 1))
>>> query.add_vector_angle_constraint('ANG1', 'VEC1', 'VEC2', (0, 60))
add_vector_angle_measurement(name, *args)
Add a vector angle measurement.
>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_vector('VEC1', (0, 1), (1, 2))
>>> query.add_vector('VEC2', (0, 2), (1, 1))
>>> query.add_vector_angle_measurement('ANG1', 'VEC1', 'VEC2')
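Each vector here is defined by two points, and the angle between two vectors is conventionally the arccosine of their normalised dot product, which lies between 0 and 180 degrees. The sketch below assumes that convention; `vector` and `vector_angle` are hypothetical helpers rather than ccdc functions, and the clamp guards against rounding pushing the cosine slightly outside [-1, 1].

```python
import math

# Hypothetical sketch (not ccdc): unsigned angle between two vectors.

def vector(start, end):
    """Vector from start to end, analogous to defining a vector by two points."""
    return tuple(e - s for s, e in zip(start, end))


def vector_angle(u, v):
    """Angle between u and v in degrees, in the range 0-180."""
    norms = math.hypot(*u) * math.hypot(*v)
    if norms == 0:
        raise ValueError('zero-length vector')
    cos_angle = sum(a * b for a, b in zip(u, v)) / norms
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return math.degrees(math.acos(cos_angle))


vec1 = vector((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
vec2 = vector((0.0, 0.0, 0.0), (1.0, 1.0, 0.0))
angle = vector_angle(vec1, vec2)  # about 45 degrees
```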
add_vector_plane_angle_constraint(name, *args)
Add a vector plane angle constraint.
>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_vector('VEC1', (0, 1), (1, 2))
>>> query.add_plane('PLANE2', (1, 0), (1, 1), (1, 2))
>>> query.add_vector_plane_angle_constraint('ANG1', 'VEC1', 'PLANE2', ('>', 90))
add_vector_plane_angle_measurement(name, *args)
Add a vector plane angle measurement.
>>> query = SubstructureSearch()
>>> _ = query.add_substructure(SMARTSSubstructure('C(=O)O'))
>>> _ = query.add_substructure(SMARTSSubstructure('N(-H)H'))
>>> query.add_vector('VEC1', (0, 1), (1, 2))
>>> query.add_plane('PLANE2', (1, 0), (1, 1), (1, 2))
>>> query.add_vector_plane_angle_measurement('ANG1', 'VEC1', 'PLANE2')
class ccdc.search.ReducedCellSearch(query=None, settings=None)
Provide reduced cell searches.
class Settings(_settings=None)
Settings appropriate to a reduced cell search.
absolute_angle_tolerance
The absolute angle tolerance.
is_normalised
Whether the input cell is normalised.
percent_length_tolerance
The cell length tolerance, as a percentage of the longest cell dimension.
compare_cells(r0, r1)
Compare two reduced cells.
Parameters:
- r0 – the first reduced cell, an instance of ccdc.crystal.Crystal.ReducedCell
- r1 – the second reduced cell, also an instance of ccdc.crystal.Crystal.ReducedCell
Returns: boolean
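The two tolerances in Settings suggest how a tolerance-based cell comparison could work. The sketch below is a hypothetical model, not ccdc's algorithm: cell lengths match when each differs by no more than percent_length_tolerance percent of the longest edge of the first cell, and angles match when each differs by no more than absolute_angle_tolerance degrees. Cells are plain (a, b, c, alpha, beta, gamma) tuples with illustrative values, and the normalisation controlled by is_normalised is not modelled.

```python
# Hypothetical model of a reduced-cell match (not the ccdc implementation).

def cells_match(cell0, cell1, percent_length_tolerance, absolute_angle_tolerance):
    """Compare (a, b, c, alpha, beta, gamma) tuples; lengths in angstroms, angles in degrees."""
    lengths0, angles0 = cell0[:3], cell0[3:]
    lengths1, angles1 = cell1[:3], cell1[3:]
    # Assumption: the percentage is taken of the longest edge of the first cell.
    length_tol = max(lengths0) * percent_length_tolerance / 100.0
    lengths_ok = all(abs(x - y) <= length_tol for x, y in zip(lengths0, lengths1))
    angles_ok = all(abs(x - y) <= absolute_angle_tolerance
                    for x, y in zip(angles0, angles1))
    return lengths_ok and angles_ok


reference = (10.0, 12.0, 15.0, 90.0, 100.0, 90.0)
close = (10.1, 12.0, 15.1, 90.0, 100.5, 90.0)
stretched = (10.3, 12.0, 15.0, 90.0, 100.0, 90.0)
# With a 1% length tolerance (0.15 here) and a 1 degree angle tolerance,
# `close` matches the reference and `stretched` does not.
```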
static from_xml(xml)
Construct a reduced cell search from an XML representation.
Parameters: xml – XML string
static from_xml_file(file_name)
Construct a reduced cell search from an XML file.
Parameters: file_name – path to XML file
Raises: IOError when the file does not exist
class ccdc.search.CombinedSearch(expression, settings=None)
Boolean combinations of other searches.
TextNumericSearch, SubstructureSearch, SimilaritySearch and ReducedCellSearch can be combined with the operators & (and), | (or) and unary - (not) to form a combined search.
>>> from ccdc import io
>>> from ccdc.search import TextNumericSearch, SubstructureSearch, SMARTSSubstructure, ReducedCellSearch, CombinedSearch
>>> csd = io.EntryReader('csd')
>>> tns = TextNumericSearch()
>>> tns.add_compound_name('Aspirin')
>>> sub_search = SubstructureSearch()
>>> _ = sub_search.add_substructure(SMARTSSubstructure('C(=O)OH'))
>>> rcs = ReducedCellSearch(ReducedCellSearch.CrystalQuery(csd.crystal('ACSALA')))
>>> combi_search = CombinedSearch(tns & (-rcs | -sub_search))
>>> hits = combi_search.search()
>>> print(len(hits))
78
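If the operators behave like set algebra over hit identifiers (& as intersection, | as union, unary - as complement within the searched entries), which is an assumption rather than documented behaviour, then by De Morgan's law the expression tns & (-rcs | -sub_search) selects the entries named aspirin that do not both match the ACSALA reduced cell and contain the substructure. The toy model below illustrates this with made-up identifiers; HitSet is hypothetical and is not a ccdc class.

```python
# Hypothetical toy model (not ccdc): a search reduced to the set of entry
# identifiers it hits, inside a fixed universe of searched entries.

class HitSet:
    def __init__(self, ids, universe):
        self.ids = frozenset(ids)
        self.universe = frozenset(universe)

    def __and__(self, other):  # both searches must hit
        return HitSet(self.ids & other.ids, self.universe)

    def __or__(self, other):   # either search may hit
        return HitSet(self.ids | other.ids, self.universe)

    def __neg__(self):         # entries this search does not hit
        return HitSet(self.universe - self.ids, self.universe)


universe = {'ENTRY1', 'ENTRY2', 'ENTRY3', 'ENTRY4'}     # made-up identifiers
tns = HitSet({'ENTRY1', 'ENTRY2', 'ENTRY3'}, universe)  # name matches
rcs = HitSet({'ENTRY1'}, universe)                      # cell matches
sub_search = HitSet({'ENTRY1', 'ENTRY2'}, universe)     # substructure matches

combined = tns & (-rcs | -sub_search)  # hits ENTRY2 and ENTRY3
```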