SMARTS implementation

SMARTS is a language for specifying substructures using rules that extend SMILES (Simplified Molecular Input Line Entry System). The CSD Python API implements a subset of the full SMARTS language. Take the following limitations into account when writing SMARTS for the CSD Python API:

  • Unsupported features (General):
    • Dot for “not necessarily connected” fragments or atoms, e.g. C.C
    • Recursive SMARTS, e.g. [$(CC);$(CCC)]
    • Reaction SMARTS, e.g. CC>>CC
  • Unsupported features (Atom properties):
    • Some atom constraints (where n is an integer):
      • h<n>: implicit hydrogens
      • R<n>: ring membership
      • <n>: atomic mass
    • Stereochemical descriptors
    • Constraints of different types combined with OR operator, e.g. [#7X1,#7D2]
    • High-precedence AND in OR subexpression, e.g. [C,N&H1]
  • Unsupported features (Bond properties):
    • Stereochemical descriptors for double bonds (/ and \): these are treated as single bonds with unspecified stereochemistry
    • High-precedence AND in OR subexpression, e.g. =&@,- (cyclic double bond, or single bond of unspecified cyclicity)
    • The following constructs are not supported:
      • NOT any bond, e.g. !~
      • Different bond types combined with AND operator, e.g. -&= (single and double)
      • Different NOT bond types combined with OR operator, e.g. !-,!= (not single or not double)
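
Some of the unsupported constructs listed above can be spotted textually before a query is built. The following is a minimal sketch only: find_unsupported and UNSUPPORTED_PATTERNS are hypothetical names, not part of the CSD Python API, and the check covers just the dot, recursive SMARTS, reaction SMARTS and NOT-any-bond cases. It does not parse SMARTS, so the other limitations in the list are not detected.

```python
import re

# Hypothetical pre-flight check; NOT part of the CSD Python API.
# Only constructs that can be recognised by simple text matching are covered.
UNSUPPORTED_PATTERNS = {
    "disconnected fragments (dot)": re.compile(r"\."),
    "recursive SMARTS": re.compile(r"\$\("),
    "reaction SMARTS": re.compile(r">"),
    "NOT any bond": re.compile(r"!~"),
}


def find_unsupported(smarts: str) -> list[str]:
    """Return the names of unsupported constructs spotted in a SMARTS string."""
    return [
        name
        for name, pattern in UNSUPPORTED_PATTERNS.items()
        if pattern.search(smarts)
    ]


if __name__ == "__main__":
    for query in ("C.C", "[$(CC);$(CCC)]", "CC>>CC", "C!~C", "c1ccccc1C(=O)O"):
        problems = find_unsupported(query)
        print(query, "->", problems if problems else "no issues spotted")
```

An empty result means only that none of these four constructs was found, not that the query is guaranteed to be accepted.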

Matching of aromatic and aliphatic atoms follows the representations curated in the CSD rather than the canonical representations defined by the SMILES specification. The implementation also adds a small extension to Daylight SMARTS: quadruple, delocalised and pi bonds are represented by the characters ‘_’, ‘"’ and ‘|’ respectively.
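
For quick reference, the bond characters can be collected in a lookup table. This is an illustrative sketch: BOND_SYMBOLS and describe_bond are hypothetical names, not part of the CSD Python API. The first five entries are standard Daylight SMARTS bond symbols and the last three are the extension characters described above.

```python
# Illustrative lookup table; NOT part of the CSD Python API.
BOND_SYMBOLS = {
    "-": "single",
    "=": "double",
    "#": "triple",
    ":": "aromatic",
    "~": "any",
    "_": "quadruple",    # extension to Daylight SMARTS
    '"': "delocalised",  # extension to Daylight SMARTS
    "|": "pi",           # extension to Daylight SMARTS
}


def describe_bond(symbol: str) -> str:
    """Return the bond type for a SMARTS bond character, or 'unknown'."""
    return BOND_SYMBOLS.get(symbol, "unknown")


if __name__ == "__main__":
    for symbol in BOND_SYMBOLS:
        print(repr(symbol), "->", describe_bond(symbol))
```

Because the delocalised bond character is a double quote, a SMARTS string that contains it is easiest to write in Python using single-quoted string literals.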