Graph Sets

Graph set analysis is described in “Etter, M. C.; MacDonald, J. C. & Bernstein, J. Graph-set analysis of hydrogen-bond patterns in organic crystals Acta. Cryst. B, 1990, 46, 256-262” DOI: 10.1107/S0108768189012929.

Graph set analysis can be used to characterise hydrogen-bonding motifs and networks in organic crystals, providing a succinct and unique way of defining the hydrogen-bonding patterns found in a particular crystal. Graph theory is used to determine the topology of the hydrogen-bonding interactions, considering molecules as nodes, with hydrogen bonds as the lines that connect them. The analysis characterises hydrogen bonds in terms of whether they are single discrete hydrogen bonds, form chains or rings and also provides information on the number of donors, acceptors, size of the chains and rings etc.

This can be used to compare and contrast crystal structures where the basic hydrogen bonds formed are the same but the 3-D network they form is substantially different. A simple example is distinguishing between catamer and dimer hydrogen-bonding networks in carboxylic acids.

We will give an example of this using two polymorphic structures in the CSD.

Firstly let us import the appropriate modules and read the two crystal structures:

>>> from ccdc.descriptors import CrystalDescriptors
>>> GraphSetSearch = CrystalDescriptors.GraphSetSearch
>>> from ccdc.io import EntryReader
>>> csd = EntryReader('csd')
>>> tetrol = csd.crystal('TETROL')
>>> tetrol01 = csd.crystal('TETROL01')

The graph set analyser can be used immediately to analyse the crystals, though there is a variety of settings to control more exactly the nature of the hydrogen bonds and the search parameters, the default settings provide a realistic analysis in most cases.

>>> GSA = GraphSetSearch()
>>> tetrol_graph_sets = GSA.search(tetrol)
>>> tetrol01_graph_sets = GSA.search(tetrol01)

In both cases there is a single graph set found:

>>> print('TETROL: %d, TETROL01: %d' % (len(tetrol_graph_sets), len(tetrol01_graph_sets)))
TETROL: 1, TETROL01: 1

The string representation of a graph set shows the classification of the graph set, specifically the description - Ring, Chain, Self or Discrete - the number of donors, the number of acceptors and the degree, i.e., the path length of the repeat unit. The final component of this description is the edge labels.

>>> print(tetrol_graph_sets[0])
R2,2(8) >a>a
>>> print(tetrol01_graph_sets[0])
C1,1(4) a

It is immediately apparent that these represent very different polymorphs. TETROL forms a ring with a carboxylic acid dimer, whereas TETROL01 forms a chain with a carboxylic acid catamer.

The ccdc.descriptors.CrystalDescriptors.GraphSetSearch.GraphSet instances allow more detailed inspection of the network of hydrogen bonds:

>>> gs = tetrol_graph_sets[0]
>>> print(gs.descriptor)
ring
>>> print(gs.ndonors)
2
>>> print(gs.nacceptors)
2
>>> print(gs.nmolecules)
2
>>> print(gs.degree)
8
>>> print(gs.edge_labels)
>a>a
>>> print(gs.period)
1

The hbonds attribute will return a tuple of ccdc.crystal.Crystal.HBond, allowing access to the symmetry operator of the molecules involved in the hbond.

>>> hbonds = gs.hbonds
>>> print(len(hbonds))
2
>>> hb = hbonds[0]
>>> print(hb)
HBond(Atom(O1)-Atom(H)-Atom(O2))

The hydrogen of this bond is not present in the crystal, but has been inferred by the graph set analyser. In consequence the atom has no coordinates, as is reflected in attributes of the hbond:

>>> print(hb.length)
None
>>> print(round(hb.da_distance, 2))
2.65
>>> print(', '.join("'%s'" % s for s in hb.symmetry_operators))
'x,y,z', '', '-x,-y,-z'

Note that the hydrogen has no symmetry operator associated with it.

In common with the rest of the API, the GraphSetSearch has a nested settings class, ccdc.descriptors.CrystalDescriptors.GraphSetSearch.Settings allowing configuration of the search. For example we can constrain the search to look only for intramolecular hydrogen bonds:

>>> crystal = csd.crystal('AAGAGG10')
>>> settings = GSA.settings
>>> settings.distance_range = (0, 3.2)
>>> settings.vdw_corrected = False
>>> gsets = GSA.search(crystal)
>>> print(len(gsets))
42
>>> settings.intermolecular = 'intra'
>>> gsets = GSA.search(crystal)
>>> print(len(gsets))
2
>>> print(gsets[0])
S1,1(10) a
>>> settings.intermolecular = 'inter'
>>> gsets = GSA.search(crystal)
>>> print(len(gsets))
40

We could also allow a narrower range of contact distances, or, if hydrogens are required, a tighter angle:

>>> crystal = csd.crystal('AADAMC')
>>> GSA = CrystalDescriptors.GraphSetSearch()
>>> default_gsets = GSA.search(crystal)
>>> GSA.settings.intermolecular = 'any'
>>> GSA.settings.require_hydrogens = False
>>> GSA.settings.angle_tolerance = 145.
>>> GSA.settings.distance_range = (2.0, 3.5)
>>> GSA.settings.vdw_corrected = False
>>> loose_gsets = GSA.search(crystal)
>>> print('Default: %d, Loose: %d' % (len(default_gsets), len(loose_gsets)))
Default: 6, Loose: 10