Analysing molecular interactions preferences

Introduction

The CSD Portfolio contains a knowledge-based library of intermolecular interactions. It contains information on the frequencies and directionality of intermolecular contacts is particularly relevant to medicinal chemists interested in identifying bioisosteric replacements and to molecular modellers engaged in structure-based drug design.

Each interaction is represented by a pair of functional groups. The data for each interaction pair in the knowledge-base has been pre-calculated by searching the CSD for non-bonded interactions between a pair of functional groups A and B. The A…B contacts are transformed so that the A groups are least-squared superimposed. The resulting scatterplot shows the experimentally observed distribution of B (the contact group) around A (the central group).

Knowledge based interaction library scatter plot showing oh contact group interacting with an aliphatic ester.

H-bonded O-H contacts around aliphatic ester central group. The majority of these contacts form to the terminal C=O rather than the etheric C-O-C.

Note

For more information on these knowledge-based interaction libraries please see: “Isostar: A library of information about non-bonded interactions” I. J. Bruno, J. C. Cole, J. P. M. Lommerse, R. S. Rowland, R. Taylor and M. L. Verdonk, J., Comput.-Aided Mol. Des., 11, 525-537, 1997 DOI: 10.1023/A:1007934413448.

Central and contact groups and libraries

Because the data have been pre-calculated we have libraries of the central and contact groups that have been used to populate the knowledge-base. Let us get access to these central and contact group libraries.

>>> from ccdc.interaction import InteractionLibrary
>>> central_lib = InteractionLibrary.CentralGroupLibrary()
>>> contact_lib = InteractionLibrary.ContactGroupLibrary()

To illustrate the functionality of a group from such a library let us have a look at the aliphatic ester central and the alcohol hydroxyl contact group.

>>> central_ester = central_lib.group_by_name('aliphatic-aliphatic ester')
>>> contact_oh = contact_lib.group_by_name('alcohol OH')

These groups have name and substructure query attributes. The latter can be used to set up a ccdc.search.SubstructureSearch.

>>> print(central_ester.name)
aliphatic-aliphatic ester
>>> central_ester.substructure_query  
<ccdc.search.QuerySubstructure object at ...>

Accessing the underlying interaction data

To access the underlying interaction data for a pair of functional groups we make use of the ccdc.interaction.InteractionLibrary.CentralGroup.interaction_data() function. This function takes a ccdc.interaction.InteractionLibrary.ContactGroup as an argument.

>>> data = central_ester.interaction_data(contact_oh)

The relationship between the contact and the central group is symmetric so we could equally have called the ccdc.interaction.InteractionLibrary.ContactGroup.interaction_data() function with a ccdc.interaction.InteractionLibrary.rCentralGroup.

>>> data = contact_oh.interaction_data(central_ester)

The data is an instance of the class ccdc.interaction.InteractionLibrary.InteractionData and it has several useful attributes. Let us have a look at some of them.

>>> print(data.central_group_name)
aliphatic-aliphatic ester
>>> print(data.contact_group_name)
alcohol OH
>>> print(data.ncontacts)
3160

The attribute ccdc.interaction.InteractionLibrary.InteractionData.ncontacts is the total number of contacts observed in the interaction data. It is also possible to find out how many contacts are within van der Waals distance.

>>> data.nvdw_contacts
2212

The distance between the contacts in the interaction data are relative to the van der Waals overlap distance. So a contact with a distance of 0.0 would be at van der Waals distance. With this in mind we are in a position to explore data further.

>>> data.min_distance
-1.03
>>> data.max_distance
0.5
>>> data.ncontacts_in_range(-1.03, -.50)
1584

It is possible to plot a histogram of illustrating the frequency of contacts in 0.01 Angstroms bins.

>>> data.histogram  
[1, 2, 0, 6, 1, ..., 23, 22, 20, 25, 11]

To get a list of pairs of x, y values for this histogram we can use the minimum and maximum distances.

>>> width = data.max_distance - data.min_distance
>>> bin_size = float(width) / len(data.histogram)
>>> xs = [ data.min_distance + i*bin_size
...        for i in range(len(data.histogram)) ]
>>> for x, y in zip(xs, data.histogram):  
...     print('Distance %8.2f  contacts %3i' % (x, y))
...
Distance    -1.03  contacts   1
Distance    -1.02  contacts   2
Distance    -1.01  contacts   0
Distance    -1.00  contacts   6
Distance    -0.99  contacts   1
...
Distance     0.45  contacts  23
Distance     0.46  contacts  22
Distance     0.47  contacts  20
Distance     0.48  contacts  25
Distance     0.49  contacts  11

The interaction data also contains a relative density value and an associated estimated standard deviation. The relative density is defined as drel = dshort/dlong, where dshort is the density of contacts within the sum of van der Waals radii, V, and dlong is the density of contacts between V and V+Tol. The larger drel is, the greater the tendency for the groups to form short interactions. Typical values for drel are 2-8 for a strong hydrogen bond (e.g., an H-bond where one or both of the groups is/are charged), 1-2 for an average hydrogen bond, 0.7-1 for a weak H-bond, and 0.3-0.7 for a very weak attractive interaction. An estimated standard deviation for drel is given in brackets, assuming Poisson statistics.

>>> print('Relative density: %.2f (%.2f)' % data.relative_density)
Relative density: 2.33 (0.09)

Matching functional groups to molecules

Suppose that we wanted to find out if a central and/or contact group was present in a molecule of Aspirin.

>>> from ccdc.io import EntryReader
>>> with EntryReader('CSD') as reader:
...     aspirin_entry = reader.entry('ACSALA')
...
>>> print(aspirin_entry.chemical_name)
2-acetoxybenzoic acid
>>> central_ester.search_molecule(aspirin_entry.molecule)
[]

In this case we do not get a hit using our aliphatic-aliphatic ester as the ester moiety in Aspirin is connected to an aromatic ring. However, if we use an aliphatic-aromatic ester we do get a hit.

>>> al_ar_ester = central_lib.group_by_name('aromatic-aliphatic ester')
>>> al_ar_ester.search_molecule(aspirin_entry.molecule)  
[<ccdc.interaction.InteractionLibrary.FunctionalGroupHit object at ...>]

The ccdc.interaction.InteractionLibrary.FunctionalGroupHit object can be used to find the matching atoms in the input molecule and to identify the group that matched.

>>> hit = al_ar_ester.search_molecule(aspirin_entry.molecule)[0]
>>> print(hit.name)
aromatic-aliphatic ester
>>> print(hit.group)  
<ccdc.interaction.InteractionLibrary.CentralGroup object at ...>
>>> print(hit.match_atoms())
[Atom(C1), Atom(C3), Atom(C2), Atom(C9), Atom(C8), Atom(O3), Atom(O4)]
>>> print(hit.match_atoms(indices=True))
(0, 2, 1, 8, 7, 19, 20)

If we wanted to find all central/contact groups in a molecule we can use the ccdc.interaction.CentralGroupLibrary.search_molecule() and ccdc.interaction.ContactGroupLibrary.search_molecule() functions respectively.

>>> hits = contact_lib.search_molecule(aspirin_entry.molecule)
>>> len(hits)
38