Analysing molecular interactions preferences¶
Introduction¶
The CSD Portfolio contains a knowledge-based library of intermolecular interactions. It contains information on the frequencies and directionality of intermolecular contacts is particularly relevant to medicinal chemists interested in identifying bioisosteric replacements and to molecular modellers engaged in structure-based drug design.
Each interaction is represented by a pair of functional groups. The data for each interaction pair in the knowledge-base has been pre-calculated by searching the CSD for non-bonded interactions between a pair of functional groups A and B. The A…B contacts are transformed so that the A groups are least-squared superimposed. The resulting scatterplot shows the experimentally observed distribution of B (the contact group) around A (the central group).
Note
For more information on these knowledge-based interaction libraries please see: “Isostar: A library of information about non-bonded interactions” I. J. Bruno, J. C. Cole, J. P. M. Lommerse, R. S. Rowland, R. Taylor and M. L. Verdonk, J., Comput.-Aided Mol. Des., 11, 525-537, 1997 DOI: 10.1023/A:1007934413448.
Central and contact groups and libraries¶
Because the data have been pre-calculated we have libraries of the central and contact groups that have been used to populate the knowledge-base. Let us get access to these central and contact group libraries.
>>> from ccdc.interaction import InteractionLibrary
>>> central_lib = InteractionLibrary.CentralGroupLibrary()
>>> contact_lib = InteractionLibrary.ContactGroupLibrary()
To illustrate the functionality of a group from such a library let us have a look at the aliphatic ester central and the alcohol hydroxyl contact group.
>>> central_ester = central_lib.group_by_name('aliphatic-aliphatic ester')
>>> contact_oh = contact_lib.group_by_name('alcohol OH')
These groups have name and substructure query attributes. The latter
can be used to set up a ccdc.search.SubstructureSearch
.
>>> print(central_ester.name)
aliphatic-aliphatic ester
>>> central_ester.substructure_query
<ccdc.search.QuerySubstructure object at ...>
Accessing the underlying interaction data¶
To access the underlying interaction data for a pair of functional
groups we make use of the
ccdc.interaction.InteractionLibrary.CentralGroup.interaction_data()
function. This
function takes a ccdc.interaction.InteractionLibrary.ContactGroup
as an
argument.
>>> data = central_ester.interaction_data(contact_oh)
The relationship between the contact and the central group is symmetric
so we could equally have called the
ccdc.interaction.InteractionLibrary.ContactGroup.interaction_data()
function with a
ccdc.interaction.InteractionLibrary.rCentralGroup
.
>>> data = contact_oh.interaction_data(central_ester)
The data is an instance of the class ccdc.interaction.InteractionLibrary.InteractionData
and it has several useful attributes. Let us have a look at some of
them.
>>> print(data.central_group_name)
aliphatic-aliphatic ester
>>> print(data.contact_group_name)
alcohol OH
>>> print(data.ncontacts)
3154
The attribute ccdc.interaction.InteractionLibrary.InteractionData.ncontacts
is the total
number of contacts observed in the interaction data. It is also possible
to find out how many contacts are within van der Waals distance.
>>> data.nvdw_contacts
2210
The distance between the contacts in the interaction data are relative to the van der Waals overlap distance. So a contact with a distance of 0.0 would be at van der Waals distance. With this in mind we are in a position to explore data further.
>>> data.min_distance
-1.03
>>> data.max_distance
0.5
>>> data.ncontacts_in_range(-1.03, -.50)
1582
It is possible to plot a histogram of illustrating the frequency of contacts in 0.01 Angstroms bins.
>>> data.histogram
[1, 2, 0, ..., 11]
To get a list of pairs of x, y values for this histogram we can use the minimum and maximum distances.
>>> width = data.max_distance - data.min_distance
>>> bin_size = float(width) / len(data.histogram)
>>> xs = [ data.min_distance + i*bin_size
... for i in range(len(data.histogram)) ]
>>> for x, y in zip(xs, data.histogram):
... print('Distance %8.2f contacts %3i' % (x, y))
...
Distance -1.03 contacts 1
...
Distance 0.49 contacts 11
The interaction data also contains a relative density value and an associated estimated standard deviation. The relative density is defined as drel = dshort/dlong, where dshort is the density of contacts within the sum of van der Waals radii, V, and dlong is the density of contacts between V and V+Tol. The larger drel is, the greater the tendency for the groups to form short interactions. Typical values for drel are 2-8 for a strong hydrogen bond (e.g., an H-bond where one or both of the groups is/are charged), 1-2 for an average hydrogen bond, 0.7-1 for a weak H-bond, and 0.3-0.7 for a very weak attractive interaction. An estimated standard deviation for drel is given in brackets, assuming Poisson statistics.
>>> print('Relative density: %.2f (%.2f)' % data.relative_density)
Relative density: 2.34 (0.09)
Matching functional groups to molecules¶
Suppose that we wanted to find out if a central and/or contact group was present in a molecule of Aspirin.
>>> from ccdc.io import EntryReader
>>> with EntryReader('CSD') as reader:
... aspirin_entry = reader.entry('ACSALA')
...
>>> print(aspirin_entry.chemical_name)
2-acetoxybenzoic acid
>>> central_ester.search_molecule(aspirin_entry.molecule)
[]
In this case we do not get a hit using our aliphatic-aliphatic ester as the ester moiety in Aspirin is connected to an aromatic ring. However, if we use an aliphatic-aromatic ester we do get a hit.
>>> al_ar_ester = central_lib.group_by_name('aromatic-aliphatic ester')
>>> al_ar_ester.search_molecule(aspirin_entry.molecule)
[<ccdc.interaction.InteractionLibrary.FunctionalGroupHit object at ...>]
The ccdc.interaction.InteractionLibrary.FunctionalGroupHit
object can be used to find the
matching atoms in the input molecule and to identify the group that
matched.
>>> hit = al_ar_ester.search_molecule(aspirin_entry.molecule)[0]
>>> print(hit.name)
aromatic-aliphatic ester
>>> print(hit.group)
<ccdc.interaction.InteractionLibrary.CentralGroup object at ...>
>>> print(hit.match_atoms())
[Atom(C1), Atom(C3), Atom(C2), Atom(C9), Atom(C8), Atom(O3), Atom(O4)]
>>> print(hit.match_atoms(indices=True))
(0, 2, 1, 8, 7, 19, 20)
If we wanted to find all central/contact groups in a molecule we can use
the ccdc.interaction.CentralGroupLibrary.search_molecule()
and
ccdc.interaction.ContactGroupLibrary.search_molecule()
functions
respectively.
>>> hits = contact_lib.search_molecule(aspirin_entry.molecule)
>>> len(hits)
38