Interaction Maps

Note

the ccdc.interaction.InteractionMapAnalysis is available only to CSD-Discovery, CSD-Materials and CSD-Enterprise users.

Introduction

The ccdc.interaction.InteractionMapAnalysis uses crystallographic information about non-bonded interactions to generate interaction maps around small molecules or within protein binding site. Depending on the settings used, the calculated maps provide the interaction preferences for your molecule as a whole in the context of the crystal structure, or the interaction preferences for all or selected cavities in a protein.

Note

For more information on the details of the fundamental methodology please see:

“SuperStar: A knowledge-based approach for identifying interaction sites in proteins.”, M. L. Verdonk, J. C. Cole and R. Taylor, J. Mol. Biol., 289, 1093-1108, 1999, DOI: 10.1006/jmbi.1999.2809.

“Evaluation of molecular crystal structures using Full Interaction Maps”, P. A. Wood, T. S. G. Olsson, J. C. Cole, S. J. Cottrell, N. Feeder, P. T. A. Galek, C. R. Groom, E. Pidcock, CrystEngComm, 15, 65?72, 2013, DOI: 10.1039/C2CE25849H.

See also

API documentation for the interaction module

Use Cases

Analyse Crystal and Small Molecule

One common use for ccdc.interaction.InteractionMapAnalysis is to analyse a crystal to determine the preferred binding sites of particular functional groups. We will exemplify this by considering paracetamol. Firstly, let us import the relevant modules and classes:

>>> from ccdc import io
>>> from ccdc.interaction import InteractionMapAnalysis

Then we will load a crystal structure for paracetamol:

>>> csd = io.EntryReader('csd')
>>> paracetamol = csd.crystal('HXACAN')

Next we need to instantiate a ccdc.interaction.InteractionMapAnalysis.SmallMoleculeSettings which will control the analysis:

>>> settings = InteractionMapAnalysis.SmallMoleculeSettings()

Amongst the attributes of the settings instance is the set of probe names that may be used in an analysis:

>>> print('\n'.join(settings.probe_names))
Uncharged NH Nitrogen
Carbonyl Oxygen
Aromatic CH Carbon

These default probes will normally be enough for an analysis, but other available probes may be found from the settings class and appended to or replacing the ccdc.interaction.InteractionMapAnalysis.SmallMoleculeSettings.probe_names:

>>> print('\n'.join(settings.csd_defined_probes)) 
Uncharged NH Nitrogen
Charged NH Nitrogen
RNH3 Nitrogen
Alcohol Oxygen
Carbonyl Oxygen
Water Oxygen
Oxygen Atom
Methyl Carbon
Aromatic CH Carbon
C-F Fluorine
C-Cl Chlorine
C-Br Bromine
C-I Iodine
Sp2 CNC Nitrogen
Carboxylate Oxygen
Nitro Oxygen
Chloride Anion

Optionally we may wish to specify a working directory for the analysis into which results will be placed. By default this will be the identifier of the analysed system with the current working directory.

There is a number of other options which may be set, but the default settings will usually suffice.

Then we are ready to calculate the intaraction maps:

>>> analyser = InteractionMapAnalysis(settings=settings)
>>> results = analyser.analyse_small_molecule(paracetamol.molecule)

The results object is an instance of ccdc.interaction.InteractionMapAnalysis.Results. This is a derived class of ccdc.utilities.GridEnsemble giving access to a collection of homogeneous ccdc.utilities.Grid instances.

The calculated grid of the analysis may be accessed by dictionary-like methods, keyed by the probe names of the analysis:

>>> for k in sorted(results.keys()):
...     grid = results[k]
...     print('%s: %.2f, %.2f' % (k, grid.extrema[0], grid.extrema[1]))
Aromatic_CH_Carbon: 0.00, 4.03
Carbonyl_Oxygen: 0.00, 38.71
Uncharged_NH_Nitrogen: 0.00, 10.44

If hotspot detection has been turned on, and it will be by default, one can get them for any of the probes. These are returned as a tuple of ccdc.interaction.InteractionMapAnalysis.Results.Hotspot instances. These have a location, ccdc.molecule.Coordinates and a value, with higher values indicating a stronger hotspot. They are returned in descending value order.

>>> hs = results.hotspots('Carbonyl_Oxygen')
>>> print(len(hs))
11
>>> print(hs[0])
Hotspot(Coordinates(x=4.408, y=-0.035, z=2.151), 38.71)

A similar set of results may be obtained by directly inspecting the grids. The ccdc.utilities.Grid.islands() method will extract subgrids from a grid containing connected values above a given threshold. See Grids for details of the Grid API.

>>> co_grid = results['Carbonyl_Oxygen']
>>> islands = co_grid.islands(2.0)
>>> print(len(islands))
19

We can check that the hottest hotspot does indeed correspond with the island containing the highest value:

>>> l = list(sorted(islands, key=lambda x: x.extrema[1], reverse=True))
>>> origin, far_corner = l[0].bounding_box
>>> assert all(origin[i] <= hs[0][0][i] <= far_corner[i] for i in range(3))

We can configure the generation of hotspots through appropriate changes to the settings class. These are the default settings, but you may change any of them. For examples, it is possible to refine initial values of hotspots by applying a Simplex minimisation procedure. If one wants to filter out small-value peaks this can be done by changing hotsposts_min_value. It is advisable to set it to a value higher than 1.0. Default values will produce a list of hotspots that reside in locations of maximum propensity. However, in order to find secondary points additional iterations can be done by setting hotsposts_ncycles to the number of iterations needed.

>>> settings.detect_hotspots
True
>>> settings.hotspots_refine
False
>>> settings.hotspots_min_value
1.01
>>> settings.hotspots_ncycles
1

We can check that the analysis worked correctly and extract more details of its operation by inspecting the error and log files:

>>> print(results.error_file('Uncharged NH Nitrogen')) 
>>> print(results.log_file('Aromatic CH Carbon'))  
________________________________________________________________________________

...
________________________________________________________________________________



--------------------------------------------------------------------------------
SuperStar settings
--------------------------------------------------------------------------------

Jobname          : Aromatic_CH_Carbon
...

Another use one could make of the small molecule analysis is to analyse other molecules derived from a crystal, for example a dimer, a dehydrated molecule, a packing shell or, as we shall exemplify here, the surface molecules of a slicing plane. We will take one of the surfaces of the BFDH morphology of HXACAN and analyse the interaction maps of it.

We can obtain the molecules subtended by a slicing plane by using the method ccdc.crystal.Crystal.slicing():

>>> slice_mol = paracetamol.slicing(paracetamol.miller_indices(0, 2, 0).plane)
>>> slice_results = analyser.analyse_small_molecule(slice_mol)

The analyser will produce maps all around the molecular assemble but here we are only interested in the surface maps so we can restrict the generated grids to consider only the surface points.

>>> for k in slice_results.keys():
...     g = slice_results[k]
...     near, far = g.slice(paracetamol.miller_indices(0, 2, 0).plane)

Analyse Protein Binding Site

We will need to read a protein from a file, so must import the appropriate module, instantiate the protein, and extract the binding site of its ligand.

>>> from ccdc.protein import Protein
>>> file_name = '4igs.pdb'
>>> file_name = locate_file(file_name)
>>> aldose_reductase = Protein.from_file(file_name)
>>> aldose_reductase.add_hydrogens()
>>> ligand = aldose_reductase.ligands[0]
>>> binding_site = Protein.BindingSiteFromMolecule(aldose_reductase, ligand, 6.0)

By default analyses will detect cavities over the whole protein, but we can analyse a protein’s binding site by using a ccdc.interaction.InteractionMapAnalysis.ProteinSettings, and populating it with appropriate settings. Firstly we instantiate the ProteinSettings class:

>>> protein_settings = InteractionMapAnalysis.ProteinSettings()
>>> protein_settings.working_directory = output_dir
>>> protein_settings.detect_cavities = False
>>> protein_settings.detect_cavity_from_residues = binding_site
>>> analyser = InteractionMapAnalysis(settings=protein_settings)

Now we can perform the analysis:

>>> results = analyser.analyse_protein(aldose_reductase)
>>> protein_hotspots = results.hotspots('Carbonyl_Oxygen')
>>> print(protein_hotspots[0])
Hotspot(Coordinates(x=15.409, y=-6.301, z=16.059), 25.54)

By default, the protein will be prepared by removing all ligands and water molecules, and adding hydrogens. Should you wish to prepare the protein yourself, for example to retain key water molecules or to remove cofactors, you may use the ccdc.protein module to prepare the ligand and call the analyser with:

>>> res = analyser.analyse_protein(aldose_reductase, prepared=True)

There are options to control the cavity which the ccdc.interaction.InteractionMapAnalysis will use. As mentioned above, if no setting has been made the cavity will be detected over the whole protein. Any of the ccdc.protein.Protein.BindingSite instances may be used to set the cavity residues, or a point and radius may be used:

>>> protein_settings.cavity_origin = [1, 2, 3]
>>> protein_settings.cavity_radius = 6.0

Normally cavities will be selected if they have a volume greater than 10.0 Angstrom. This may be specified if one wants to look for particularly small or particularly large cavities:

>>> protein_settings.cavity_volume = 6.0

A cavity will be detected if it passes a certain buriedness criterion. Again this can be specified in the ccdc.interaction.InteractionMapAnalysis.ProteinSettings class if you want to consider less buried cavities or constrain the search only to more buried cavities. Possible values are: ‘shallow’, ‘shallow/normal’, ‘normal’, ‘normal/buried’ and ‘buried’.

>>> print(protein_settings.min_cavity_buriedness)
normal
>>> protein_settings.min_cavity_buriedness = 'normal/buried'

By default the analysis will use data derived from the CSD. We can choose to use data from the PDB instead, where data has been gathered for a smaller set of probes:

>>> protein_settings.source = 'PDB'
>>> print('\n'.join(protein_settings.pdb_defined_probes))
Alcohol Oxygen
Water Oxygen
Carbonyl Oxygen
Amino Nitrogen
Aliphatic CH Carbon
Aromatic CH Carbon
>>> protein_settings.probe_names = ['Aliphatic CH Carbon']

We can choose to analyse one of the ligands of the protein instead of the protein itself, but with the ProteinSettings. We do this by using ccdc.interaction.InteractionMapAnalysis.analyse_ligand() method:

>>> protein_settings = InteractionMapAnalysis.ProteinSettings()
>>> protein_settings.working_directory = output_dir
>>> protein_settings.source = 'CSD_SMALL'
>>> protein_settings.detect_cavities = False
>>> analyser.settings = protein_settings
>>> ligand_results = analyser.analyse_ligand(aldose_reductase, aldose_reductase.ligands[0])

Other Settings

If all you wish to do with a protein is to extract the cavity as a grid, you may use ccdc.interaction.InteractionMapAnalysis.cavities(). This will use the LIGSITE algorithm to detect the cavities, and the resulting grid will contain integer values between 0 and 7, where the higher the number the more buried the cavity point. Here we will use the analyser constructed above, so the cavity we will find will be that subtended by the binding site of the ligand. This will need to be reset in the protein_settings as the analysis of the ligand above may have invalidated the selection of the binding site. The results may be keyed by ‘4IGS_cavity’:

>>> results = analyser.cavities(aldose_reductase)
>>> cavity_grid = results['4IGS_cavity.ligsite']
>>> print(cavity_grid.extrema)
(0.0, 7.0)

One use for this grid might be to weight the values of calculated hotspots according to the buriedness of the hotspot:

>>> import operator
>>> def weighted_hotspot(hot, grid):
...     buriedness = grid.value_at_point(hot.coordinates)
...     return InteractionMapAnalysis.Results.Hotspot(hot.coordinates, hot.value*buriedness)
>>> weighted_hotspots = sorted((weighted_hotspot(hot, cavity_grid) for hot in protein_hotspots), key=operator.attrgetter('value'), reverse=True)
>>> print(weighted_hotspots[0])
Hotspot(Coordinates(x=16.118, y=-7.727, z=14.631), 25.14)

When performing a number of analyses, it may be convenient to arrange that all generated grids are compatible. This can help with similarity analysis amongst other things. This can be ensured by setting the desired grid geometry before analysis. So, for example, if one wanted to analyse two different polymorphs of Axitinib, a kidney cancer drug, one could impose a common grid, by extracting the bounding box of the two molecules and adding 5 Angstrom in each direction:

>>> vusdix = csd.molecule('VUSDIX')
>>> vusdix04 = csd.molecule('VUSDIX04')
>>> minx = min(a.coordinates[0] for a in vusdix.atoms + vusdix04.atoms)
>>> miny = min(a.coordinates[1] for a in vusdix.atoms + vusdix04.atoms)
>>> minz = min(a.coordinates[2] for a in vusdix.atoms + vusdix04.atoms)
>>> maxx = max(a.coordinates[0] for a in vusdix.atoms + vusdix04.atoms)
>>> maxy = max(a.coordinates[1] for a in vusdix.atoms + vusdix04.atoms)
>>> maxz = max(a.coordinates[2] for a in vusdix.atoms + vusdix04.atoms)
>>> bounding_box = (minx-5, miny-5, minz-5), (maxx+5, maxy+5, maxz+5)
>>> settings = InteractionMapAnalysis.SmallMoleculeSettings()
>>> settings.custom_grid_bounding_box = bounding_box

This is applicable to protein analyses as well.

For protein analyses there is an option to allow the rotation of R-[O,N,S]-H torsions. This settings applies to serine, threonine, tyrosine, lysine, and cysteine residues.

>>> protein_settings.rotate_torsions = True

There are parameters in ccdc.interaction.InteractionMapAnalysis.ProteinSettings to control the LogP attenuation and the shell correction:

>>> protein_settings.logp_hydrophobic_attenuation_a
0.4
>>> protein_settings.logp_hydrophobic_attenuation_b
0.2
>>> protein_settings.polar_shell_correction
0.5
>>> protein_settings.apolar_shell_correction
0.0

If at any point you want to see what settings are in operation you can call:

>>> print(protein_settings.summary()) 
working_directory: ...
detect_hotspots: True
hotspots_ncycles: 1
hotspots_min_value: 1.01
hotspots_refine: 0.00
grid_spacing: 0.70
probe_names: Uncharged NH Nitrogen; Carbonyl Oxygen; Aromatic CH Carbon
detect_cavities: False
calculate_contour_surface: False
calculate_connolly_surface: False
source: CSD_SMALL
rotate_torsions: True
logp_hydrophobic_attenuation_a: 0.40
logp_hydrophobic_attenuation_b: 0.20
polar_shell_correction: 0.50
apolar_shell_correction: 0.00

Analyse Surface

One common use for ccdc.interaction.InteractionMapAnalysis is to determine the preferred interaction of particular functional groups with a given surface. We will examine this by considering the [002] surface of ibuprofen. Firstly, let us import the relevant modules and classes:

>>> from ccdc.io import CrystalReader
>>> from ccdc.particle import Surface
>>> from ccdc.interaction import InteractionMapAnalysis

Then we will load a crystal structure and calculate the surface for ibuprofen (CSD Refcode IBPRAC):

>>> ibprac = CrystalReader('CSD').crystal('IBPRAC')
>>> surface_002 = Surface(ibprac, miller_indices=(0,0,2))

Next we need to instantiate a ccdc.interaction.InteractionMapAnalysis.SurfaceSettings which will control the analysis:

>>> settings = InteractionMapAnalysis.SurfaceSettings()

Amongst the attributes of the settings instance is the set of probe names that may be used in an analysis:

>>> print('\n'.join(settings.probe_names))
Uncharged NH Nitrogen
Carbonyl Oxygen
Aromatic CH Carbon

These default probes will normally be enough for an analysis, but other available probes may be found from the settings class ccdc.interaction.InteractionMapAnalysis.SurfaceSettings.probe_names:

>>> print('\n'.join(settings.csd_defined_probes)) 
Uncharged NH Nitrogen
Charged NH Nitrogen
RNH3 Nitrogen
Alcohol Oxygen
Carbonyl Oxygen
Water Oxygen
Oxygen Atom
Methyl Carbon
Aromatic CH Carbon
C-F Fluorine
C-Cl Chlorine
C-Br Bromine
C-I Iodine
Sp2 CNC Nitrogen
Carboxylate Oxygen
Nitro Oxygen
Chloride Anion

Optionally we may wish to specify a working directory for the analysis into which results will be written to. By default this will be the identifier of the analysed system within current working directory.

There are a number of other options which may be set, but the default settings will usually suffice.

Then we are ready to calculate the interaction maps:

>>> analyser = InteractionMapAnalysis(settings=settings)
>>> results = analyser.analyse_surface(surface_002)

The results object is an instance of ccdc.interaction.InteractionMapAnalysis.Results. This is a derived class of ccdc.utilities.GridEnsemble giving access to a collection of homogeneous ccdc.utilities.Grid instances.

The calculated grid of the analysis may be accessed by dictionary-like methods, keyed by the probe names of the analysis:

>>> for probe in sorted(results.keys()):
...    grid = results[probe]
...    print(f"{probe}: {grid.extrema[0]}, {grid.extrema[1]}")
Aromatic_CH_Carbon: 0.0, 21.639
Carbonyl_Oxygen: 0.0, 308.729
Uncharged_NH_Nitrogen: 0.0, 38.782

If hotspot detection has been turned on, and it will be by default, one can obtain hotspots for any of the probes. These are returned as a tuple of ccdc.interaction.InteractionMapAnalysis.Results.Hotspot instances. These have a location, ccdc.molecule.Coordinates and a value, with higher values indicating a stronger hotspot. They are returned in descending value order.

>>> hs = results.hotspots('Carbonyl_Oxygen')
>>> print(len(hs))
14
>>> print(hs[0])
Hotspot(Coordinates(x=6.563, y=2.420, z=4.698), 308.73)

A similar set of results may be obtained by directly inspecting the grids. The ccdc.utilities.Grid.islands() method will extract subgrids from a grid containing connected values above a given threshold. See Grids for details of the Grid API.

>>> co_grid = results['Carbonyl_Oxygen']
>>> islands = co_grid.islands(2.0)
>>> print(len(islands))
34

We can check that the hottest hotspot does indeed correspond with the island containing the highest value:

>>> l = list(sorted(islands, key=lambda x: x.extrema[1], reverse=True))
>>> origin, far_corner = l[0].bounding_box
>>> assert all(origin[i] <= hs[0][0][i] <= far_corner[i] for i in range(3))

We can configure the generation of hotspots through appropriate changes to the settings class. These are the default settings, but you may change any of them. For example, it is possible to refine initial values of hotspots by applying a Simplex minimisation procedure. If one wants to filter out small-value peaks this can be done by changing hotsposts_min_value. It is advisable to set it to a value higher than 1.0. Default values will produce a list of hotspots that reside in locations of maximum propensity. However, in order to find secondary points additional iterations can be done by setting hotspots_ncycles to the number of iterations needed.

>>> settings.detect_hotspots
True
>>> settings.hotspots_refine
False
>>> settings.hotspots_min_value
1.01
>>> settings.hotspots_ncycles
1