Cavity API

Introduction

The ccdc.cavity module contains a class ccdc.cavity.Cavity representing a putative binding site on the protein surface that has been automatically detected by the LIGSITE algorithm.

Reference article introducing the LIGSITE algorithm.

This approach uses a three-dimensional grid for the detection of surface depressions with a grid spacing of 0.5 Ångströms. As a result, the cavity is represented by a set of grid intersection points, also called surface points.
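As an illustration of the grid representation only, the following minimal sketch enumerates grid intersection points at a 0.5 Å spacing inside an axis-aligned box. It is not the LIGSITE implementation and does not use the ccdc package; the helper name `grid_points` and the box corners are hypothetical.

```python
import itertools
import math

GRID_SPACING = 0.5  # Ångströms, the spacing quoted above


def grid_points(origin, far_corner, spacing=GRID_SPACING):
    """Hypothetical helper: grid intersection points of an axis-aligned box."""
    axes = []
    for low, high in zip(origin, far_corner):
        # Small tolerance so an exact multiple of the spacing is not lost
        # to floating-point rounding.
        n_steps = math.floor((high - low) / spacing + 1e-9)
        axes.append([low + i * spacing for i in range(n_steps + 1)])
    # Every combination of x, y and z positions is one grid intersection.
    return list(itertools.product(*axes))


# Illustrative box of 1.0 x 1.0 x 0.5 Å (made-up values).
points = grid_points((0.0, 0.0, 0.0), (1.0, 1.0, 0.5))
print(len(points))
```

A real cavity detection would additionally decide which of these points lie in a surface depression; that step is not shown here.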

Once a cavity has been detected, the flanking fragments of amino acids are assigned pseudocenters to represent their physicochemical properties. Currently, seven types of pseudocenters are used:

  • DONOR: represents a hydrogen bond donor functionality
  • ACCEPTOR: represents a hydrogen bond acceptor functionality
  • DONOR_ACCEPTOR: represents both hydrogen bond donor and acceptor functionalities
  • AROMATIC: represents the center of an aromatic ring system
  • ALIPHATIC: represents an aliphatic moiety
  • PI: denotes the presence of a double bond
  • METAL: represents the position of a metal ion
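The seven pseudocenter types listed above can be modelled, purely for illustration, as an enumeration. The class `PseudocenterType` and its string values are stand-ins invented for this sketch; they are not the types exposed by ccdc.cavity.

```python
from collections import Counter
from enum import Enum


class PseudocenterType(Enum):
    """Stand-in enumeration of the seven pseudocenter types (hypothetical)."""
    DONOR = "donor"
    ACCEPTOR = "acceptor"
    DONOR_ACCEPTOR = "donor_acceptor"
    AROMATIC = "aromatic"
    ALIPHATIC = "aliphatic"
    PI = "pi"
    METAL = "metal"


# Made-up pseudocenter assignments for a small cavity.
observed = [
    PseudocenterType.DONOR,
    PseudocenterType.ACCEPTOR,
    PseudocenterType.DONOR,
    PseudocenterType.AROMATIC,
]

# Tally how many pseudocenters of each type are present.
counts = Counter(observed)
print(counts[PseudocenterType.DONOR])
```

Per-type counts of this kind are the quantities that the range settings of a database search constrain.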

Reference article for the introduction of the cavity graph comparison method.

Reference article for the introduction of the fast cavity graph comparison method.

Reference article for the introduction of the cavity histograms comparison method.

The class ccdc.cavity.Cavity has methods to create cavities automatically from PDB files, to read a cavity from an XML representation, to extract useful information such as bound ligands, and to compare two cavities using different comparison methods. With these it is possible to write screens that search for similar cavities across a range of proteins, a defined set of cavities, or the entire cavity database.

API

class ccdc.cavity.Cavity(_cavity, _pdb)[source]

A cavity on the protein surface.

class CavityDistanceHistograms(cavity, reference_points=None)[source]

A cavity description based on histograms of distances between reference points and cavity pseudocenters.

histograms

The tuple of histograms defined for this cavity.
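To make the idea of a distance histogram concrete, the sketch below bins the distances between one reference point (here the centroid) and a set of pseudocenter positions. It is an assumption-laden illustration: the helper names, the bin width of 1.0 Å and the coordinates are all hypothetical, and the binning used by the library is not stated in this document.

```python
import math
from collections import Counter


def centroid(points):
    """Arithmetic mean of a non-empty list of 3D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def distance_histogram(reference, points, bin_width=1.0):
    """Histogram of distances from a reference point.

    bin_width is an assumed value, not one taken from the library.
    """
    counts = Counter(int(math.dist(reference, p) // bin_width) for p in points)
    n_bins = max(counts) + 1
    return [counts.get(i, 0) for i in range(n_bins)]


# Made-up pseudocenter positions at the corners of a 2 Å square.
pseudocenters = [
    (0.0, 0.0, 0.0),
    (2.0, 0.0, 0.0),
    (0.0, 2.0, 0.0),
    (2.0, 2.0, 0.0),
]
ref = centroid(pseudocenters)
hist = distance_histogram(ref, pseudocenters)
print(ref, hist)
```

Each corner lies about 1.41 Å from the centroid, so all four distances fall into the second bin.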

class CavityGraphComparison(_comparison)[source]

The result of a cavity graph comparison.

clique_rmsd

The rms deviation of the matched clique points.

n_cliques

The total number of cliques detected during the comparison.

n_matches

The number of matching pseudocenters detected.

rmsd

The rms deviation for the match of all pseudocenters.

score

The similarity score of the comparison.

transformation_matrix

The transformation matrix to overlay the target onto the query cavity.
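The rms deviations reported by a comparison follow the usual definition: the square root of the mean squared distance between matched points. The sketch below shows that definition on made-up coordinates; the function name `rmsd` is hypothetical, and whether the library computes its values in exactly this way is not stated here.

```python
import math


def rmsd(points_a, points_b):
    """Root-mean-square deviation between two equally long lists of 3D points."""
    if len(points_a) != len(points_b) or not points_a:
        raise ValueError("point lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(points_a, points_b))
    return math.sqrt(squared / len(points_a))


# Made-up matched pseudocenter positions after overlay:
# every target point is displaced by 1 Å along z.
query_points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target_points = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
deviation = rmsd(query_points, target_points)
print(deviation)
```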

class FastCavityGraphComparison(_comparison)[source]

The result of a fast cavity graph comparison.

largest_clique_size

The size of the largest clique detected.

product_graph_size

The size of the product graph generated during the comparison.

score

The similarity score of the comparison.

class Feature(_feature=None)[source]

An interaction feature in a cavity.

amino_acid_code

The associated amino acid code.

This will be ‘UNK’ if no protein structure has been associated with the cavity containing the feature.

atom

The atom from which this feature is defined.

This will be None if no protein information is associated with the cavity. Aromatic and aliphatic features have no associated atom, so this property has the value None in those cases.

burial

The burial value assigned to this feature (0 to 7, where 7 means most buried).

chain

The chain from which this feature was defined.

This will be None if no protein information is associated with the cavity.

coordinates

The position of the feature.

Returns: a named tuple of coordinates

distance(other)[source]

The distance between this feature and another.

point_distance(point)[source]

The distance from this feature to an arbitrary point.

protein_vector

Vector denoting the ideal interaction direction of the feature with another one outside the protein.

residue

The residue associated with this feature.

This will be None if no protein information is associated with the cavity.

surface_depths

The depth values assigned to the surface points of this feature (0 to 7, where 7 means most buried).

Returns: a tuple of depth values

surface_points

The surface points associated with this feature.

These approximate the surface shape close to the feature.

Returns: a tuple of named tuples of coordinates

surface_vector

Vector denoting the connection from the feature to the centre of its assigned surface points.

type

The type of the interaction feature.

bounding_box

The origin and far corner of the cavity.

cavity_distance_histograms(reference_points=None)[source]

Create a set of feature distance histograms for this cavity based on the given reference point specification.

Parameters: reference_points – a set of reference point measures
Returns: a ccdc.cavity.Cavity.CavityDistanceHistograms instance

compare(other, comparison_method=1, histogram_reference_points=None, max_product_graph_size=36000)[source]

Compare this cavity to another cavity.

Parameters:
  • other – a ccdc.cavity.Cavity instance
  • comparison_method – a member of ccdc.cavity.Cavity.ComparisonMethod, either Cavity.ComparisonMethod.FAST_CAVITY_GRAPH_COMPARISON, Cavity.ComparisonMethod.CAVITY_GRAPH_COMPARISON or Cavity.ComparisonMethod.CAVITY_HISTOGRAMS_COMPARISON
  • histogram_reference_points – an iterable of strings drawn from ‘centroid’, ‘centroid_closest’, ‘centroid_furthest’, ‘centroid_furthest_furthest’. If empty or None, ‘centroid’ and ‘centroid_closest’ will be used for the generation of distance histograms with Cavity.ComparisonMethod.CAVITY_HISTOGRAMS_COMPARISON
  • max_product_graph_size – the maximum allowed size of the product graph for fast cavity graph comparisons
Returns:

a ccdc.cavity.Cavity.FastCavityGraphComparison instance, a ccdc.cavity.Cavity.CavityGraphComparison instance or a similarity score for cavity histogram comparisons

features

The features of the cavity.

Returns: a tuple of ccdc.cavity.Cavity.Feature instances

features_by_atom_distance(atoms, radius)[source]

The set of all features within a radius of any of the atoms.

features_by_distance(centre, radius)[source]

The set of features of the cavity within radius of the centre.

features_by_residues(residues)[source]

The set of features associated with any of the given residues.

features_by_type(type)[source]

The features of a given type.
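The distance and type selections described above amount to simple filters over the cavity's features. The following sketch reproduces that logic on stand-in objects rather than real ccdc.cavity.Cavity.Feature instances; the `Feature` named tuple, the helper names, the string type labels and the choice to include features lying exactly at the radius are all assumptions made for illustration.

```python
import math
from collections import namedtuple

# Stand-in for a cavity feature; only the two fields used here are modelled.
Feature = namedtuple("Feature", ["type", "coordinates"])


def features_within(features, centre, radius):
    """Features whose position lies within radius of centre (boundary included)."""
    return [f for f in features if math.dist(f.coordinates, centre) <= radius]


def features_of_type(features, feature_type):
    """Features of one given type."""
    return [f for f in features if f.type == feature_type]


# Made-up features at 0, 3 and 4 Å from the origin.
features = [
    Feature("donor", (0.0, 0.0, 0.0)),
    Feature("acceptor", (3.0, 0.0, 0.0)),
    Feature("donor", (0.0, 4.0, 0.0)),
]

near = features_within(features, (0.0, 0.0, 0.0), 3.5)
donors = features_of_type(features, "donor")
print(len(near), len(donors))
```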

static from_pdb_file(pdb_file, maximum_incomplete_residues_per_chain=0)[source]

Create cavities from a PDB file.

Parameters:
  • pdb_file – PDB file for the generation of cavities
  • maximum_incomplete_residues_per_chain – the maximum number of incomplete residues to allow per chain (0 by default)
Raises:

RuntimeError if the PDB file contains more than 1000 SEQRES lines

Returns:

a tuple of ccdc.cavity.Cavity instances

static from_xml(xml, pdb_file=None)[source]

Reads a cavity from an XML string.

Parameters:
  • xml – an XML representation of the cavity
  • pdb_file – an optional PDB file for the associated protein, from which additional data for the cavity may be computed

static from_xml_file(xml_file, pdb_file=None)[source]

Reads a cavity from an XML file and, optionally, an associated PDB file.

Parameters:
  • xml_file – XML file representing the cavity
  • pdb_file – an optional PDB file for the associated protein, from which additional data for the cavity may be computed
Raises:

RuntimeError if the XML file does not exist

Returns:

a ccdc.cavity.Cavity instance

identifier

The identifier of this cavity.

ligand_identifiers

Tuple of ligand identifiers found in the cavity.

ligands

List of ligands of the cavity.

If there is no protein associated with the cavity this will be None.

subcavity(features)[source]

Make a subcavity based on a set of features from this cavity.

Parameters: features – a set of features for construction of the subcavity
Returns: a ccdc.cavity.Cavity instance

to_pymol_file(file_name=None, show_surface_points=False)[source]

Create a visualisation file of this cavity that can be run in PyMOL.

The cavity will be represented by its physicochemical features.

Parameters:
  • file_name – Python file containing the information for displaying the cavity. This should have a .py extension. If not defined, the file will be named using the cavity identifier
  • show_surface_points – additionally display the points representing the cavity’s surface shape

to_xml()[source]

An XML representation of the cavity.

to_xml_file(file_name)[source]

Writes the XML representing a cavity to a file.

Parameters: file_name – the file to which to write the XML

volume

Volume of the cavity in cubic Ångströms.

write(file_name)[source]

Write the cavity to an rlbcoor file for visualisation in Hermes.

class ccdc.cavity.CavityDatabase(file_name=None)[source]

An SQLite database for cavities. A path to a database must be passed in when creating an instance of this class.

Please note that neither the schema for the database nor, indeed, the choice of underlying database has been finalised. Accordingly, this should be treated as a prototypical implementation. The API for creation and for searching should remain valid, however.

class Settings[source]

Settings appropriate to cavity searches.

acceptor_range = None

minimum and maximum number of acceptor features

aliphatic_range = None

minimum and maximum number of aliphatic features

aromatic_range = None

minimum and maximum number of aromatic features

donor_acceptor_range = None

minimum and maximum number of donor-acceptor features

donor_range = None

minimum and maximum number of donor features

histogram_reference_points = None

an iterable of strings drawn from ‘centroid’, ‘centroid_closest’, ‘centroid_furthest’, ‘centroid_furthest_furthest’. If empty or None, ‘centroid’ and ‘centroid_closest’ will be used for the generation of distance histograms with Cavity.ComparisonMethod.CAVITY_HISTOGRAMS_COMPARISON

ligand_range = None

minimum and maximum number of ligands

logfile = False

logfile of comparison scores, default False

max_hit_structures = 0

maximum number of structures returned

max_product_graph_size = 36000

maximum size of product graph allowed when using the fast cavity graph comparison method

metal_range = None

minimum and maximum number of metal features

pi_range = None

minimum and maximum number of pi features

start = 0

offset starting position in database

verbose = False

verbose output, default False

volume_range = None

minimum and maximum cavity volume

with_ligands = None

ligand identifiers
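The range settings above each hold a minimum and a maximum. As a rough illustration of how such constraints could filter candidates, the sketch below checks per-type feature counts against (minimum, maximum) pairs. It does not use the ccdc package: the helper names, the dictionary layout, the treatment of None as "no constraint" and the inclusive bounds are assumptions, not documented behaviour.

```python
def in_range(count, allowed):
    """True if count satisfies an optional (minimum, maximum) constraint."""
    if allowed is None:  # assumption: None means the value is unconstrained
        return True
    minimum, maximum = allowed
    return minimum <= count <= maximum  # assumption: bounds are inclusive


def passes(counts, constraints):
    """True if every constrained feature count lies inside its range."""
    return all(
        in_range(counts.get(name, 0), allowed)
        for name, allowed in constraints.items()
    )


# Hypothetical constraints: 1 to 5 donors, no metals, any number of aromatics.
constraints = {"donor": (1, 5), "metal": (0, 0), "aromatic": None}

candidate_a = {"donor": 3, "aromatic": 2}
candidate_b = {"donor": 3, "metal": 1}

print(passes(candidate_a, constraints), passes(candidate_b, constraints))
```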

cavities()[source]

Iterator over the cavities.

cavity(identifier)[source]

The cavity of the given identifier.

cavity_distance_histogram_sets()[source]

Iterator over the cavity histogram sets.

cavity_distance_histograms(identifier)[source]

The distance histograms corresponding to the given cavity identifier.

drop_all_tables()[source]

Drops all tables in the database.

static drugbank_database_dir()[source]

Return the directory containing the DrugBank database.

get_cavities_by_ligand(ligand_id)[source]

Get all the cavities containing a particular PDB ligand identifier.

get_cavities_by_pdb_code(pdb_code)[source]

Get all the cavities for a protein.

get_cavity_by_name(name)[source]

Get the cavity with the exact identifier.

get_cavity_identifiers_by_ligand(ligand_id)[source]

Get the identifiers of all cavities containing a particular PDB ligand identifier.

get_info_for_cavity(cav_name)[source]

Get the information for a cavity.

Parameters: cav_name – the name of the cavity from the cav_xml table of the database
Returns: a dictionary of values from the info table

get_ligands_by_cavity_identifier(cav_id)[source]

All identifiers of ligands in the cavity.

get_ligands_by_pdb_code(pdb_code)[source]

All ligand identifiers in the protein structure.

get_number_of_cavities()[source]

Get the number of cavities stored in the database.

get_number_of_ligands()[source]

Get the number of cavity ligands stored in the database.

populate_all(directory, id_file=None, verbose=False, maximum_allowed_incomplete_residues=0)[source]

Create all tables from the directory of input files.

Parameters:
  • directory – directory containing PDB or XML cavity files
  • verbose – enable verbose output, default False

search(cavity=None, comparison_method=1, settings=None)[source]

Searches the database and optionally performs cavity comparisons against the results.

The query can include a Cavity for comparison against the database. If a cavity is not specified, a search for cavities matching the constraints is performed.

Parameters:
  • cavity – a ccdc.cavity.Cavity instance
  • comparison_method – a member of ccdc.cavity.Cavity.ComparisonMethod, either Cavity.ComparisonMethod.FAST_CAVITY_GRAPH_COMPARISON, Cavity.ComparisonMethod.CAVITY_GRAPH_COMPARISON or Cavity.ComparisonMethod.CAVITY_HISTOGRAMS_COMPARISON
  • settings – a ccdc.cavity.CavityDatabase.Settings instance
Returns:

a list of (comparison score, cavity identifier) tuples, sorted by comparison score with the highest similarity first; if no comparisons are performed, a generator is returned instead
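The ordering of comparison results can be sketched as a plain sort over (score, identifier) tuples. The example below uses made-up scores and identifiers and a hypothetical helper `rank_hits`; treating a max_hit_structures value of 0 as "no limit" is an assumption, since the meaning of the default is not spelled out in this document.

```python
def rank_hits(scored, max_hit_structures=0):
    """Sort (score, identifier) tuples by descending score and optionally truncate."""
    ranked = sorted(scored, key=lambda hit: hit[0], reverse=True)
    if max_hit_structures > 0:  # assumption: 0 means "return everything"
        return ranked[:max_hit_structures]
    return ranked


# Made-up comparison results; the identifiers are placeholders.
scored = [(0.42, "cavity_a"), (0.91, "cavity_b"), (0.67, "cavity_c")]

all_hits = rank_hits(scored)
top_two = rank_hits(scored, max_hit_structures=2)
print(all_hits[0], len(top_two))
```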