Protein API

Introduction

The main class of the ccdc.protein module is ccdc.protein.Protein.

A ccdc.protein.Protein contains attributes and functions that relate to protein structures.

API

class ccdc.protein.Protein(identifier, _molecule=None, _protein_structure=None)
class BindingSite(protein, whole_residues=True)

A binding site in the protein.

atoms

The atoms of the cavity.

cofactors

The cofactors of the cavity.

formula

Return the chemical formula of the molecules in the binding site.

ligands

The ligands of the cavity.

metals

The metals in the cavity.

nucleotides

The nucleotides of the cavity.

residues

The residues of the cavity.

waters

The waters of the cavity.
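The attributes listed above can be read uniformly to get a quick overview of a binding site. The following is a minimal sketch that touches only the attribute names documented here; `summarise_binding_site` and the `fake_site` stand-in are hypothetical names introduced for illustration and are not part of the ccdc API.

```python
from types import SimpleNamespace


def summarise_binding_site(site):
    """Count the components of a binding-site-like object.

    Only the attribute names documented above are read, so any object
    exposing them can be passed in.
    """
    names = ('atoms', 'residues', 'ligands', 'cofactors',
             'metals', 'nucleotides', 'waters')
    return {name: len(getattr(site, name)) for name in names}


# Stand-in object for illustration only; a real binding site would be
# built with one of the BindingSiteFrom... classes.
fake_site = SimpleNamespace(
    atoms=['N', 'CA', 'C', 'O'], residues=['ALA1'], ligands=[],
    cofactors=[], metals=[], nucleotides=[], waters=['HOH1', 'HOH2'],
)
summary = summarise_binding_site(fake_site)
print(summary)
```

Because the helper relies on duck typing, it should work unchanged on any of the binding site classes, assuming they all expose the attributes above.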

class BindingSiteFromAtom(protein, atom, distance)

A binding site defined from a protein atom.

class BindingSiteFromListOfAtoms(protein, atoms)

A binding site defined from a list of protein atoms.

class BindingSiteFromListOfResidues(protein, list_of_residues)

A binding site defined from a list of protein residues.

class BindingSiteFromMolecule(protein, molecule, distance, whole_residues=True)

A binding site defined from an arbitrary molecule.

class BindingSiteFromPoint(protein, origin=(0, 0, 0), distance=12.0)

A binding site defined from a point and a distance.
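The geometric idea behind a point-based binding site can be pictured as a simple distance filter. The sketch below is not the library's implementation; `atoms_within` is a hypothetical helper working on plain coordinate tuples, and whether the real class includes atoms lying exactly on the boundary is an assumption.

```python
import math


def atoms_within(coordinates, origin=(0.0, 0.0, 0.0), distance=12.0):
    """Return the indices of coordinates lying within `distance` of `origin`."""
    return [
        index for index, xyz in enumerate(coordinates)
        if math.dist(xyz, origin) <= distance
    ]


# Made-up coordinates in angstroms, for illustration only.
coords = [(1.0, 2.0, 2.0), (10.0, 10.0, 10.0), (0.0, 11.0, 0.0)]
selected = atoms_within(coords)
print(selected)
```

The defaults mirror the `origin` and `distance` defaults in the signature above.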

class BindingSiteFromResidue(protein, residue, distance)

A binding site defined from a protein residue.

class Chain(index, _protein_structure=None)

A chain of a protein.

residues

The residues of a chain.

sequence

The sequence of amino acid one letter codes in this chain.

class ChainSuperposition(settings=None)

Class for superposing protein chains using sequence alignment.

class Settings

Configuration options for the superposition of protein chains.

overlay_convergence_tolerance = None

Tolerance for convergence in the overlay.

overlay_minimum_cycles = None

Minimum number of cycles in the overlay.

overlay_weighting_factor = None

Weighting factor to use in the overlay.

sequence_alignment_tool = None

External sequence alignment program.

sequence_search_tool = None

External sequence search program.

superposition_atoms = None

Protein chain atoms to use in the overlay (RIGID, BACKBONE or CALPHA).

superpose(chain1, chain2, binding_site1=None)

Superpose two protein chains or binding sites.

An implementation of the Smith-Waterman algorithm is used unless an external sequence alignment tool is specified in the settings.

If a binding site is supplied for the first chain, only the atoms in the binding site will be overlaid.

Parameters:
  • chain1 – a ccdc.protein.Protein.Chain instance
  • chain2 – a ccdc.protein.Protein.Chain instance
  • binding_site1 – a ccdc.protein.Protein.BindingSite instance for the first chain
Returns: the root-mean-square deviation (RMSD) of the overlay and the transformation matrix
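For readers unfamiliar with the Smith-Waterman algorithm mentioned above, the following is a minimal sketch of its scoring recurrence for local alignment. It is not the implementation used by ChainSuperposition: the match, mismatch and gap values are illustrative assumptions (protein alignment normally uses a substitution matrix), and `smith_waterman_score` is a hypothetical name.

```python
def smith_waterman_score(seq1, seq2, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score using a linear gap penalty."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    # score[i][j] holds the best score of a local alignment ending at
    # seq1[i-1] and seq2[j-1]; row 0 and column 0 stay at zero.
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            same = seq1[i - 1] == seq2[j - 1]
            diagonal = score[i - 1][j - 1] + (match if same else mismatch)
            score[i][j] = max(
                0,                        # start a fresh local alignment
                diagonal,                 # align the two residues
                score[i - 1][j] + gap,    # gap in seq2
                score[i][j - 1] + gap,    # gap in seq1
            )
            best = max(best, score[i][j])
    return best


# Two made-up one-letter sequences sharing the local stretch "CDEF".
print(smith_waterman_score("ACDEFG", "XXCDEFYY"))
```

Clamping each cell at zero is what makes the alignment local rather than global: a poorly matching prefix never penalises a good match further along the chains.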

class NucleicAcid(index, _protein_structure=None)

A nucleic acid of a protein.

nucleotides

The nucleotides of the nucleic acid.

sequence

The sequence of nucleotide one letter codes in this nucleic acid.

class Nucleotide(index, _nucleotide)

A single nucleotide of a nucleic acid.

atoms

The atoms of the nucleotide.

code

The PDB nucleotide code.

identifier

The identifier of this nucleotide.

nucleic_acid_identifier

The identifier of the nucleic acid of which this nucleotide is a part.

one_letter_code

The nucleotide one letter code.

class Residue(i, _residue)

A single amino acid residue of a protein.

atoms

The atoms of the residue.

backbone_atoms

The backbone atoms of the amino acid.

c_alpha

The C alpha atom of the residue.

c_beta

The C beta atom, or None if there is no C beta atom.

c_terminus

The C terminus atom.

carbonyl_oxygen

The carbonyl oxygen atom.

chain_identifier

The identifier of the chain of which this residue is a part.

cysteine_sulphur

The sulphur of a cysteine residue, or None if not a cysteine.

identifier

The identifier of this residue.

is_acidic

Whether the residue is acidic.

is_basic

Whether the residue is basic.

is_hydrophilic

Whether the residue is hydrophilic.

is_hydrophobic

Whether the residue is hydrophobic.

n_terminus

The N terminus atom.

one_letter_code

The one letter code of the amino acid.

sidechain_atoms

The sidechain atoms of this amino acid.

three_letter_code

The three letter code of the amino acid.
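The boolean properties above lend themselves to a quick composition tally. The sketch below reads only the documented attribute names; `classify_residues` is a hypothetical helper, and the stand-in residues carry made-up flag values that are not meant to reflect how the library classifies any particular amino acid.

```python
from collections import Counter
from types import SimpleNamespace

FLAGS = ('is_acidic', 'is_basic', 'is_hydrophilic', 'is_hydrophobic')


def classify_residues(residues):
    """Count how many residues have each documented boolean property set."""
    counts = Counter()
    for residue in residues:
        for flag in FLAGS:
            if getattr(residue, flag):
                counts[flag] += 1
    return counts


def fake_residue(code, **flags):
    """Build a stand-in residue; unspecified flags default to False."""
    values = {flag: False for flag in FLAGS}
    values.update(flags)
    return SimpleNamespace(three_letter_code=code, **values)


# Illustrative objects only; real residues would come from a protein's
# residues attribute.
residues = [
    fake_residue('ASP', is_acidic=True),
    fake_residue('LYS', is_basic=True),
    fake_residue('LEU', is_hydrophobic=True),
    fake_residue('VAL', is_hydrophobic=True),
]
counts = classify_residues(residues)
print(dict(counts))
```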

add_cofactor(molecule)

Add a molecule to the protein as a cofactor.

add_hydrogens(mode='All')

Add hydrogens to the protein structure.

This method protonates the protein structure by performing the following operations:

  • Remove metal bonds
  • Assign ligand and cofactor bond types and standardise aromatic and delocalised bonds to CSD conventions
  • Set atom charges to zero
  • Set bond types for ARG, GLU, ASP appropriately
  • Apply protonation rules to ligands and cofactors
  • Add hydrogens to protein, ligands, cofactors, nucleic acids and waters where necessary
  • Set any remaining unknown bond type to single
Parameters: mode – 'all' to generate all hydrogens (removing any existing hydrogens first) or 'missing' to generate only the hydrogens deemed to be missing.
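As a rough illustration of where add_hydrogens might sit in a preparation workflow, the sketch below strips waters and then protonates. The ordering is an assumption rather than a documented recommendation; `prepare` and `RecordingProtein` are hypothetical names, and `RecordingProtein` only records calls so the example runs without the ccdc package installed.

```python
class RecordingProtein:
    """Stand-in that records method calls; not part of the ccdc API."""

    def __init__(self):
        self.calls = []

    def remove_all_waters(self):
        self.calls.append('remove_all_waters')

    def add_hydrogens(self, mode='All'):
        self.calls.append(('add_hydrogens', mode))


def prepare(protein, keep_waters=False):
    """Optionally strip waters, then add hydrogens with the default mode."""
    if not keep_waters:
        protein.remove_all_waters()
    # The default mode from the signature above is left untouched.
    protein.add_hydrogens()
    return protein


prepared = prepare(RecordingProtein())
print(prepared.calls)
```

With the real library, the object passed to `prepare` would presumably be obtained from Protein.from_file.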

add_ligand(molecule)

Add a molecule to the protein as a ligand.

cavity_atoms

The atoms making up the binding site, if this was read from a GOLD protein.

cavity_residues

The residues making up the cavity.

chains

A tuple of ccdc.protein.Protein.Chain.

cofactors

The tuple of cofactors in the protein.

The identifier of the molecule is of the form chain_id:residue_name.

Note that hydrogen atoms are added automatically to the returned molecules; however, they are not added to the parent protein.

copy()

Copies the protein.

detect_ligand_bonds(covalent_links='include')

Removes all bonds between ligand or cofactor atoms and redetects them based on the distances between atoms.

This can be useful if the bonds given by the CONECT records in the PDB file are missing or undesirable.

Parameters: covalent_links – 'include' to include covalent links between the protein and the ligand (the default) or 'exclude' to remove them.
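To give a feel for what distance-based bond detection involves, the sketch below bonds two atoms when their separation does not exceed the sum of their covalent radii plus a tolerance. This is a generic textbook criterion and not necessarily the one used by detect_ligand_bonds; the radii, the 0.4 angstrom tolerance and the name `detect_bonds` are all assumptions made for illustration.

```python
import math
from itertools import combinations

# Approximate covalent radii in angstroms (illustrative values only).
COVALENT_RADII = {'H': 0.31, 'C': 0.76, 'N': 0.71, 'O': 0.66, 'S': 1.05}


def detect_bonds(atoms, tolerance=0.4):
    """Return index pairs of atoms close enough to be considered bonded.

    `atoms` is a list of (element, (x, y, z)) tuples.
    """
    bonds = []
    for (i, (el_i, xyz_i)), (j, (el_j, xyz_j)) in combinations(enumerate(atoms), 2):
        cutoff = COVALENT_RADII[el_i] + COVALENT_RADII[el_j] + tolerance
        if math.dist(xyz_i, xyz_j) <= cutoff:
            bonds.append((i, j))
    return bonds


# Made-up fragment: a C-O-H chain plus a distant nitrogen.
fragment = [
    ('C', (0.00, 0.00, 0.00)),
    ('O', (1.43, 0.00, 0.00)),
    ('H', (1.75, 0.90, 0.00)),
    ('N', (5.00, 5.00, 5.00)),
]
bonds = detect_bonds(fragment)
print(bonds)
```

A purely geometric test like this cannot assign bond orders, which is consistent with add_hydrogens listing bond type assignment as a separate step.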

static from_entry(entry)

Constructs a protein from a given ccdc.entry.Entry.

Parameters: entry – the ccdc.entry.Entry from which to construct the protein.

static from_file(file_name)

Reads a protein from a file, and constructs the protein.

static known_cofactor_codes()

Provides access to the list of known cofactor codes in the underlying library.

ligands

The tuple of ligands in the protein.

The identifier of the molecule is of the form chain_id:residue_name.

Note that hydrogen atoms are added automatically to the returned molecules; however, they are not added to the parent protein.
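Since ligand identifiers take the form chain_id:residue_name, they can be split on the colon to group ligands by chain. The sketch below assumes each returned molecule exposes its identifier through an `identifier` attribute; `ligands_by_chain` is a hypothetical helper and the identifiers in the example are invented.

```python
from collections import defaultdict
from types import SimpleNamespace


def ligands_by_chain(ligands):
    """Group ligand residue names by the chain part of their identifier."""
    grouped = defaultdict(list)
    for ligand in ligands:
        # partition splits on the first colon only.
        chain_id, _, residue_name = ligand.identifier.partition(':')
        grouped[chain_id].append(residue_name)
    return dict(grouped)


# Stand-in molecules with invented identifiers, for illustration only.
fake_ligands = [
    SimpleNamespace(identifier='A:LIG1'),
    SimpleNamespace(identifier='A:LIG2'),
    SimpleNamespace(identifier='B:LIG1'),
]
grouped = ligands_by_chain(fake_ligands)
print(grouped)
```

The same splitting applies to cofactor identifiers, which are documented with the same form.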

metals

The metal atoms of the protein.

nucleic_acids

A tuple of ccdc.protein.Protein.NucleicAcid.

nucleotides

The nucleotides of the protein.

remove_all_metals()

Removes all metals from the protein.

remove_all_waters()

Removes all waters from the protein.

remove_chain(chain_id)

Remove the chain with the given identifier.

remove_cofactor(cofactor_id)

Remove the specified cofactor.

Parameters: cofactor_id – str, of the form chain_id:cofactor_id.

remove_hydrogens()

Remove all hydrogens from the protein.

remove_ligand(ligand_id)

Remove the specified ligand.

Parameters: ligand_id – str, of the form chain_id:ligand_id.

remove_metal(atom)

Remove the given metal atom.

remove_metal_bonds(bonds=None)

Removes metal bonds.

Parameters: bonds – an iterable of ccdc.molecule.Bond instances. If None, all metal bonds will be removed.

remove_nucleic_acid(nucleic_acid_chain_id)

Remove the nucleic acid chain with the given identifier.

remove_nucleotide(nucleotide_id)

Remove the specified nucleotide.

remove_residue(residue_id)

Remove the specified residue.

remove_water(molecule)

Remove the water with the given oxygen atom.

residues

The amino acid residues of the protein.

sequence

The one-letter code sequence.

waters

The waters of the protein.

Returns: a tuple of ccdc.molecule.Molecule instances, each representing the oxygen of a water.