Protein API¶
Introduction¶
The main class of the ccdc.protein module is ccdc.protein.Protein.
A ccdc.protein.Protein contains attributes and functions that relate to protein structures.
API¶
- class ccdc.protein.Protein(identifier, _molecule=None, _protein_structure=None)[source]¶
- class BindingSite(protein, whole_residues=True)[source]¶
A binding site in the protein.
- property atoms¶
The atoms of the cavity.
- property cofactors¶
The cofactors of the cavity.
- property formula¶
Return the chemical formula of the molecules in the binding site.
- property ligands¶
The ligands of the cavity.
- property metals¶
The metals in the cavity.
- property nucleotides¶
The nucleotides of the cavity.
- property residues¶
The residues of the cavity.
- property waters¶
The waters of the cavity.
- class BindingSiteFromAtom(protein, atom, distance)[source]¶
A binding site defined from a protein atom.
- class BindingSiteFromListOfAtoms(protein, atoms)[source]¶
A binding site defined from a list of protein atoms.
- class BindingSiteFromListOfResidues(protein, list_of_residues)[source]¶
A binding site defined from a list of residues.
- class BindingSiteFromMolecule(protein, molecule, distance, whole_residues=True)[source]¶
A binding site defined from an arbitrary molecule.
- class BindingSiteFromPoint(protein, origin=(0, 0, 0), distance=12.0)[source]¶
A binding site defined from a point.
- class BindingSiteFromResidue(protein, residue, distance)[source]¶
A binding site defined from a protein residue.
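As a rough illustration of how the binding-site classes and their properties fit together, the sketch below builds a site around a molecule and counts its contents. It assumes the nested classes can be looked up on a protein object (as the listing under ccdc.protein.Protein suggests) and that each property returns a sized collection. The helper names and the default distance are hypothetical, not part of the library.

```python
def site_around_molecule(protein, molecule, distance=6.0):
    """Build a binding site from the region within `distance` of `molecule`.

    Assumption: BindingSiteFromMolecule is nested in the protein class, so it
    can be reached through the instance. The 6.0 default is illustrative only.
    """
    return protein.BindingSiteFromMolecule(protein, molecule, distance)


def summarise_site(site):
    """Count the members of each documented BindingSite property."""
    names = ("atoms", "residues", "ligands", "cofactors",
             "metals", "nucleotides", "waters")
    return {name: len(getattr(site, name)) for name in names}
```

With a loaded protein that has at least one ligand, `summarise_site(site_around_molecule(protein, protein.ligands[0]))` would give a per-category count for the site around that ligand.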
- class Chain(index, _protein_structure=None)[source]¶
A chain of a protein.
- property residues¶
The residues of a chain.
- property sequence¶
The sequence of amino acid one letter codes in this chain.
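A minimal sketch of reading per-chain information through the two Chain properties above. The function names are hypothetical, and the code only assumes that `protein.chains` is iterable and that `sequence` is either a string or a sequence of one-letter strings; which of the two the library returns is not stated here.

```python
def chain_sequences(protein):
    """Return one sequence string per chain, in the order of protein.chains.

    "".join works whether `sequence` is a str or a list of one-letter codes.
    """
    return ["".join(chain.sequence) for chain in protein.chains]


def chain_sizes(protein):
    """Return the number of residues in each chain."""
    return [len(chain.residues) for chain in protein.chains]
```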
- class ChainSuperposition(settings=None)[source]¶
Class for superposition of protein chains using sequence alignment
- class Settings[source]¶
Configuration options for the superposition of protein chains.
- overlay_convergence_tolerance¶
tolerance for convergence in overlay
- overlay_minimum_cycles¶
minimum number of cycles in overlay
- overlay_weighting_factor¶
weighting factor to use in overlay
- sequence_alignment_tool¶
external sequence alignment program
- sequence_search_tool¶
external sequence search program
- superposition_atoms¶
protein chain atoms to use in overlay (RIGID, BACKBONE or CALPHA)
- superpose(chain1, chain2, binding_site1=None)[source]¶
Superpose two protein chains or binding sites
An implementation of the Smith-Waterman algorithm is used unless an external sequence alignment tool is specified in the settings.
If a binding site is supplied for the first chain, only the atoms in the binding site will be overlaid.
- Parameters
chain1 – a ccdc.protein.Chain instance
chain2 – a ccdc.protein.Chain instance
binding_site1 – a ccdc.protein.BindingSite instance for the first chain
- Returns
the root-mean-square deviation of the overlay and the transformation matrix
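The superposition workflow described above might be wired together as in the following sketch. Several points are assumptions rather than documented facts: that ChainSuperposition and its Settings class are reachable from a protein object, that `superposition_atoms` accepts the string "CALPHA", and that `superpose` returns its two results as a pair. The function name and the `chain_index` parameter are hypothetical.

```python
def superpose_on_calpha(protein1, protein2, chain_index=0, binding_site1=None):
    """Superpose the chains at `chain_index` of two proteins on C-alpha atoms.

    Assumptions: ChainSuperposition is nested in the protein class, the
    setting takes the string "CALPHA", and superpose returns (rmsd, matrix).
    """
    superposition_cls = protein1.ChainSuperposition
    settings = superposition_cls.Settings()
    settings.superposition_atoms = "CALPHA"
    superposer = superposition_cls(settings)
    # If binding_site1 is given, only its atoms are overlaid (see above).
    rmsd, transformation = superposer.superpose(
        protein1.chains[chain_index],
        protein2.chains[chain_index],
        binding_site1,
    )
    return rmsd, transformation
```

Leaving the sequence alignment settings untouched means the built-in Smith-Waterman implementation is used, as described for `superpose`.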
- class NucleicAcid(index, _protein_structure=None)[source]¶
A nucleic acid of a protein.
- property nucleotides¶
The nucleotides of the nucleic acid
- property sequence¶
The sequence of nucleotide one letter codes in this nucleic acid.
- class Nucleotide(index, _nucleotide)[source]¶
A single nucleotide of a nucleic acid.
- property atoms¶
The atoms of the nucleotide.
- property code¶
The PDB nucleotide code.
- property identifier¶
The identifier of this nucleotide.
- property nucleic_acid_identifier¶
The identifier of the nucleic acid of which this nucleotide is a part.
- property one_letter_code¶
The nucleotide one letter code.
- class Residue(i, _residue)[source]¶
A single amino acid residue of a protein.
- property atoms¶
The atoms of the residue.
- property backbone_atoms¶
The backbone atoms of the amino acid.
- property c_alpha¶
The C alpha atom of the residue.
- property c_beta¶
The C beta atom, or None if there is no C beta atom.
- property c_terminus¶
The C terminus atom.
- property carbonyl_oxygen¶
The carbonyl oxygen atom.
- property chain_identifier¶
The identifier of the chain of which this residue is a part.
- property cysteine_sulphur¶
The sulphur of a cysteine residue, or None if not a cysteine.
- property identifier¶
The identifier of this residue.
- property is_acidic¶
Whether the residue is acidic.
- property is_basic¶
Whether the residue is basic.
- property is_hydrophilic¶
Whether the residue is hydrophilic.
- property is_hydrophobic¶
Whether the residue is hydrophobic.
- property n_terminus¶
The N terminus atom.
- property one_letter_code¶
The one letter code of the amino acid.
- property sidechain_atoms¶
The sidechain atoms of this amino acid.
- property three_letter_code¶
The three letter code of the amino acid.
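To show how the Residue properties can be combined, here is a small sketch that tallies residues by the four boolean flags and lists residues lacking a C beta atom. The function names are hypothetical; the code relies only on the properties listed above and works on any iterable of residues, such as `protein.residues` or a binding site's `residues`.

```python
def residue_class_counts(residues):
    """Tally residues by the four documented boolean flags.

    A residue may satisfy more than one flag (or none), so the counts need
    not sum to the number of residues.
    """
    flags = ("is_acidic", "is_basic", "is_hydrophilic", "is_hydrophobic")
    counts = {flag: 0 for flag in flags}
    for residue in residues:
        for flag in flags:
            if getattr(residue, flag):
                counts[flag] += 1
    return counts


def residues_without_c_beta(residues):
    """Return the three-letter codes of residues whose c_beta is None."""
    return [r.three_letter_code for r in residues if r.c_beta is None]
```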
- add_hydrogens(mode='All', rules_file=None)[source]¶
Add hydrogens to the protein structure
This method protonates the protein structure by performing the following operations:
Remove metal bonds
Assign ligand and cofactor bond types and standardise aromatic and delocalised bonds to CSD conventions
Set atom charges to zero
Set bond types for ARG, GLU, ASP appropriately
Apply protonation rules to ligands and cofactors
Add hydrogens to protein, ligands, cofactors, nucleic acids and waters where necessary
Set any remaining unknown bond type to single
- Parameters
mode – ‘all’ to generate all hydrogens (remove any existing hydrogens first) or ‘missing’ to generate hydrogens deemed to be missing.
rules_file – File of rules that express special cases - if None, a default version will be used
- Raises
FileNotFoundError – if the rules_file passed in does not exist
ValueError – if mode is neither ‘all’ nor ‘missing’
- property cavity_atoms¶
The atoms making up the binding site, if this was read from a GOLD protein.
- property cavity_residues¶
The residues making up the cavity.
- property chains¶
A tuple of ccdc.protein.Protein.Chain.
- property cofactors¶
The tuple of cofactors in the protein.
The identifier of the molecule is of the form chain_id:residue_name.
Note that hydrogen atoms are added automatically to the returned molecules; however, they are not added to the parent protein.
- detect_ligand_bonds(covalent_links='include')[source]¶
Removes all bonds between ligand or cofactor atoms, and redetects them based on distance between atoms.
This can be useful if the bonds specified by the CONECT records in the PDB are unspecified or undesirable.
- Parameters
covalent_links – ‘include’ to include covalent links between the protein and the ligand (the default), or ‘exclude’ to remove them
- static from_entry(entry)[source]¶
Constructs a protein from a given ccdc.entry.Entry.
- Parameters
entry – Entry from which to construct the protein.
- static known_cofactor_codes()[source]¶
Provide access to a list of known cofactor codes in the underlying library.
- property ligands¶
The tuple of ligands in the protein.
The identifier of the molecule is of the form chain_id:residue_name.
Note that hydrogen atoms are added automatically to the returned molecules; however, they are not added to the parent protein.
- property metals¶
The metal atoms of the protein.
- normalise_labels(mode='pdb')[source]¶
Normalise labels of atoms in the protein structure
- Parameters
mode – one of:
‘pdb’ (the default): try to normalise the labels to PDB compliance if possible (i.e. no longer than 4 characters). Labels that are already compliant are not changed.
‘force’: apply the normalisation regardless of whether the labels are already compliant.
‘molecule’: normalise using ccdc.molecule.Molecule.normalise_labels.
- property nucleic_acids¶
A tuple of ccdc.protein.Protein.NucleicAcid.
- property nucleotides¶
The nucleotides of the protein.
- remove_cofactor(cofactor_id)[source]¶
Remove the specified cofactor.
- Parameters
cofactor_id – str, of the form chain_id:cofactor_id.
- remove_ligand(ligand_id)[source]¶
Remove the specified ligand.
- Parameters
ligand_id – str, of the form chain_id:ligand_id.
- remove_metal_bonds(bonds=None)[source]¶
Removes metal bonds.
- Parameters
bonds – iterable of ccdc.molecule.Bond instances. If None, all metal bonds will be removed.
- remove_water(water_mols)[source]¶
Remove the water (or waters). If water_mols is a list or tuple of water objects, all waters in it are removed.
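The removal methods above can be combined into a simple clean-up step, sketched below. The function name and the `keep_ligand_ids` parameter are hypothetical. The code assumes each ligand molecule exposes its chain_id:residue_name string as `.identifier`, and that this same string is what `remove_ligand` expects.

```python
def strip_waters_and_ligands(protein, keep_ligand_ids=()):
    """Remove every water, and every ligand not listed in `keep_ligand_ids`.

    The tuples are copied up front so that removal cannot disturb iteration.
    Assumption: ligand.identifier is the string that remove_ligand accepts.
    Returns the identifiers of the ligands that were removed.
    """
    waters = tuple(protein.waters)
    if waters:
        protein.remove_water(waters)
    removed = []
    for ligand in tuple(protein.ligands):
        if ligand.identifier not in keep_ligand_ids:
            protein.remove_ligand(ligand.identifier)
            removed.append(ligand.identifier)
    return removed
```

Cofactors are left untouched here; a similar loop over `protein.cofactors` with `remove_cofactor` would follow the same pattern.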
- property residues¶
The amino acid residues of the protein.
- property sequence¶
The one-letter code sequence.
- sort_atoms_by_residue()[source]¶
Sorts atoms by residue
After editing, sometimes the underlying atom list in a protein is not sorted by residue so atoms in a single residue are not in a single block of atoms. In particular, adding hydrogens will add new hydrogen atoms to the end of the atom list.
Calling this method will re-order the atoms in the protein so that each residue is in a single atom block in the atom list. This is useful in particular if you are writing PDB files where having residues as single blocks of ATOM lines is desirable.
Note that calling this method will mean that any pre-existing indexes into the atom list will probably be invalidated.
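Because adding hydrogens appends atoms to the end of the atom list, it is natural to pair `add_hydrogens` with `sort_atoms_by_residue`. The sketch below does so; the function name is hypothetical, and the lower-case mode strings follow the parameter description of `add_hydrogens` rather than any tested behaviour.

```python
def protonate_and_sort(protein, mode="missing", rules_file=None):
    """Add hydrogens, then regroup atoms so each residue is one block.

    `mode` is passed through unchanged; 'all' and 'missing' are the values
    named in the add_hydrogens parameter description.
    """
    protein.add_hydrogens(mode=mode, rules_file=rules_file)
    # New hydrogens sit at the end of the atom list, so re-sort. Any atom
    # indexes obtained before this call should be treated as invalid.
    protein.sort_atoms_by_residue()
    return protein
```

Running the sort last keeps the ATOM records of each residue together if the structure is subsequently written out as a PDB file.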
- property waters¶
The waters of the protein.
- Returns
a tuple of ccdc.molecule.Molecule, representing the oxygens of the water.