Protein API
Introduction
The main class of the ccdc.protein module is ccdc.protein.Protein.
A ccdc.protein.Protein contains attributes and functions that relate to protein structures.
API
-
class
ccdc.protein.
Protein
(identifier, _molecule=None, _protein_structure=None)[source]¶ -
class
BindingSite
(protein, whole_residues=True)[source]¶ A binding site in the protein.
-
atoms
¶ The atoms of the cavity.
-
cofactors
¶ The cofactors of the cavity.
-
formula
¶ Return the chemical formula of the molecules in the binding site.
-
ligands
¶ The ligands of the cavity.
-
metals
¶ The metals in the cavity.
-
nucleotides
¶ The nucleotides of the cavity.
-
residues
¶ The residues of the cavity.
-
waters
¶ The waters of the cavity.
-
-
class
BindingSiteFromAtom
(protein, atom, distance)[source]¶ A binding site defined from a protein atom.
-
class
BindingSiteFromListOfAtoms
(protein, atoms)[source]¶ A binding site defined from a list of protein atoms.
-
class
BindingSiteFromListOfResidues
(protein, list_of_residues)[source]¶ A binding site from a list of residues.
-
class
BindingSiteFromMolecule
(protein, molecule, distance, whole_residues=True)[source]¶ A binding site defined from an arbitrary molecule.
-
class
BindingSiteFromPoint
(protein, origin=(0, 0, 0), distance=12.0)[source]¶ A cavity defined from a point.
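As a rough illustration of what a point-based site involves, the sketch below selects every atom within `distance` of `origin`. It is a stand-alone sketch and not the library's implementation: the `atoms_near_point` helper and the `(label, (x, y, z))` atom tuples are hypothetical, and a simple spherical cutoff is assumed.

```python
import math


def atoms_near_point(atoms, origin=(0.0, 0.0, 0.0), distance=12.0):
    """Return the atoms whose coordinates lie within `distance` of `origin`.

    `atoms` is a list of hypothetical (label, (x, y, z)) tuples.
    """
    return [atom for atom in atoms if math.dist(atom[1], origin) <= distance]


# Toy coordinates, invented for illustration only.
atoms = [
    ("CA", (1.0, 2.0, 2.0)),
    ("CB", (10.0, 10.0, 10.0)),
    ("O", (0.0, 12.0, 0.0)),
]
selected = atoms_near_point(atoms)
```

With the default origin and cutoff, the atom at (10, 10, 10) falls outside the sphere while the other two are kept.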
-
class
BindingSiteFromResidue
(protein, residue, distance)[source]¶ A binding site defined from a protein residue.
-
class
Chain
(index, _protein_structure=None)[source]¶ A chain of a protein.
-
residues
¶ The residues of a chain.
-
sequence
¶ The sequence of amino acid one letter codes in this chain.
-
-
class
ChainSuperposition
(settings=None)[source]¶ Class for superposition of protein chains using sequence alignment
-
class
Settings
[source]¶ Configuration options for the superposition of protein chains.
-
overlay_convergence_tolerance
= None¶ tolerance for convergence in overlay
-
overlay_minimum_cycles
= None¶ minimum number of cycles in overlay
-
overlay_weighting_factor
= None¶ weighting factor to use in overlay
-
sequence_alignment_tool
= None¶ external sequence alignment program
-
sequence_search_tool
= None¶ external sequence search program
-
superposition_atoms
= None¶ protein chain atoms to use in overlay (RIGID, BACKBONE or CALPHA)
-
-
superpose
(chain1, chain2, binding_site1=None)[source]¶ Superpose two protein chains or binding sites.
An implementation of the Smith-Waterman algorithm is used for the sequence alignment unless an external sequence alignment tool is specified in the settings.
If a binding site is supplied for the first chain, only the atoms in the binding site will be overlaid.
Parameters:
- chain1 – a ccdc.protein.Chain instance
- chain2 – a ccdc.protein.Chain instance
- binding_site1 – a ccdc.protein.BindingSite instance for the first chain
Returns: the root-mean-square deviation of the overlay and the transformation matrix
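For readers unfamiliar with the Smith-Waterman algorithm mentioned above, here is a minimal textbook sketch of local sequence alignment. It is not the library's implementation: the `smith_waterman` function, its linear gap penalty, and the scoring values are assumptions chosen for illustration.

```python
def smith_waterman(seq1, seq2, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_seq1, aligned_seq2) for a local alignment."""
    rows, cols = len(seq1) + 1, len(seq2) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def pair_score(i, j):
        return match if seq1[i - 1] == seq2[j - 1] else mismatch

    # Fill the scoring matrix; cells never drop below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + pair_score(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero-scoring cell is reached.
    i, j = best_pos
    aligned1, aligned2 = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            aligned1.append(seq1[i - 1])
            aligned2.append(seq2[j - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            aligned1.append(seq1[i - 1])
            aligned2.append("-")
            i -= 1
        else:
            aligned1.append("-")
            aligned2.append(seq2[j - 1])
            j -= 1
    return best, "".join(reversed(aligned1)), "".join(reversed(aligned2))


# Two toy sequences sharing the local motif "ACDE".
result = smith_waterman("XXACDEYY", "ZZACDEWW")
```

In a superposition workflow, the aligned residue pairs from such an alignment would determine which atoms are matched before the overlay is computed.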
-
class
NucleicAcid
(index, _protein_structure=None)[source]¶ A nucleic acid of a protein.
-
nucleotides
¶ The nucleotides of the nucleic acid.
-
sequence
¶ The sequence of nucleotide one letter codes in this nucleic acid.
-
-
class
Nucleotide
(index, _nucleotide)[source]¶ A single nucleotide of a nucleic acid.
-
atoms
¶ The atoms of the nucleotide.
-
code
¶ The PDB nucleotide code.
-
identifier
¶ The identifier of this nucleotide.
-
nucleic_acid_identifier
¶ The identifier of the nucleic acid of which this nucleotide is a part.
-
one_letter_code
¶ The nucleotide one letter code.
-
-
class
Residue
(i, _residue)[source]¶ A single amino acid residue of a protein.
-
atoms
¶ The atoms of the residue.
-
backbone_atoms
¶ The backbone atoms of the amino acid.
-
c_alpha
¶ The C alpha atom of the residue.
-
c_beta
¶ The C beta atom, or None if there is no C beta atom.
-
c_terminus
¶ The C terminus atom.
-
carbonyl_oxygen
¶ The carbonyl oxygen atom.
-
chain_identifier
¶ The identifier of the chain of which this residue is a part.
-
cysteine_sulphur
¶ The sulphur of a cysteine residue, or None if not a cysteine.
-
identifier
¶ The identifier of this residue.
-
is_acidic
¶ Whether the residue is acidic.
-
is_basic
¶ Whether the residue is basic.
-
is_hydrophilic
¶ Whether the residue is hydrophilic.
-
is_hydrophobic
¶ Whether the residue is hydrophobic.
-
n_terminus
¶ The N terminus atom.
-
one_letter_code
¶ The one letter code of the amino acid.
-
sidechain_atoms
¶ The sidechain atoms of this amino acid.
-
three_letter_code
¶ The three letter code of the amino acid.
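The relationship between three-letter codes, one-letter codes, and a chain sequence can be sketched as follows. This is a stand-alone illustration using the standard 20 amino acid codes. The `one_letter_sequence` helper and the choice of "X" for unrecognised codes are assumptions, not the library's behaviour.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def one_letter_sequence(three_letter_codes, unknown="X"):
    """Join one-letter codes for an iterable of three-letter codes.

    Codes outside the standard 20 map to `unknown` (an assumption of this sketch).
    """
    return "".join(
        THREE_TO_ONE.get(code.upper(), unknown) for code in three_letter_codes
    )


sequence = one_letter_sequence(["MET", "ALA", "GLY", "SER"])
```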
-
-
add_hydrogens
(mode='All')[source]¶ Add hydrogens to the protein structure.
This method protonates the protein structure by performing the following operations:
- Remove metal bonds
- Assign ligand and cofactor bond types and standardise aromatic and delocalised bonds to CSD conventions
- Set atom charges to zero
- Set bond types for ARG, GLU, ASP appropriately
- Apply protonation rules to ligands and cofactors
- Add hydrogens to protein, ligands, cofactors, nucleic acids and waters where necessary
- Set any remaining unknown bond type to single
Parameters: mode – ‘all’ to generate all hydrogens (remove any existing hydrogens first) or ‘missing’ to generate hydrogens deemed to be missing.
-
cavity_atoms
¶ The atoms making up the binding site, if this was read from a GOLD protein.
-
cavity_residues
¶ The residues making up the cavity.
-
chains
¶ A tuple of ccdc.protein.Protein.Chain instances.
-
cofactors
¶ The tuple of cofactors in the protein.
The identifier of the molecule is of the form chain_id:residue_name.
Note that hydrogen atoms are added automatically to the returned molecules; however, they are not added to the parent protein.
-
detect_ligand_bonds
(covalent_links='include')[source]¶ Removes all bonds between ligand or cofactor atoms, and redetects them based on distance between atoms.
This can be useful if the bonds given by the CONECT records in the PDB file are missing or undesirable.
Parameters: covalent_links – ‘include’ to include covalent links between the protein and the ligand (the default), or ‘exclude’ to remove them.
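The idea of redetecting bonds from interatomic distances can be sketched as below. The criterion the library actually applies is not stated here, so this sketch assumes a common rule of thumb: two atoms are bonded when their separation is no more than the sum of their covalent radii plus a tolerance. The `detect_bonds` helper, the tolerance value, and the toy atoms are all hypothetical.

```python
import math
from itertools import combinations

# Approximate single-bond covalent radii in angstroms for a few elements.
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "S": 1.05}


def detect_bonds(atoms, tolerance=0.4):
    """Return index pairs (i, j) of atoms judged to be bonded.

    `atoms` is a list of hypothetical (element, (x, y, z)) tuples.
    """
    bonds = []
    for (i, (el1, xyz1)), (j, (el2, xyz2)) in combinations(enumerate(atoms), 2):
        cutoff = COVALENT_RADII[el1] + COVALENT_RADII[el2] + tolerance
        if math.dist(xyz1, xyz2) <= cutoff:
            bonds.append((i, j))
    return bonds


# A carbon and oxygen 1.2 angstroms apart, plus a distant hydrogen.
atoms = [("C", (0.0, 0.0, 0.0)), ("O", (1.2, 0.0, 0.0)), ("H", (3.0, 0.0, 0.0))]
bonds = detect_bonds(atoms)
```

Only the carbon-oxygen pair falls inside its cutoff here, so the hydrogen is left unbonded.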
-
static
from_entry
(entry)[source]¶ Constructs a protein from a given ccdc.entry.Entry.
Parameters: entry – Entry from which to construct the protein.
-
static
known_cofactor_codes
()[source]¶ Provide access to a list of known cofactor codes in the underlying library.
-
ligands
¶ The tuple of ligands in the protein.
The identifier of the molecule is of the form chain_id:residue_name.
Note that hydrogen atoms are added automatically to the returned molecules; however, they are not added to the parent protein.
-
metals
¶ The metal atoms of the protein.
-
nucleic_acids
¶ A tuple of ccdc.protein.Protein.NucleicAcid instances.
-
nucleotides
¶ The nucleotides of the protein.
-
remove_cofactor
(cofactor_id)[source]¶ Remove the specified cofactor.
Parameters: cofactor_id – str, of the form chain_id:cofactor_id.
-
remove_ligand
(ligand_id)[source]¶ Remove the specified ligand.
Parameters: ligand_id – str, of the form chain_id:ligand_id.
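One way the ligands attribute and remove_ligand might be combined is sketched below: it removes every ligand belonging to a given chain, relying on the chain_id:residue_name identifier form described above. The helpers `split_identifier` and `remove_ligands_in_chain` are hypothetical, and `StubProtein` is only a stand-in so the sketch runs without the library. It assumes each ligand exposes an `identifier` attribute.

```python
from types import SimpleNamespace


def split_identifier(identifier):
    """Split 'chain_id:residue_name' into (chain_id, residue_name)."""
    chain_id, _, name = identifier.partition(":")
    return chain_id, name


def remove_ligands_in_chain(protein, chain_id):
    """Remove all ligands of `chain_id` and return the removed identifiers."""
    # Collect identifiers first so we do not modify the tuple while reading it.
    targets = [
        ligand.identifier
        for ligand in protein.ligands
        if split_identifier(ligand.identifier)[0] == chain_id
    ]
    for ligand_id in targets:
        protein.remove_ligand(ligand_id)
    return targets


class StubProtein:
    """Stand-in exposing only `ligands` and `remove_ligand` (not the real class)."""

    def __init__(self, identifiers):
        self.ligands = tuple(SimpleNamespace(identifier=i) for i in identifiers)

    def remove_ligand(self, ligand_id):
        self.ligands = tuple(l for l in self.ligands if l.identifier != ligand_id)


protein = StubProtein(["A:LIG", "B:LIG", "A:INH"])
removed = remove_ligands_in_chain(protein, "A")
remaining = [ligand.identifier for ligand in protein.ligands]
```

The same pattern would apply to cofactors and remove_cofactor, since their identifiers follow the same form.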
-
remove_metal_bonds
(bonds=None)[source]¶ Removes metal bonds.
Parameters: bonds – iterable of ccdc.molecule.Bond instances. If None, all metal bonds will be removed.
-
residues
¶ The amino acid residues of the protein.
-
sequence
¶ The one-letter code sequence.
-
waters
¶ The waters of the protein.
Returns: a tuple of ccdc.molecule.Molecule instances representing the oxygens of the waters.
-
class