Working with molecules, atoms and bonds

Introduction

The classes ccdc.molecule.Atom, ccdc.molecule.Bond and ccdc.molecule.Molecule are all stored in the ccdc.molecule module.

However, you are unlikely to use these classes to create objects directly. You are much more likely to come across them after having read in a molecule from a file or after having accessed one from a database. Let us therefore import the ccdc.io module to allow us to get access to some molecules.

>>> from ccdc import io
>>> from ccdc.molecule import Molecule
>>> csd = io.MoleculeReader('CSD')

A ccdc.molecule.Molecule can contain several molecules

In the CSD Python API, a molecule is not necessarily a single contiguous unit. It can contain several sub-molecules. In the CSD Python API we refer to these sub-molecules as components.

The reason why this arises is that most molecular file formats allow the inclusion of molecular species which are not covalently bonded to each other to be included in the same entry. Below for example is a SMILES representation of the CSD structure ABEBUF.

O=C1Nc2ccccc2C(=O)Nc2ccccc12.c1ccncc1

This “molecule” contains two components 5H,11H-Dibenzo(b,f)(1,5)diazocine-6,12-dione and pyridine.

Note

It is similarly possible to represent several molecules within single entries of mol2 and sdf files. In mol2 file format these are referred to as substructures.

Let us illustrate this using the CSD structure ABEBUF.

>>> filepath = 'ABEBUF.mol2'

To get access to the molecule in this file we make use of a ccdc.io.MoleculeReader.

>>> mol_reader = io.MoleculeReader(filepath)
>>> mol = mol_reader[0]
>>> print mol.identifier
ABEBUF
>>> print mol.smiles
O=C1Nc2ccccc2C(=O)Nc2ccccc12.c1ccncc1
>>> len(mol.components)
2
>>> for component in mol.components:
...     print component.smiles
...
O=C1Nc2ccccc2C(=O)Nc2ccccc12
c1ccncc1

This has implications for molecular properties, which are calculated using all the components within a molecule. Let us illustrate this using the molecular weight.

>>> print mol.molecular_weight
317.3407
>>> for component in mol.components:
...     print component.molecular_weight
...
238.241
79.0997

Finally, it is worth noting that in many cases when dealing with molecules containing salts and solvents one is not particularly interested in the salts or solvents per se, but rather the larger organic molecule in the structure. This “heaviest” component can be accessed as illustrated below.

>>> heaviest_component = mol.heaviest_component
>>> print heaviest_component.identifier, heaviest_component.molecular_weight
01 238.241

Standardising and editing molecules

There are some functions available for standardising and editing molecules. Let us start by illustrating how one can remove and add hydrogen atoms.

>>> print mol.molecular_weight
317.3407
>>> mol.remove_hydrogens()
>>> print mol.molecular_weight
302.2222
>>> mol.add_hydrogens()
>>> print mol.molecular_weight
317.3407

To illustrate some of the more advanced molecule standardisation functions let us read in a PDB ligand (from the structure 1f0r), which has not got any hydrogen atoms added to it, in sdf file format.

>>> filepath = '1f0r-lig.sdf'

To get access to the molecule in this file we make use of a ccdc.io.MoleculeReader.

>>> mol_reader = io.MoleculeReader(filepath)
>>> mol = mol_reader[0]
>>> print mol.identifier
1f0r-lig
>>> print mol.molecular_weight
434.3864

First of all if I am unsure about the bond typing I will need to assign this first. Note that this bond typing is based on the geometry of the molecule.

>>> mol.assign_bond_types()

By default, the ccdc.molecule.Molecule.assign_bond_types() function guesses the bond types of all bonds. However, if you had a molecule that only had the bond types set for some of the bonds and you had confidence in these then you could guess the bond types of only those bonds with “unknown” types by setting the parameter which to unknown.

If you prefer to have alternating single and double bonds rather than aromatic bonds you can call:

>>> mol.kekulize()

After the bond types have been assigned one can add hydrogen atoms to the molecule.

>>> mol.add_hydrogens()

Let us check if the molecule has had its molecular weight increased

>>> print mol.molecular_weight
453.5365

If you had a molecule with bond types that you had confidence in, but you wanted to standardise the aromatic and delocalised bonds to CSD conventions then you could make use of the functions:

Finally, let us come back to the concept of a molecule having multiple components. Suppose that we wanted to add the heaviest component from the ABEBUF molecule to the 1f0r ligand. This could be achieved with the ccdc.molecule.Molecule.add_molecule() function.

>>> mol.add_molecule(heaviest_component)
>>> for component in mol.components:
...     print component.identifier
...
01
02

Note

Operations which change the underlying molecule of an entry or a crystal, such as adding hydrogens, or assigning bond types will permanently affect the molecule. Any subsequent access of the molecule of an entry or crystal will return the mutated molecule. Reading the entry or crystal afresh from a database will, of course, return the original, unmutated molecule.

Accessing molecular properties

Let us now have a look at the molecular properties available to us using the ABEBUF structure.

>>> filepath = 'ABEBUF.mol2'

To get access to the molecule in this file we make use of a ccdc.io.MoleculeReader.

>>> mol_reader = io.MoleculeReader(filepath)
>>> mol = mol_reader[0]

Lists of the atoms and bonds in a molecule are accessed using the attributes with these names.

>>> mol.atoms  
[Atom(O1), Atom(O2), Atom(N1), ... Atom(H13), Atom(H14), Atom(H15)]
>>> len(mol.atoms)
39
>>> mol.heavy_atoms 
[Atom(O1), Atom(O2), ... Atom(C17), Atom(C18), Atom(C19)]
>>> mol.bonds  
[Bond(Double Atom(O1) Atom(C1)), ... Bond(Single Atom(H15) Atom(C19))]
>>> len(mol.bonds)
41

we can gain access to particular atoms and bonds of the molecule if we know their index in the lists. Alternatively, we can gain access to particular atoms and bonds of the molecule by atom labels, provided the labels used are unique within the molecule:

>>> mol.atom('O2')
Atom(O2)
>>> mol.bond('C1', 'O1')
Bond(Double Atom(O1) Atom(C1))

The atoms and bonds have a Sybyl type:

>>> mol.atom('N1').sybyl_type
u'N.am'
>>> mol.bond('N1', 'C1').sybyl_type
u'am'

An atom has a Van der Waals radius:

>>> mol.atom('O1').vdw_radius
1.52
>>> mol.atom('C1').vdw_radius
1.7

An atom also has an atomic weight:

>>> mol.atom('O1').atomic_weight
15.9994
>>> mol.atom('C1').atomic_weight
12.0107

We can also access the formal charge of the molecule as well as the molecular weight, the molecular volume (if all the atoms of the molecule have 3D coordinates) and the molecular formula.

>>> mol.formal_charge
0
>>> round(mol.molecular_weight, 2)
317.34
>>> round(mol.molecular_volume, 2)
275.28
>>> mol.formula
'C19 H15 N3 O2'

Note that the molecular formula is the sum of all the components in the molecule. To get a component-wise formula one can for example:

>>> '.'.join(c.formula for c in mol.components)
'C14 H10 N2 O2.C5 H5 N1'

There is also a property which tells you whether or not the molecule is organic (as opposed to metal-organic).

>>> mol.is_organic
True

It is also worth mentioning that the molecule identifier and SMILES representation can be accessed through the molecule.

>>> mol.identifier
u'ABEBUF'
>>> mol.smiles
u'O=C1Nc2ccccc2C(=O)Nc2ccccc12.c1ccncc1'

Rings

We can get access to the rings of a molecule. For example, let us read in ‘CEMGUY’ from the CSD which has an interesting set of rings:

>>> cemguy = csd.molecule('CEMGUY')
CEMGUY 2D diagram.

CEMGUY 2D diagram, to exemplify ring analysis.

We can immediately get access to the size of the largest and smallest ring of this molecule:

>>> print cemguy.largest_ring_size, cemguy.smallest_ring_size
6 3

The molecule has a property which will give access to the basic rings of the structure. The set of rings returned will be all simple rings, not an enclosing, larger ring. In this example, there will be a 3-membered ring, a 5-membered ring and two six-membered rings, but not the six-membered ring containing the 3- and 5-membered rings:

>>> rings = cemguy.rings
>>> print len(rings)
4
>>> print([len(r) for r in rings])
[3, 5, 6, 6]

We can get the atoms and bonds of each ring:

>>> print rings[0].atoms 
[Atom(...), Atom(...), Atom(...)]
>>> print rings[0].bonds 
(Bond(Single Atom(...) Atom(...)), Bond(Single Atom(...) Atom(...)), Bond(Single Atom(...) Atom(...)))

Note that the atoms of the ring are ordered such that there is a bond between consecutive atoms. We can test for containment in a ring:

>>> print cemguy.atom('C1') in rings[0]
True
>>> print cemguy.bond('C6', 'C1') in rings[0]
True

There are a couple of predicates on rings:

>>> print rings[0].is_aromatic
False
>>> print rings[-1].is_aromatic
True
>>> print rings[0].is_fused
True
>>> print rings[-1].is_fused
False
>>> print rings[0].is_fully_conjugated
False
>>> print rings[-1].is_fully_conjugated
True

The predicate ‘is_fully_conjugated’ is true for aromatic ring systems and for ring systems where every single bond is conjugated, for example the 5-membered ring in ‘POCTOS’.

POCTOS 2D diagram

POCTOS 2D diagram to exemplify conjugated rings.

>>> poctos = csd.molecule('POCTOS').components[0]
>>> poctos_rings = poctos.rings
>>> print poctos_rings[0].is_aromatic
False
>>> print poctos.rings[0].is_fully_conjugated
True

The atom ‘C6’ in ‘CEMGUY’ is a spiro atom and is in two rings; the atom ‘C5’ is not spiro but is also in two rings. The Atom property, rings, gives access to the list of all rings in which the atom lies, and the Bond property, rings, similarly gives access to the rings of which the bond forms part:

>>> print cemguy.atom('C6').is_spiro
True
>>> print len(cemguy.atom('C6').rings)
2
>>> print cemguy.atom('C5').is_spiro
False
>>> print cemguy.atom('C5').rings 
[Atom(...)-Atom(...)-Atom(...), Atom(...)-Atom(...)-Atom(...)-Atom(...)-Atom(...)]
>>> print cemguy.atom('C12').rings
[]
>>> print len(cemguy.bond('C1', 'C5').rings)
2

Some of the bonds of ‘POCTOS’ are conjugated:

>>> poctos.bond('C6', 'C7').is_conjugated
True
>>> poctos.bond('C1', 'O1').is_conjugated
False

We are not analysing the geometry of the bonded system in order to determine conjugation: this may produce false positives.

Working with string representations of mol2 and sdf files

It is sometimes useful to have access to a string representation of a molecule within a Python script, for example to convert a CCDC molecule into an RDKit molecule by making use of a SDF formatted string as an intermediate.

The ccdc.molecule.Molecule.to_string() method is therefore provided for a molecule to be returned as a string in either ‘mol2’ or ‘sdf’ file format.

>>> s = mol.to_string(format='mol2')
>>> print s 
@<TRIPOS>MOLECULE
ABEBUF
    39    41     2     0     0
SMALL
NO_CHARGES
****
Generated from the CSD

@<TRIPOS>ATOM
     1 O1       1.9287   5.1676   1.7209   O.2       1 RES1   0.0000
...

Similarly the ccdc.molecule.Molecule.from_string() method reads in a molecule from a ‘mol2’, ‘sdf’, ‘mol’, or ‘cif’ formatted string.

>>> m = Molecule.from_string(s, format='mol2')
>>> print m.identifier, m.molecular_weight
ABEBUF 317.3407

The format parameter may be one of ‘mol2’, ‘sdf’, ‘mol’, or ‘cif’.

Note

A SMILES representation of the molecule is accessible through its ccdc.molecule.Molecule.smiles attribute.

Close Contacts

Close contacts, pairs of atoms closer than the sum of there Van der Waals radii, can be identified within a molecule.

>>> abelup = csd.molecule('ABELUP')
>>> print abelup.contacts()
(Contact(Atom(O1)-Atom(O2)), Contact(Atom(O1)-Atom(O7)), Contact(Atom(O3)-Atom(H2)), Contact(Atom(O4)-Atom(H2)), Contact(Atom(O5)-Atom(H3)), Contact(Atom(O6)-Atom(H3)), Contact(Atom(O7)-Atom(H1)), Contact(Atom(C7)-Atom(H1)), Contact(Atom(O11)-Atom(C9)), Contact(Atom(O11)-Atom(H6)), Contact(Atom(O11)-Atom(H17)), Contact(Atom(C17)-Atom(H6)), Contact(Atom(H5)-Atom(H16)))

The ccdc.molecule.Molecule.Contact instance has a couple of useful properties:

>>> contact = abelup.contacts()[0]
>>> print contact.atoms
(Atom(O1), Atom(O2))
>>> print round(contact.length, 3)
2.67
>>> print round(contact.strength, 3)
0.37
>>> print contact.type
Van der Waals

Line Of Sight

A contact has a property, ccdc.molecule.Contact.is_in_line_of_sight which will determine whether the atoms constituting the contact are visible from each other, or whether the contact is occluded by another atom of the molecule.

>>> print contact.is_in_line_of_sight
True

This property is also implemented for a molecule’s hydrogen bonds, where the test is whether the hydrogen, if present, is occluded from the acceptor, or if the hydrogen is not present, whether the donor is occluded from the acceptor.

This is implemented using the ccdc.molecule.Atom.is_in_line_of_sight(). This method takes another atom and optionally an iterable of molecules. This latter argument is useful to check whether an atom of a different molecule occludes the pair of atoms, for example, to see which water molecule of AAGAGG10 occludes the potential hydrogen bond between O3 and N2:

>>> aagagg10 = csd.molecule('AAGAGG10')
>>> components = aagagg10.components
>>> heavy = components[0]
>>> print heavy.atom('O3').is_in_line_of_sight(heavy.atom('H2'))
True
>>> print heavy.atom('O3').is_in_line_of_sight(heavy.atom('H2'), [components[1]])
False
>>> print heavy.atom('O3').is_in_line_of_sight(heavy.atom('H2'), [components[2]])
True

Hydrogen Bonds

The hydrogen bonds of the molecule may be obtained by the method ccdc.molecule.Molecule.hbonds().

>>> hbonds = abelup.hbonds()
>>> print len(hbonds)
1
>>> print hbonds
(HBond(Atom(O7)-Atom(H1)-Atom(O1)),)

There is only one hydrogen bond detected. This is an instance of ccdc.molecule.Molecule.HBond which is a derived class from ccdc.molecule.Molecule.Contact and has some additional properties:

>>> hb = hbonds[0]
>>> print hb.atoms
(Atom(O7), Atom(H1), Atom(O1))
>>> print round(hb.angle, 3), round(hb.length, 3), round(hb.da_distance, 3), round(hb.strength, 3)
146.797 1.612 2.438 1.612

HBond Criterion

There is another possible hydrogen bond in this structure between atoms O1 and O2. There is, though, no hydrogen to be donated, but use of the keywork argument ‘hbond_criterion’ to ccdc.molecule.Molecule.hbonds() will allow this case to be detected:

>>> criterion = Molecule.HBondCriterion()
>>> hbonds = abelup.hbonds(hbond_criterion=criterion)
>>> print hbonds
(HBond(Atom(O1)-Atom(O2)), HBond(Atom(O1)-Atom(O7)))

The default constructed ccdc.molecule.Molecule.HBondCriterion does not require the hydrogens to be present, and in consequence both potential hydrogen bonds are detected. Note that the hydrogen bonds returned do not contain a hydrogen even though it is present in one of the hydrogen bonds. In consequence the ccdc.molecule.Molecule.HBond.length and ccdc.molecule.Molecule.HBond.angle are both None.

Let us look more closely at the ccdc.molecule.Molecule.HBondCriterion class. We can retrieve the default behaviour of the method by setting the attribute ccdc.molecule.Molecule.HBondCriterion.require_hydrogens to True:

>>> criterion.require_hydrogens = True
>>> print abelup.hbonds(hbond_criterion=criterion)
(HBond(Atom(O7)-Atom(H1)-Atom(O1)),)

We can specify the geometry of the hydrogen bonds detected:

>>> abacuf = csd.molecule('ABACUF01')
>>> hbonds = abacuf.hbonds(hbond_criterion=criterion)
>>> print hbonds
(HBond(Atom(O2)-Atom(H7A)-Atom(O8A)), HBond(Atom(O6)-Atom(H5A)-Atom(O7A)), HBond(Atom(O2A)-Atom(H7)-Atom(O8)), HBond(Atom(O6A)-Atom(H5)-Atom(O7)))
>>> print [round(hb.angle, 3) for hb in hbonds]
[147.972, 160.882, 147.972, 160.882]
>>> print [round(hb.da_distance, 3) for hb in hbonds]
[2.788, 2.879, 2.788, 2.879]
>>> criterion.angle_tolerance = 150.0
>>> hbonds = abacuf.hbonds(hbond_criterion=criterion)
>>> print hbonds
(HBond(Atom(O6)-Atom(H5A)-Atom(O7A)), HBond(Atom(O6A)-Atom(H5)-Atom(O7)))
>>> print [round(hb.angle, 3) for hb in hbonds]
[160.882, 160.882]
>>> criterion.angle_tolerance = 120.0
>>> criterion.distance_range = (-5.0, -0.67)
>>> hbonds = abacuf.hbonds(hbond_criterion=criterion)
>>> print hbonds
(HBond(Atom(O2)-Atom(H7A)-Atom(O8A)), HBond(Atom(O2A)-Atom(H7)-Atom(O8)))
>>> print [round(hb.da_distance, 3) for hb in hbonds]
[2.788, 2.788]

If we wish we can work with distances, rather than Van der Waals corrected distances:

>>> criterion = Molecule.HBondCriterion()
>>> criterion.vdw_corrected = False
>>> print criterion.distance_range
(-5.0, 0.0)
>>> criterion.distance_range = (0, 2.8)
>>> hbonds = abacuf.hbonds(hbond_criterion=criterion)
>>> print hbonds
(HBond(Atom(O2)-Atom(O8A)), HBond(Atom(O2A)-Atom(O8)))
>>> print [round(hb.da_distance, 3) for hb in hbonds]
[2.788, 2.788]

The :attr::ccdc.molecule.Molecule.HBondCriterion.path_length_range may be used to control whether hydrogen bonds may span distinct components of a molecule. By default this range is set to (4, 999), i.e. hydrogen bonding atoms must be on the same component and separated by at least four atoms. We can change this to detect hydrogen bonds between the components of a molecule:

>>> catkit = csd.molecule('CATKIT')
>>> print len(catkit.hbonds())
0
>>> criterion.path_length_range = (-1, 999)
>>> print len(catkit.hbonds(hbond_criterion=criterion))
1

Within the ccdc.molecule.Molecule.HBondCriterion we have access to the atom types required for forming hydrogen bonds. These may be changed to include or exclude particular hydrogen bonds. The atom types are exposed as dictionary-like attributes of ccdc.molecule.Molecule.HBondCriterion:

>>> criterion = Molecule.HBondCriterion()
>>> donor_types = criterion.donor_types
>>> print donor_types.keys()
[u'+nitrogen', u'metal bound N', u'imine N', u'aromatic (6-ring) N', u'amide or thioamide N', u'planar N', u'pyramidal N', u'ammonium N (NH4+, RNH3+ etc.)', u'unclassified  N', u'-nitrogen', u'+oxygen', u'metal bound O', u'water O', u'hydroxyl O', u'unclassified  O', u'-oxygen', u'+sulphur', u'metal bound S', u'non metal bound S', u'-sulphur', u'+carbon', u'sp C', u'sp2 C', u'sp3 C', u'aromatic C', u'carboncationic C', u'-carbon']

As with all dictionary-like objects, care must be taken to use the exact string as a key. The values are ‘YES’, ‘NO’ and ‘NEVER’. The latter is purely heuristic, indicating that it is highly unlikely that this should ever be considered a hydrogen-bonding atom. When setting, the values True and False will work as expected:

>>> print donor_types['carboncationic C']
NO
>>> donor_types['carboncationic C'] = True
>>> print donor_types['carboncationic C']
YES

Using this interface is possibly error prone, as it relies on the user typing the correct string name for a donor or an acceptor. There are methods using atoms as exemplars of their type:

>>> donor_atom = abacuf.atoms[5]
>>> print criterion.is_donor(donor_atom)
True
>>> print criterion.donor_atom_type(donor_atom)
metal bound O
>>> criterion.set_donor_type(donor_atom, False)
>>> print criterion.is_donor(donor_atom)
False

Now no hydrogen bonds will be found using this criterion:

>>> print abacuf.hbonds(hbond_criterion=criterion)
()

Accessing atomic properties

Let us have a look at the atomic properties available to us using the ABEBUF structure. To do this we will need an atom. Let us just use the first one in the molecule.

>>> atom = mol.atoms[0]

We can identify the atom using the index and/or the label. Atoms may have residue labels where these have been set in the molecule.

>>> atom.index
0
>>> atom.label
u'O1'
>>> atom.residue_label
u'****'
>>> atom.residue_label = 'RES'
>>> atom.residue_label
u'RES1'

Here the residue sequence number is appended to the specified label. If you wish to set the residue sequence number, assign the pair of values to the atom:

>>> atom.residue_label = ('RES', 4)
>>> atom.residue_label
u'RES4'

There is a convenience method, ccdc.molecule.Molecule.set_residue_by_component() which sets the residue labels of disconnected components of the molecule to distinct values.

To understand the chemistry we can inspect the atomic symbol and the formal charge.

>>> atom.atomic_symbol
u'O'
>>> atom.formal_charge
0

There is a property to inspect the partial charge of an atom, where available, and a method of the molecule to calculate partial charges. The method used is that of Gasteiger J.; Marsili M., Tetrahedron 36 (22) 3219-3228 1980. This is comparatively fast, and has good convergence properties, but is applicable only to organic molecules. Accordingly a bool is returned indicating the success of failure of the assignment.

>>> print atom.partial_charge
None
>>> print mol.assign_partial_charges()
True
>>> print round(atom.partial_charge, 2)
-0.18

If a formal charge is explicitly set for any atom of the molecule, the partial charges will be invalid and so will be removed.

>>> atom.formal_charge = -1
>>> print atom.partial_charge
None

In common with many cheminformatics toolkits, the partial charge properties of atoms may be abused to store an arbitrary value:

>>> atom.partial_charge = 3.14
>>> print round(atom.partial_charge, 2)
3.14

Finally, partial charges may be removed, allowing atomic formal charges to be represented in a mol2 format charge field:

>>> mol.remove_partial_charges()
>>> print atom.partial_charge
None

Furthermore, there are also descriptors to check whether or not an atom is a metal, a hydrogen bond donor, a hydrogen bond acceptor and whether or not it is in a ring system.

>>> atom.is_metal
False
>>> atom.is_donor
False
>>> atom.is_acceptor
True
>>> atom.is_cyclic
False

Clearly, we also have access to the atomic coordinates.

>>> atom.coordinates
Coordinates(x=1.929, y=5.168, z=1.721)
>>> [round(a, 4) for a in atom.fractional_coordinates]
[0.1799, 0.4741, 0.0623]

Note

The coordinates returned by ccdc.molecule.Atom.coordinates() is in orthogonal space. If the atom has no coordinates, None will be returned. Similarly fractional_coordinates will return None if no coordinate information is available.

Where the uncertainty of the determination of the coordinates has been recorded, this information may be obtained from the atom. In the case of ABEBUF read from the molecule file, this information is not available:

>>> print atom.fractional_uncertainties
None

However the CSD entry for ABEBUF does contain the requisite data:

>>> abebuf = io.MoleculeReader('CSD').molecule('ABEBUF')
>>> a = abebuf.atoms[0]
>>> print a.fractional_uncertainties
(0.17989(8), 0.47410(8), 0.06227(3))
>>> uncertain_val = a.fractional_uncertainties[0]
>>> print uncertain_val.value, uncertain_val.precision, uncertain_val.uncertainty
0.17989 5 8

Atoms read from a PDB format file may have displacement parameters, the same is true for structures read from a CIF format, and structures from newer versions of SQLite crystal structure databases. For such atoms there is a property, ccdc.molecule.Atom.displacement_parameters giving access to a class, ccdc.molecule.Atom.DisplacementParameters from which these data may be extracted. For data sources that do not contain contain atomic displacement parameters, the property returns None:

>>> print atom.displacement_parameters
None

However, an atom read from a PDB file will have this information:

>>> file_name = 'pdb1fax.pdb'
>>> from ccdc.protein import Protein
>>> protein = Protein.from_file(file_name)
>>> atom_with_displacement_parameters = protein.atoms[0]
>>> displacement_parameters = atom_with_displacement_parameters.displacement_parameters
>>> print displacement_parameters.type
Isotropic
>>> print displacement_parameters.temperature_factor
37.223

There are other, more specialised data available: please see the documentation of ccdc.molecule.Atom.DisplacementParameters for more details.

There is a convenience property to determine whether all the heavy atoms of a molecule have 3D coordinate information:

>>> mol.is_3d
True

We have access to the neighbouring atoms:

>>> atom.neighbours
(Atom(C1),)

and to the bonds to this atom:

>>> atom.bonds
(Bond(Double Atom(O1) Atom(C1)),)

We can access R/S chirality information for atoms:

>>> duxyea = io.MoleculeReader('CSD').molecule('DUXYEA')
>>> print [(a.label, a.chirality) for a in duxyea.atoms if a.is_chiral] 
[(u'C1', 'S'), (u'C3', 'R'), (u'C6', 'R'), (u'C7', 'S')...]

We will also detect para-chiral centres where the chirality of the atom depends only on the chirality of connected atoms:

>>> hptrol = io.MoleculeReader('csd').molecule('HPTROL')
>>> hptrol.add_hydrogens()
>>> print [(a.label, a.chirality) for a in hptrol.atoms if a.is_chiral] 
[(u'C2', 'R'), (u'C4', 'S'), (u'C6', 'S')]
HPTROL 2D diagram.

HPTROL 2D diagram, to exemplify para-chiral centres.

Here the central C4 atom is para-chiral since the atoms C2 and C6 have opposite chirality.

Accessing bond properties

Let us have a look at the bond properties available to us using the ABEBUF structure. To do this we will need a bond. Let us just use the first one in the molecule.

>>> bond = mol.bonds[0]

To identify the bond we can access the two atoms connected by it.

>>> atom1, atom2 = bond.atoms
>>> print atom1.label, atom2.label
O1 C1

To understand the chemistry we can check the bond type.

>>> print bond.bond_type
Double

Note that we can also use integers to check the type of a bond.

>>> bond.bond_type == 1
False
>>> bond.bond_type == 2
True

The integers used to represent bond types according to the CSD convention are.

Type Integer
Unknown 0
Single 1
Double 2
Triple 3
Quadruple 4
Aromatic 5
Delocalised 7
Pi 9

It is also possible to check whether or not a bond is cyclic and/or rotatable.

>>> bond.is_cyclic
False
>>> bond.is_rotatable
False