Editing molecules

Introduction

The classes ccdc.molecule.Atom, ccdc.molecule.Bond and ccdc.molecule.Molecule all support editing of their attributes. This enables both de novo creation of new molecules and modification of existing molecules. Let us firstly exemplify this by changing ‘AABHTZ’ into something unrecognisable.

We must import the ccdc.io module to get access to the CSD, and read in ‘AABHTZ’:

>>> from ccdc import io
>>> csd = io.MoleculeReader('csd')
>>> aabhtz = csd.molecule('AABHTZ')

Atom editing

Now we can change atom types, for example change the chlorine atoms to another halogen:

>>> for a in aabhtz.atoms:
...     if a.atomic_symbol == 'Cl':
...         a.atomic_symbol = 'Br'

Equivalently we could have changed the atoms’ atomic_number:

>>> for a in aabhtz.atoms:
...     if a.atomic_number == 17:
...         a.atomic_number = 35

These atoms now have inappropriate labels:

>>> print(', '.join(a.label for a in aabhtz.atoms if a.atomic_symbol == 'Br'))
Cl1, Cl2

which we can change:

>>> for i, a in enumerate(aabhtz.atoms):
...     if a.atomic_symbol == 'Br':
...         a.label = 'Br%d' % (i+1)
>>> print(', '.join(a.label for a in aabhtz.atoms if a.atomic_symbol == 'Br'))
Br1, Br2

Other attributes of the atoms can be changed, for example, ccdc.molecule.Atom.formal_charge, ccdc.molecule.Atom.coordinates for those who wish to set the formal charges and coordinates, respectively, of atoms explicitly.

Bond editing

The type of a bond may be changed, using either a string representation, an integer or a ccdc.molecule.Bond.BondType. Let us change the double bonded oxygens to acids:

>>> for b in aabhtz.bonds:
...     if 'O' in [a.atomic_symbol for a in b.atoms]:
...         b.bond_type = 'Single'

It is important to now make sure hydrogens are added appropriately:

>>> aabhtz.add_hydrogens()

Molecule editing

Molecules may be constructed from scratch, or an existing molecule edited.

Building molecules from scratch

It is possible to build up a molecule from scratch. To illustrate how to construct a molecule let us create a charged pyridine fragment.

First of all let us import the required classes from the ccdc.molecule module.

>>> from ccdc.molecule import Molecule, Atom, Bond

Now we need to create a ccdc.molecule.Molecule instance.

>>> mol = Molecule(identifier='my molecule')

Then we create the ccdc.molecule.Atom instances.

>>> a1 = Atom('N', coordinates=(-0.301, -0.968, 4.080), formal_charge=1)
>>> a2 = Atom('C', coordinates=(-1.590, -1.256, 4.148))
>>> a3 = Atom('C', coordinates=(-2.144, -2.420, 3.669))
>>> a4 = Atom('C', coordinates=(-1.327, -3.345, 3.075))
>>> a5 = Atom('C', coordinates=( 0.200, -3.055, 2.977))
>>> a6 = Atom('C', coordinates=( 0.447, -1.874, 3.505))

Once created the ccdc.molecule.Atom instances can be added to the molecule.

>>> a1_id = mol.add_atom(a1)
>>> a2_id = mol.add_atom(a2)
>>> a3_id = mol.add_atom(a3)
>>> a4_id = mol.add_atom(a4)
>>> a5_id = mol.add_atom(a5)
>>> a6_id = mol.add_atom(a6)

The bonds can then be added between the atoms.

>>> aromatic_bond_type = Bond.BondType(5)
>>> b1_id = mol.add_bond(aromatic_bond_type, a1_id, a2_id)
>>> b2_id = mol.add_bond(aromatic_bond_type, a2_id, a3_id)
>>> b3_id = mol.add_bond(aromatic_bond_type, a3_id, a4_id)
>>> b4_id = mol.add_bond(aromatic_bond_type, a5_id, a6_id)
>>> b5_id = mol.add_bond(aromatic_bond_type, a6_id, a1_id)

Finally, let us add hydrogen atoms to the molecule.

>>> mol.add_hydrogens()

Let us view our creation in mol2 file format.

>>> print(mol.to_string('mol2')) # doctest: +NORMALIZE_WHITESPACE, +ELLIPSIS
@<TRIPOS>MOLECULE
my molecule
    14    13     1     0     0
SMALL
USER_CHARGES
****
Generated from the CSD
...

However there are easier ways to construct molecules using geometries from the CSD. We can construct the same molecule by taking ‘ABADUE’ from the CSD, which contains a pyridine, and removing the remaining atoms.

>>> abadue = csd.molecule('ABADUE')

We could remove the other atoms like this:

>>> keep_atom_labels = ['C1', 'C2', 'C3', 'C4', 'C5', 'N1', 'H1', 'H2', 'H3', 'H4']
>>> for a in abadue.atoms:
...     if a.label not in keep_atom_labels:
...         abadue.remove_atom(a)

or more succinctly by a generator expression:

>>> abadue.remove_atoms(
...     a for a in abadue.atoms if a.label not in keep_atom_labels
... )

But in this instance the atoms to remove form a connected group of atoms downstream of the bond between atoms C5 and C6. There is a single API call to achieve this:

>>> abadue = csd.molecule('ABADUE')
>>> c5 = abadue.atom('C5')
>>> c6 = abadue.atom('C6')
>>> abadue.remove_group(c5, c6)

There are analogous methods to remove a bond, or to remove a set of bonds, which may be used to break rings or to disconnect a molecule into separate components.

We can substitute groups, in a similar way to that of the ccdc.molecule.Molecule.remove_group() above. In this case we will preserve the geometry of the substituting group, by translating and rotating it so that the the orientation of the substituted group is preserved.

For example ‘ABFMPB’ has a benzamide group, and we might wish to substitute the benzene by a smaller lipophilic moiety, perhaps to fit better into a binding site. The same structure has a methyl group which can be used. This can be done by identifying the two groups and substituting:

>>> abfmpb = csd.molecule('ABFMPB').components[0]  # The crystal structure is a dimer
>>> c1, c10 = abfmpb.atom('C1'), abfmpb.atom('C10')
>>> c3, c9 = abfmpb.atom('C3'), abfmpb.atom('C9')
>>> abfmpb.change_group(c1, c10, abfmpb, c3, c9)
(Atom(C10), Atom(H8), Atom(H9), Atom(H10))

Note that copies of the affected atoms are taken, which is why it is safe to use the same molecule as the source of a group.

There is also ccdc.molecule.Molecule.add_group(), which will copy a set of downstream atoms without changing geometry.

There is a function ccdc.molecule.Molecule.fuse_rings() which can be used to join two ring systems. Here we can replace the benzene of abfmpb by a naphthalene:

>>> # Firstly remove a couple of spare hydrogens
>>> abfmpb = csd.molecule('ABFMPB').components[0]
>>> abfmpb.remove_atoms(a for a in abfmpb.atoms if a.label in ['H12', 'H13'])
>>> # Then remove everything except a benzene from a copy
>>> copy_abfmpb = abfmpb.copy()
>>> c10 = copy_abfmpb.atom('C10')
>>> c1 = copy_abfmpb.atom('C1')
>>> copy_abfmpb.remove_group(c10, c1)
>>> # Remove a spare hydrogen from the copy
>>> copy_abfmpb.remove_atom(copy_abfmpb.atom('H15'))
>>> # Finally fuse the rings
>>> c13 = abfmpb.atom('C13')
>>> c12 = abfmpb.atom('C12')
>>> c15 = copy_abfmpb.atom('C15')
>>> abfmpb.fuse_rings(c13, c12, copy_abfmpb, c10, c15)
(Atom(C11), Atom(C12), Atom(C13), Atom(C14), Atom(H11), Atom(H14))

This will provide a planar ring fusion.

Changing molecular geometry

The API provides methods to change bond lengths, valence and torsion angles and to move the whole molecule.

For example, let us take CUPDAT from the CSD, modify a bond type and make associated edits, such as removing a hydrogen and setting geometries:

>>> cupdat = csd.molecule('CUPDAT')
>>> b = cupdat.bond('O1', 'C3')
>>> b.bond_type = 'Double'
>>> cupdat.remove_atom(cupdat.atom('H5'))
>>> cupdat.atom('O1').formal_charge = -1
>>> cupdat.set_bond_length(cupdat.atom('O1'), cupdat.atom('C3'), b.ideal_bond_length)
>>> cupdat.set_valence_angle(cupdat.atom('C1'), cupdat.atom('O1'), cupdat.atom('C3'), 120.0)
>>> cupdat.set_torsion_angle(
...     cupdat.atom('O1'), cupdat.atom('C3'), cupdat.atom('H4'), cupdat.atom('H6'), 180.0
... )

Note that the torsion angle set here is an improper torsion, which is handled seamlessly by the API. The ccdc.molecule.Bond.ideal_bond_length provides a length based on the bond type and the elements it connects.

The whole molecule may be moved around either by translation, rotation about its centre of geometry, by matrices and by Quaternions.

For example to centre a molecule on the origin:

>>> aabhtz = csd.molecule('AABHTZ')
>>> cog = aabhtz.centre_of_geometry()
>>> aabhtz.translate([-cog.x, -cog.y, -cog.z])

To rotate a molecule about its centre of geometry:

>>> aabhtz.rotate([0, 1, 0], 180.)

An arbitrary matrix may be applied to the molecule. It is up to the user to avoid inappropriate affine transformations, such as scaling and shearing.

Here we use a numpy array to perform a translation. Numpy is a python module for scientific computing and it provides powerful array processing methods. Numpy may be used to perform matrix multiplications for the construction of arbitrary rotations and translations. If it is not already, numpy will need to be installed in your python environment before it it an be used.

>>> import numpy
>>> M = numpy.eye(4)
>>> M[0][3] = 1.0
>>> M[1][3] = 2.0
>>> M[2][3] = 3.0
>>> aabhtz.transform(M)

A Quaternion may be used instead, which provides a convenient shorthand for a rotation about a particular axis:

>>> Q = [1, 0, 1, 0]
>>> aabhtz.apply_quaternion(Q)

Coordinates may be transferred from one molecule to another, using an iterable of matched pairs of atoms:

>>> aacfaz = csd.molecule('AACFAZ')
>>> aacfaz10 = csd.molecule('AACFAZ10')
>>> aacfaz.set_coordinates(aacfaz10, atoms=zip(aacfaz.atoms[:10], aacfaz10.atoms[:10]))

Tidying up

After performing a set of edits it is possible that atom labels will have become duplicated. There is a method, ccdc.molecule.Molecule.normalise_labels() which will ensure uniqueness of labels.

>>> abfmpb.normalise_labels()

More generally we can sort the atoms of a molecule into a canonical order (at least if the molecule contains no unknown atoms or bonds). This method will normalise the labels of the atoms as well. This can be useful when wishing to compare two polymorphs, with the guarantee that the atoms will be in the same order. For example:

>>> hxacan = csd.molecule('HXACAN')
>>> hxacan26 = csd.molecule('HXACAN26')
>>> print(hxacan.atom('H1').neighbours[0])
Atom(C2)
>>> print(hxacan26.atom('H1').neighbours[0])
Atom(O1)
>>> hxacan.normalise_atom_order()
>>> hxacan.normalise_labels()
>>> print(hxacan.atom('H1').neighbours[0])
Atom(C4)
>>> hxacan26.normalise_atom_order()
>>> hxacan26.normalise_labels()
>>> print(hxacan26.atom('H1').neighbours[0])
Atom(C4)

It may be necessary to reset hydrogens, and to standardise the molecule’s bonds after a series of edits:

>>> abfmpb.assign_bond_types()
>>> abfmpb.add_hydrogens()

While most of the editing operations produce plausible geometries, it may be necessary to minimise the molecule after editing:

>>> from ccdc.conformer import MoleculeMinimiser
>>> minimiser = MoleculeMinimiser()
>>> minimised_mol = minimiser.minimise(abfmpb)

If a correct protonation state and correct bond-typing has been constructed for a molecule, formal charges for the atoms of the molecule can be assigned using:

>>> abfmpb.set_formal_charges()

The ccdc.crystal.Crystal.molecule is writable, so changes to the molecule of a crystal may now be written back to the crystal, and the changes will persist. Obviously this must be used with care, as the edited molecule may not form the same crystal as the original, but this is of considerable use in modelling crystal features. For example to investigate what other solvents might be compatible with a crystal structure one might remove existing solvent molecules and analyse the resulting void volume:

>>> abebuf = csd.crystal('ABEBUF')
>>> void_with_solvent = abebuf.void_volume()
>>> abebuf.molecule = abebuf.molecule.heaviest_component
>>> void_without_solvent = abebuf.void_volume()
>>> print('with %.2f, without %.2f' % (void_with_solvent, void_without_solvent))
with 0.00, without 28.69