Working with crystals

Introduction

The ccdc.crystal module contains the ccdc.crystal.Crystal class.

However, you are unlikely to use this class to create a crystal directly. You are much more likely to come across a crystal object after having read in a crystal from a file. Let us therefore import the ccdc.io module and read in the first crystal structure from the CSD. We will also import the utilities module for later use.

>>> from ccdc import io, utilities
>>> csd_reader = io.CrystalReader('CSD')
>>> first_csd_entry = csd_reader[0]

Accessing crystallographic properties

Let us have a look at the crystallographic properties available to us from a CSD crystal structure.

First of all it is worth noting that one can get access to the underlying ccdc.molecule.Molecule from the crystal.

>>> mol = first_csd_entry.molecule
>>> print(mol.identifier)
AABHTZ

Warning

If the crystal structure has no 3D information about the molecule’s atoms accessing the ccdc.crystal.Crystal.molecule will throw a TypeError. If one has access to the corresponding ccdc.entry.Entry the molecule can be constructed from that.

We can also get access to a molecular representation of the asymmetric unit. This, like other molecular representations of a crystal, may contain more than one component.

Note the asymmetric unit molecule may be smaller than the crystallographic molecule where the molecule sits on a crystallographic special position, for example an axis of symmetry or an inversion centre. For example the structure, KOVHUY, has a component that lies on a two-fold axis:

>>> c = csd_reader.crystal('KOVHUY')
>>> mol = c.molecule
>>> print([len(comp.atoms) for comp in mol.components])
[37, 6]
>>> asym_mol = c.asymmetric_unit_molecule
>>> print(len(asym_mol.components))
2
>>> print([len(comp.atoms) for comp in asym_mol.components])
[19, 6]

Some crystal structures have molecules which lie outside the unit cell, such as HXACAN01. These can be forced to lie within the unit cell using

>>> c = csd_reader.crystal('HXACAN01')
>>> c.centre_molecule()

If the molecule already lies within the unit cell, this method will not move the molecule. This is equivalent to Mercury’s ‘auto centre’ button.

The basic descriptors of a crystal structure are its cell lengths, cell angles and its space group.

>>> print("a: %.2f b: %.2f c: %.2f" % first_csd_entry.cell_lengths)
a: 11.37 b: 10.27 c: 7.36
>>> print("alpha: %.2f beta: %.2f gamma: %.2f" % first_csd_entry.cell_angles)
alpha: 108.75 beta: 71.07 gamma: 96.16
>>> print(first_csd_entry.spacegroup_symbol)
P-1
>>> print(' '.join("'%s'" %s for s in first_csd_entry.symmetry_operators))
'x,y,z' '-x,-y,-z'
>>> print(first_csd_entry.symmetry_operator_description('-x,-y,-z'))
Inversion at [0, 0, 0]
>>> print(first_csd_entry.symmetry_rotation('-x,-y,-z'))
(-1, 0, 0, 0, -1, 0, 0, 0, -1)
>>> print(first_csd_entry.symmetry_translation('-x,-y,-z'))
(0.0, 0.0, 0.0)

Note

The example above makes use of Python’s string formatting operator. For more information please see the Python string formatting documentation.

For convenience there are properties of the crystal to indicate whether a crystal’s spacegroup is a centrosymmetric or Sohncke spacegroup.

>>> c = csd_reader.crystal('MORHOR')
>>> c.is_centrosymmetric
True
>>> c.is_sohncke
False
>>> c = csd_reader.crystal('AADRIB')
>>> c.is_centrosymmetric
False
>>> c.is_sohncke
True
>>> c = csd_reader.crystal('AACRUB')
>>> c.is_centrosymmetric
False
>>> c.is_sohncke
False

There is a method to report which atoms lie on symmetry axes:

>>> crystal = csd_reader.crystal('FOHCOU')
>>> print(utilities.print_set(crystal.atoms_on_special_positions()))
{Atom(C1), Atom(C2), Atom(Cr1), Atom(O1), Atom(O2)}

The method allows the specification of a symmetry operator:

>>> print(utilities.print_set(crystal.atoms_on_special_positions('x,1/2-y,z')))
{Atom(C1), Atom(C2), Atom(Cr1), Atom(O1), Atom(O2)}
>>> print(utilities.print_set(crystal.atoms_on_special_positions('1/2+x,y,1/2-z')))
set()

and the identity gives all atoms:

>>> print(utilities.print_set(crystal.atoms_on_special_positions('x,y,z')))
{Atom(C1), Atom(C2), Atom(C3), Atom(C3G), Atom(C4), Atom(C4G), Atom(Cr1), Atom(O1), Atom(O2), Atom(O3), Atom(O3G), Atom(O4), Atom(O4G)}

It is easy to make a table by symmetry operator:

>>> print('\n'.join('%-17s %s' % (op, utilities.print_set(crystal.atoms_on_special_positions(op))) for op in crystal.symmetry_operators))
x,y,z             {Atom(C1), Atom(C2), Atom(C3), Atom(C3G), Atom(C4), Atom(C4G), Atom(Cr1), Atom(O1), Atom(O2), Atom(O3), Atom(O3G), Atom(O4), Atom(O4G)}
1/2-x,-y,1/2+z    set()
1/2+x,1/2-y,1/2-z set()
-x,1/2+y,-z       set()
-x,-y,-z          set()
1/2+x,y,1/2-z     set()
1/2-x,1/2+y,1/2+z set()
x,1/2-y,z         {Atom(C1), Atom(C2), Atom(Cr1), Atom(O1), Atom(O2)}

Other, basic crystallographic properties include the Z prime value, the Z value, the cell volume and the calculated density.

>>> first_csd_entry.z_prime
1.0
>>> first_csd_entry.z_value
2.0
>>> round(first_csd_entry.cell_volume, 2)
769.98
>>> round(first_csd_entry.calculated_density, 2)
1.53

Note

The example above makes use of Python’s built in round function. For more information please see the Python documentation.

Miller Indices

Miller indices describe a set of parallel planes through a crystal structure relative to its unit cell. These are defined by three integer values, conventionally called h, k and l, which represent an orientation with respect to the unit cell of the crystal. The Miller indices of a crystal structure may be obtained from the crystal, with a specification of the required indices:

>>> miller = first_csd_entry.miller_indices(1, 0, 0)

This defines a plane containing the b and c axes of the unit cell. The family of planes are all those parallel to this plane with translations integer multiples of the planar d_spacing:

>>> print(miller.plane)
Plane(normal=Vector(0.946, 0.102, -0.308), distance=10.757)
>>> print(round(miller.d_spacing, 3))
10.757

Miller indices in the API need not be relatively co-prime: for example:

>>> improper = first_csd_entry.miller_indices(2, 0, 0)

This will define a set of planes parallel to the first, with half the distance between them:

>>> print(round(miller.d_spacing, 3))
10.757
>>> print(round(improper.d_spacing, 3))
5.378

We can recover the former from the latter:

>>> print(improper.proper)
MillerIndices(1, 0, 0)
>>> print(improper.order)
2

For crystals that have been accessed from the CSD, one can also access the chemical name and formula of the crystal content.

>>> print(first_csd_entry.formula)
C13 H12 Cl2 N6 O2

However, if there is disorder in the crystal, the formula will take account of occupancies, and will sometimes give a formula with partial counts, for example:

>>> abegum = csd_reader.crystal('ABEGUM')
>>> print(abegum.formula)
C54 H65.32 N12 O23.66 Zn3

Let us illustrate that these would not be available for a crystal read in from, for example, a mol2 file.

>>> filepath = 'ABEBUF.mol2'

To get access to the crystal in this file we make use of a ccdc.io.CrystalReader.

>>> cryst_reader = io.CrystalReader(filepath)
>>> cryst_from_mol2 = cryst_reader[0]
>>> cryst_reader.close()
>>> print(cryst_from_mol2.formula)
C19 H15 N3 O2

A ccdc.io.CrystalReader also provides access to attributes such as cell volume, lengths and angles.

>>> round(cryst_from_mol2.cell_volume, 2)
3229.58
>>> print("a: %.2f b: %.2f c: %.2f" % cryst_from_mol2.cell_lengths)
a: 10.72 b: 10.90 c: 27.64
>>> print("alpha: %.2f beta: %.2f gamma: %.2f" % cryst_from_mol2.cell_angles)
alpha: 90.00 beta: 90.00 gamma: 90.00
>>> print(cryst_from_mol2.spacegroup_symbol)
Pbca

The mol2 file format specification for crystals does not include any information about the Z’ and Z value. These attributes consequently are set to None.

>>> print(cryst_from_mol2.z_prime)
None
>>> print(cryst_from_mol2.z_value)
None

Disordered Structures and Suppressed Atoms

Disordered structures display a lack of regularity. For example, each of the F atoms in a trifluoromethyl group, -CF3, might be randomly distributed between two sites. This means that the crystallographer will report two sets of coordinates for each F atom. In some cases, two alternative sites are occupied equally; in other cases, there is a major site and a minor site. Disorder can involve more than two sites and it can also involve a whole molecule. Disordered structures in the Cambridge Structural Database are treated in one of two ways. In older structures, only one position is kept for each disordered atom. When such a structure is read into the API, it will therefore appear as if it is not disordered at all. In more recent structures, all positions of disordered atoms are kept but only one set is connected by bonds to form a complete molecule.

In CSD structures one or more atoms may be suppressed. A common example is when the author reports two sets of atomic coordinates, for major and minor sites, in a disordered structure. In most cases the atoms of the minor site will be suppressed.

The properties ccdc.entry.Entry.molecule and ccdc.crystal.Crystal.molecule will suppress disordered atoms. The properties ccdc.entry.Entry.disordered_molecule and ccdc.crystal.Crystal.disordered_molecule may be used to retrieve the disordered atoms. These atoms will have labels ending with a ‘?’.

In the example below, some atoms are disordered over two sites, which means we can match the suppressed atoms of the disordered molecule with their counterparts. Note that the sum of their occupancies will be 1.0, and that the counterpart to the suppressed atom is that with the higher occupancy. In general atoms may be disordered over three or more sites. Also it should be noted that even in the absence of disorder, some atoms may show less than full occupancy, for example through the evaporation of a solvent.

>>> c = csd_reader.crystal('ABEGUM')
>>> print(c.has_disorder)
True
>>> norm_mol = c.molecule
>>> print(len(norm_mol.atoms))
159
>>> dis_mol = c.disordered_molecule
>>> print(len(dis_mol.atoms))
189
>>> suppressed_atoms = [a for a in dis_mol.atoms if a.label.endswith('?')]
>>> print(len(suppressed_atoms))
30
>>> O21 = dis_mol.atom('O21')
>>> O22 = dis_mol.atom('O22?')
>>> print(O21.occupancy)
0.64
>>> print(O22.occupancy)
0.36
>>> O31 = norm_mol.atom('O31')
>>> print(O31.occupancy)
0.79

The formula of a disordered molecule will take account of occupancies, giving fractional multiplicities for some structures, but a molecule without disorder will give the ususal integer valued multiplicities.

>>> print(dis_mol.formula)
C54 H65.32 N12 O23.66 Zn3
>>> print(norm_mol.formula)
C54 H66 N12 O24 Zn3

Crystallographic Contacts

Contacts, pairs of atoms closer in the crystal than the sum of their Van der Waals radii, can be identified by ccdc.crystal.Crystal.contacts():

>>> contacts = first_csd_entry.contacts()
>>> print(len(contacts))
15
>>> print(contacts[0])
Contact(Atom(N3)-Atom(H11))

Each contact is an instance of ccdc.crystal.Crystal.Contact which is a derived class of ccdc.molecule.Molecule.Contact. Please see Close Contacts for details of the base class. Additionally the crystal class supports methods pertaining to the crystal:

>>> contact = contacts[0]
>>> print(contact.intermolecular)
True

This contact has been determined to be between different instances of the asymmetric unit molecule. Its symmetry operators are available:

>>> print(' '.join("'%s'" % s for s in contact.symmetry_operators))
'x,y,z' 'x,y,-1+z'

Crystallographic Hydrogen Bonds

Analogously, the crystal supports the method ccdc.crystal.Crystal.hbonds() yielding a tuple of ccdc.crystal.Crystal.HBond instances, deriving from ccdc.crystal.Crystal.Contact.

>>> hbonds = first_csd_entry.hbonds()
>>> print(hbonds)
(HBond(Atom(O2)-Atom(H6)-Atom(N6)),)
>>> print(' '.join("'%s'" % s for s in hbonds[0].symmetry_operators))
'x,y,z' '-x,-1-y,-z' '-x,-1-y,-z'

The ccdc.crystal.Crystal.hbonds() takes a number of parameters to control exactly which contacts should be considered as a hydrogen bond. These include control over the length of the contact, whether the hydrogen atom must be present, and if so what angle range is permitted. More fine-grained control of the precise atom types involved in hydrogen bond formation is available through the hbond_criterion parameter. See HBond Criterion for details of this class.

Additionally there is a parameter, unique, which controls whether the crystallographically unique hydrogen bonds are returned (default) or whether the full set of hydrogen bonds is returned. In the latter case, the collection may include symmetry copies of hydrogen bonds. This is equivalent to what you would see if you displayed hydrogen bond contacts in Mercury. For example:

>>> hbonds = first_csd_entry.hbonds(unique=False)
>>> print(hbonds)
(HBond(Atom(O2)-Atom(H6)-Atom(N6)), HBond(Atom(N6)-Atom(H6)-Atom(O2)))
>>> for hb in hbonds:
...     print(' '.join("'%s'" % s for s in hb.symmetry_operators))
'x,y,z' '-x,-1-y,-z' '-x,-1-y,-z'
'x,y,z' 'x,y,z' '-x,-1-y,-z'

Manipulating crystals

The ccdc.crystal.Crystal.molecular_shell() function can be used to generate a packing shell around the molecules in a crystal or a subset of atoms of the molecules in the crystal.

See also

The API documentation ccdc.crystal.Crystal.molecular_shell() and the “Generating molecular shells” example in the cookbook documentation.

Contact and HBond expansion

The class ccdc.crystal.Crystal has methods to generate assemblies of molecules by expanding either HBonds or close contacts of the crystal. This works essentially as Mercury when clicking on all red contacts of the central molecule. The resultant assemblies are, perhaps, better motivated than the packing_shell (a fixed number of symmetry generated molecules) or a molecular_shell (a simple distance range from a set of atoms). The two methods take the same criterion-specifying arguments as the methods ccdc.crystal.Crystal.hbonds() and ccdc.crystal.Crystal.contacts().

>>> crystal = csd_reader.crystal('HXACAN')
>>> hbond_network = crystal.hbond_network()
>>> print(len(hbond_network.components))
5
>>> contact_network = crystal.contact_network()
>>> print(len(contact_network.components))
9
>>> contact_network = crystal.contact_network(intermolecular='intra')
>>> print(len(contact_network.components))
1
>>> large_hbond_network = crystal.hbond_network(repetitions=2)
>>> print(len(large_hbond_network.components))
13

Packing and Slicing

Two additional methods implement crystallographic packing and slicing in a manner consonant with that of Mercury:

>>> packed = crystal.packing()
>>> print(len(packed.components))
8

The ccdc.crystal.Crystal.packing() method takes two optional arguments: the size of the box to pack and an inclusion criterion. The box dimensions are a pair of triples of floats, being the near corner and far corner of the box in terms of multiples of the unit cell. The inclusion criterion is specified as a string, one of ‘CentroidIncluded’, ‘AllAtomsIncluded’, ‘AnyAtomIncluded’ or ‘OnlyAtomsIncluded’. The first three of these will expand to a full molecule if the inclusion criterion is met. The latter will yield only those atoms included in the box.

>>> packed = crystal.packing(((-2, -2, -2), (2, 2, 2)), 'CentroidIncluded')
>>> print(len(packed.components))
512
>>> packed = crystal.packing(((-2, -2, -2), (2, 2, 2)), 'AnyAtomIncluded')
>>> print(len(packed.components))
648

Crystals with unknown spacegroups will raise a runtime error indicating that the crystal is not packable.

The ccdc.crystal.Crystal.slicing() requires the provision of a slicing plane, and has optional arguments to control the size of the slicing plane and an inclusion criterion as above. The plane is any instance of ccdc.descriptors.GeometricDescriptors.Plane, for example a plane derived from the crystal’s ccdc.crystal.Crystal.MillerIndices or a the plane of best fit of a set of atoms.

>>> from ccdc.descriptors import MolecularDescriptors
>>> sliced = crystal.slicing(crystal.miller_indices(1, 1, 0).plane)
>>> print(len(sliced.components))
24
>>> sliced = crystal.slicing(crystal.miller_indices(1, 1, 0).plane, thickness=20, inclusion='AnyAtomIncluded')
>>> print(len(sliced.components))
84
>>> sliced = crystal.slicing(MolecularDescriptors.atom_plane(*crystal.molecule.heavy_atoms))
>>> print(len(sliced.components))
20

Polymer Expansion

Polymeric crystals may be expanded, as in Mercury, with the method ccdc.crystal.Crystal.polymer_expansion(). For example:

>>> crystal = csd_reader.crystal('ABALOF')
>>> print(len(crystal.molecule.atoms))
36
>>> mol = crystal.polymer_expansion()
>>> print(len(mol.atoms))
100

There is an optional repetitions argument to ccdc.crystal.polymer_expansion() to control the number of expansions performed:

>>> mol = crystal.polymer_expansion(repetitions=5)
>>> print(len(mol.atoms))
356

There is another keyword argument to ccdc.crystal.Crystal.polymer_expansion(), atoms, for the case where more control over the expansion is required. The atoms provided will be matched by label with the atoms of the crystal’s polymeric bonds, so the repetitions argument can still be honoured.

>>> original_mol = crystal.molecule
>>> atom_to_expand = original_mol.atom('I4C')
>>> mol = crystal.polymer_expansion(atoms=[atom_to_expand])
>>> print(len(mol.atoms))
68
>>> mol = crystal.polymer_expansion(atoms=[atom_to_expand], repetitions=5)
>>> print(len(mol.atoms))
196
>>> print(len([a for a in mol.atoms if a.label == 'Cu1A']))
0
>>> print(len([a for a in mol.atoms if a.label == 'Cu3A']))
0