Release notes

Overview

The Cambridge Structural Database (CSD) is a highly curated and comprehensive repository of organic and organo-metallic crystal structures and is an essential resource to scientists around the world.

The Cambridge Structural Database Portfolio (CSD Portfolio) is a powerful and highly flexible suite of software components and structural knowledge-bases. The CSD Portfolio enables exploration and application of the knowledge contained within more than a million curated crystal structures.

The CSD Portfolio enables scientists to work with structural data to extract new insights. This includes public and proprietary, experimental, and predicted data. Our software supports scientific discovery, development, and analysis, and is trusted by thousands across industry and academia.

The CSD Python API has been developed to make the CSD Portfolio data and functionality accessible in a programmatic fashion. It facilitates integration with in-house workflows and third-party applications. In addition, the CSD Python API can be used to perform activities not currently possible through the graphical interfaces. It is a platform for innovation.

Searchable documentation is available at https://www.ccdc.cam.ac.uk/docs/csd_python_api/

Any feedback on the CSD Python API may be sent to support@ccdc.cam.ac.uk.

Citing the CSD Python API

When publishing works that benefited from the CSD Python API, please consider using the following citation:

“The Cambridge Structural Database”
C. R. Groom, I. J. Bruno, M. P. Lightfoot and S. C. Ward, Acta Crystallographica Section B, B72, 171-179, 2016

For further citation advice, refer to your download agreement and visit https://www.ccdc.cam.ac.uk/support/product_references/

Licensed Features

Some features are conditionally available depending on the user’s CSD licence.

API Module      CSD-Core  CSD-Materials  CSD-Discovery  CSD-Theory  CSD-Particle
IO              y         y              y              y           y
Entry           y         y              y              y           y
Crystal*1       y         y              y              y           y
Molecule        y         y              y              y           y
Search          y         y              y              y           y
Descriptors*2   y         y              y              y           y
Diagram         y         y              y              y           y
Conformer*3     y         y              y              y           y
Interaction*4   y         y              y              y           y
Protein         y         y              y              y           y
Utilities       y         y              y              y           y
Solid Form      -         y              -              -           -
Morphology      -         y              -              -           -
Screening       -         -              y              -           -
Docking         -         -              y              -           -
Cavity          -         -              y              -           -
Pharmacophore   -         -              y              -           -
CSP*5           -         -              -              y           -
Particle        -         -              -              -           y

*1 The crystal packing similarity API is available to CSD-Materials users.

*2 The powder pattern simulation and comparison API, Morphology API, HBond Coordination API, and HBond Propensity API are available to CSD-Materials users. Habit API is available to CSD-Particle users.

*3 The Conformer Generation API is available to CSD-Materials and CSD-Discovery users, as well as some stand-alone packages.

*4 The Interaction Map Analysis API is available to CSD-Materials and CSD-Discovery users.

*5 Landscape Generator is available as an add-on to CSD-Materials, CSD-Discovery and CSD-Enterprise users, as well as some stand-alone packages.

CSD-Enterprise users have access to the combined CSD-Discovery and CSD-Materials feature sets.

Change Log

3.1.0

Minor new features

  • New methods to add or remove hydrogens have been added to ccdc.crystal.Crystal. Adding hydrogens through the Crystal interface gives more plausible positions for flexible hydrogens, which are placed to make hydrogen bonds within the crystalline environment where possible.

Significant improvements

3.0.18

Minor new features

  • ccdc.io.EntryWriter and related file writers can be created with a format='cif' or format='mmcif' parameter to determine which of CIF or mmCIF formats should be written to a “.cif” file.

Bug fixes

  • If no format parameter is specified, “.mol2” files read and rewritten as “.cif” files are now written in CIF format, unless the data is clearly a protein structure, in which case mmCIF format is used.

  • Fixed issues related to missing files in some circumstances when running hydrogen bond propensity calculations and interaction map analysis.
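
The choice between CIF and mmCIF described in the 3.0.18 notes can be read as a small decision rule. The sketch below is illustrative only: choose_cif_flavour and its looks_like_protein flag are hypothetical names and not part of the ccdc package, and these notes do not say how the API decides that data is "clearly" a protein structure.

```python
def choose_cif_flavour(format=None, looks_like_protein=False):
    """Return 'cif' or 'mmcif' for a file written with a '.cif' extension.

    Hypothetical helper: an explicit format always wins; otherwise
    protein-like data defaults to mmCIF and everything else to CIF.
    """
    if format is not None:
        if format not in ("cif", "mmcif"):
            raise ValueError("format must be 'cif' or 'mmcif'")
        return format
    return "mmcif" if looks_like_protein else "cif"


print(choose_cif_flavour())                                       # cif
print(choose_cif_flavour(looks_like_protein=True))                # mmcif
print(choose_cif_flavour(format="cif", looks_like_protein=True))  # cif
```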

3.0.17

Major new features

Minor new features

3.0.16

Major new features

Bug fixes

  • Fixed: labels on SMARTS did not work with recursive SMARTS.

3.0.15

Major new features

Minor new features

Bug fixes

  • R/S chirality flags are now assigned correctly to sulfoxides read from SMILES.

3.0.14

Minor new features

3.0.13

Major new features

Significant improvements

  • The SMARTS parser in ccdc.search has been significantly upgraded. ccdc.search.SMARTSSubstructure now:
    • Supports recursive SMARTS.

    • Supports component level SMARTS.

    • Supports the vast majority of atomic primitives.

    • Supports a more complex bond logic allowing for AND/OR combinations.

    • Supports exotic hydrogen atoms to allow, for example, for bridging hydrogen searching.

    • Has been made more conformant to the standard definition to remove atomic ambiguities.

    • Has had several bugs fixed (for example, ring closures are now properly supported).

    • Has had some speed improvements in the resultant searches.

For more information see the descriptive documentation on SMARTS

Deprecations

3.0.12

Major new features

Minor new features

Bug fixes

  • An off-by-one error in cavity atom definitions when reading and writing GOLD configuration files has been fixed. It manifested as an out-of-range exception if the last atom in an input protein file was included as part of the cavity.

3.0.11

Minor new features

  • The powder pattern simulation class ccdc.descriptors.CrystalDescriptors.PowderPattern now supports simple preferred orientation corrections in line with the graphical functionality in Mercury

  • Support for FIMs on surface calculations. See the ccdc_addopt package for details.

  • Improved performance and resolution of opt-in telemetry.

  • Added feature to simulate preferred orientation in powder patterns, matching existing feature in Mercury

Bug fixes

  • Fixed issues by protecting CIF opening against input that is not CIF.

3.0.10

  • Limited release for CSPC-IP consortium. CSP features added.

3.0.9

Major new features

  • Experimental support for Python 3.8 and 3.9 added

  • A new ccdc.io.Subsets class as a simple way to access pre-defined CSD subsets.

Minor new features

  • Removed six requirement

  • Hydrogen addition to ccdc.molecule.Molecule can now be siteless

  • Solvent accessible area calculations in ccdc.molecule.Molecule can calculate the accessible surface area in contact with another molecule only

3.0.8

Major new features

  • LD_LIBRARY_PATH, DYLD_LIBRARY_PATH and DYLD_FRAMEWORK_PATH no longer need to be set during installation. The only exception is that LD_LIBRARY_PATH may be required on CentOS 7.

  • Substructure searches now work for structures with no coordinates.

  • Hydrogen Bond Statistics features added. See ccdc.interaction.HydrogenBondStatistics

3.0.7

Major new features

Minor new features

3.0.5

Major new features

  • Support for macOS 11 Big Sur.

  • XQuartz is no longer a dependency for macOS.

3.0.4

Major new features

  • Remove requirement for an accessible X server on Linux.

Bug fixes

  • Fixed: the diagram generator gave blank diagrams for some molecules.

  • Fixed: the docking API failed on Windows when trying to write files with ‘:’ in the file name.

  • Improved histogram binning for torsion fragments.

3.0.3

Major new features

  • The CSD Python API will now operate, where appropriate, in the absence of the CSD data.

  • Remove requirement for an accessible X server on Linux.

Minor new features

  • Raise exception when accessing out-of-bounds grid points in grid API.

  • Use a fixed range for torsion histograms during geometry analysis.

  • Fix import of attributes from SDFile and Mol2 files containing structures with non-unique identifiers.

  • Fix export of Mol2 files for structures with no attributes.

  • Fix syntax of CCP4 files.

  • Fix detection of intermolecular hydrogen bond pairs as observed when they occur between molecules within the asymmetric unit.
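
Using a fixed range for torsion histograms means that histograms built from different fragments share the same bin edges and can be compared bin by bin. A minimal sketch of the idea follows; the 0 to 180 degree range, the 18 bins and the function name are assumptions for illustration and are not taken from the API.

```python
def fixed_range_histogram(angles, lower=0.0, upper=180.0, n_bins=18):
    """Count values into n_bins equal-width bins spanning [lower, upper]."""
    width = (upper - lower) / n_bins
    counts = [0] * n_bins
    for angle in angles:
        if not lower <= angle <= upper:
            raise ValueError(f"{angle} is outside [{lower}, {upper}]")
        # A value exactly on the upper edge is placed in the last bin.
        index = min(int((angle - lower) / width), n_bins - 1)
        counts[index] += 1
    return counts


# Two data sets with very different spreads still give aligned bins.
narrow = fixed_range_histogram([60.0, 61.0, 180.0])
wide = fixed_range_histogram([5.0, 12.0, 178.0])
print(len(narrow) == len(wide))  # True
```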

Examples

  • hydrogen_bond_propensity_report.py - writes a hydrogen bond propensity calculation to a docx report.

  • multi_component_hydrogen_bond_propensity_report.py - writes a multi-component hydrogen bond propensity calculation to a docx report.

3.0.2

Minor new features

Backwards incompatible changes

  • Python 2.7 officially reached its end-of-life on 1st January 2020. New versions of the CSD Python API no longer support Python 2.7.

3.0.1

Minor new features

3.0.0

Minor new features

  • new ccdc.utilities.ApplicationInterface to make interfacing the CSD Python API with external applications easier.

  • new ccdc.utilities.HTMLReport class for simplifying HTML reports for Mercury and Hermes API scripts.

  • there are methods on a ccdc.search.SubstructureSearch.SubstructureHit to allow the definitions and atoms of the measurements, constraints and geometric objects of a hit to be inspected.

  • there is a method, ccdc.cavity.Cavity.write(), which will write a Hermes-compatible rlbcoor file from a cavity if the cavity has been read from a PDB file.

  • there is a new static method, ccdc.protein.Protein.known_cofactor_codes(), that provides a list of cofactor 3-letter codes that are recognised by the Python API.

  • ccdc.search.QueryAtom.label_match can be set to a regular expression. This will be used to constrain a search for an atom to only hit the atoms that match it.

  • an exception is no longer raised on opening a database with an identifier list containing identifiers missing from the database; instead, an exception will be raised when missing entries are accessed.
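
To illustrate the label_match item in the list above, the sketch below filters a list of atom labels with a regular expression using only the standard library. It does not call the ccdc package: atoms_matching_label is a hypothetical helper, the labels are made up, and matching the whole label (rather than any part of it) is an assumption, since these notes do not say which the API does.

```python
import re


def atoms_matching_label(labels, pattern):
    """Return the labels that the regular expression matches in full."""
    compiled = re.compile(pattern)
    return [label for label in labels if compiled.fullmatch(label)]


labels = ["C1", "C2", "C10", "N1", "O1", "H1A"]
print(atoms_matching_label(labels, r"C\d"))      # ['C1', 'C2']
print(atoms_matching_label(labels, r"[NO]\d+"))  # ['N1', 'O1']
```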

Deprecations

Additional Examples

  • cavity_pair_view.py - a simple new demonstration script that shows how to superimpose pairs of proteins based on their cavities.

2.3.0

Major new features

Minor new features

2.2.0

Major new features

Minor new features

2.1.0

Minor new features

Bug fixes

2.0.0

Backwards incompatible changes

  • ASER format databases are no longer supported. If you have databases that no longer work with the CSD Python API, you can contact your local in-house database manager, or you can contact support@ccdc.cam.ac.uk to receive assistance in converting your files to the new format.

  • Creation of SubstructureSearch and SimilaritySearch screens for speeding up searches of large, non-CSD databases is no longer supported. Please convert your databases to csdsql format databases where the screens are built-in.

  • ccdc.io.EntryReader when created with a list of identifiers will raise a RuntimeError if any of the identifiers is not present in the underlying database.
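
The up-front identifier check described for 2.0.0 can be pictured as follows. This is a stand-alone sketch, not the EntryReader implementation: a plain dict stands in for the database, open_with_identifiers is a hypothetical name, and the identifiers are example values only.

```python
def open_with_identifiers(database, identifiers):
    """Return the requested entries, failing up front if any are missing.

    `database` is a plain dict standing in for a structure database.
    """
    missing = [i for i in identifiers if i not in database]
    if missing:
        raise RuntimeError("Identifiers not in database: " + ", ".join(missing))
    return [database[i] for i in identifiers]


database = {"AABHTZ": "entry 1", "ABEBUF": "entry 2"}
print(open_with_identifiers(database, ["AABHTZ"]))  # ['entry 1']

try:
    open_with_identifiers(database, ["AABHTZ", "NOTREAL"])
except RuntimeError as error:
    print(error)  # Identifiers not in database: NOTREAL
```

Note that the 3.0.0 notes above describe a later change in which the error is deferred until a missing entry is actually accessed.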

Deprecations

  • ccdc.MolecularDescriptors.overlay_rmsd_and_rmsd_tanimoto() has been deprecated and replaced with the method ccdc.MolecularDescriptors.overlay_rmsds_and_transformation(), which also returns the overlay transformation matrix.

Major new features

Minor new features

Bug fixes

  • fix uninstallation of conda package.

  • improve detection of GOLD executable at less standard locations.

  • recognise cofactor atoms as part of the protein.

  • improve compatibility with PyQt module.

  • accept unicode identifiers in GCD lists.

  • allow uppercase filenames with lower-case list of pdb codes (JIRA GOLD-1082) when creating a cavity database with an identifier list filter.

1.5.3

Backwards incompatible changes

Minor new features

  • ccdc.molecule.Molecule.normalise_atom_positions() allows a molecule to reorder its atoms canonically.

  • ccdc.docking.Docker.Settings now allows the specification of a scoring parameter file and a torsion distribution file.

  • API tests that depend on a specific CSD version being present, which may not match what the user has, may be skipped.

  • Molecule and crystal formulas of disordered structures use occupancies, giving fractional element counts.

  • ccdc.__build__() provides a unique build identifier.
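
The occupancy-based formulas mentioned in the list above amount to summing site occupancies per element instead of counting atoms. The sketch below shows that arithmetic on made-up data; the function names are hypothetical, and the alphabetical element order is a simplification that may differ from the order the API uses.

```python
from collections import defaultdict


def occupancy_weighted_counts(atoms):
    """Sum occupancies per element for (element, occupancy) pairs."""
    counts = defaultdict(float)
    for element, occupancy in atoms:
        counts[element] += occupancy
    return dict(counts)


def format_formula(counts):
    """Render counts as e.g. 'C2 Cl0.5', omitting a count of exactly 1."""
    parts = []
    for element in sorted(counts):
        count = counts[element]
        parts.append(element if count == 1 else f"{element}{count:g}")
    return " ".join(parts)


# One site shared half-and-half between Cl and Br (illustrative data).
atoms = [("C", 1.0), ("C", 1.0), ("Cl", 0.5), ("Br", 0.5)]
print(format_formula(occupancy_weighted_counts(atoms)))  # Br0.5 C2 Cl0.5
```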

Bug fixes

  • ccdc.search.SubstructureSearch.SearchHit.match_substructures() has been revised to ensure that the order of matched atoms in the returned molecule accords with the order of substructure atoms in the query.

  • Fixed a problem on 64-bit Windows where, due to a modified registry layout, a 32-bit API could have trouble locating the database and therefore a licence.

1.5.2

Backwards incompatible changes

  • the method ccdc.molecule.add_hydrogens() will no longer add hydrogens to atoms in a polymeric bond, correcting an earlier error.

Minor new features

  • there is a new method, ccdc.molecule.Atom.is_in_line_of_sight() which checks whether a pair of atoms is occluded by a third. See line-of-sight for details.

  • ccdc.descriptors.StatisticalDescriptors.RankStatistics additionally accepts a string identifier to specify activity classifications.

  • All search classes raise exceptions when from_xml_file methods are called with a non-existent XML file.

  • the method ccdc.utilities.Timer.progress() now takes an optional argument specifying that the output should be written in place, overwriting previous output.

  • There is a new property, ccdc.search.Search.SearchHit.identifier() to access the string identifier for the hit.

  • Fail earlier (at import time) for bad installations, with helpful error messages for:
    • Python versions other than 2.7.x.

    • 64-bit install into 32-bit Python environment, or vice versa.

    • UCS2 install into UCS4 Python environment, or vice versa.
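
Writing progress output "in place", as described for Timer.progress() in the list above, is commonly done by starting each message with a carriage return instead of ending it with a newline. The sketch below shows that general technique; report_progress and its parameters are hypothetical, and this is not necessarily how Timer.progress() is implemented.

```python
import io
import sys


def report_progress(message, in_place=False, stream=None):
    """Write a progress message, optionally overwriting the previous one."""
    stream = sys.stdout if stream is None else stream
    if in_place:
        # A carriage return moves the cursor back to the start of the line,
        # so this message is drawn over whatever was there before.
        stream.write("\r" + message)
    else:
        stream.write(message + "\n")
    stream.flush()


# Capture the output in memory to show exactly what is written.
buffer = io.StringIO()
for done in (1, 2, 3):
    report_progress(f"{done}/3 structures", in_place=True, stream=buffer)
print(repr(buffer.getvalue()))
```

One limitation of this approach is that a shorter message does not erase the tail of a longer one, so messages are usually padded to a fixed width.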

Bug fixes

  • Fix diagram generation segmentation fault on Linux platforms with NVidia graphics drivers (BZ17616)

  • Fixed a bug where multiple covalently-bound ligands with the same chain identifier were treated as a single ligand (Bug 18737)

  • Avoid rounding errors when comparing crystal contact lengths.

  • Fix incorrect licensing restrictions for Point and Vector.

  • Throw a runtime_error instead of seg faulting when incorrect atoms are provided for a substructure search centroid

  • Throw a runtime_error instead of seg faulting when illegal substructure indices specified in search

  • Give consistent errors when search XML files are missing, irrespective of Search class.

  • Fix stretching of grid files when saved using .grd format (BZ18734)

  • Fix ccdc.search.SMARTSSubstructure hit atom indexes not corresponding to the substructure specification (Bug 18661)

  • Fix calculation of crystal density where the Z’ is less than 1 (BZ16116)

  • Fix distance parameters read from CONNSER files being erroneously VdW-corrected (BZ18736)
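
The rounding-error item in the list above reflects a general point about floating-point lengths: two values that should be equal can differ in the last digit, so they are better compared with a tolerance than with ==. A minimal sketch, assuming a hypothetical helper name and an arbitrary absolute tolerance:

```python
import math


def same_contact_length(a, b, tolerance=1e-6):
    """Compare two lengths using an absolute tolerance (assumed value)."""
    return math.isclose(a, b, rel_tol=0.0, abs_tol=tolerance)


length_a = 0.1 + 0.2  # stored as 0.30000000000000004
length_b = 0.3
print(length_a == length_b)                     # False
print(same_contact_length(length_a, length_b))  # True
```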

1.5.1

Backwards incompatible changes

  • the module ccdc.cavity has been moved to ccdc_rp.cavity.

  • when reading loops from the attributes of a CIF file, values which are deemed not significant will be replaced by None. Previously only the first item of the loop was checked.
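
The loop-handling change above means that every item of a loop is now checked, not just the first. The sketch below illustrates this under the assumption that "not significant" refers to the standard CIF placeholders "?" (unknown) and "." (inapplicable); these notes do not define the term, and clean_loop_values is a hypothetical name.

```python
def clean_loop_values(values):
    """Replace CIF placeholders '?' and '.' with None for every item."""
    return [None if value in ("?", ".") else value for value in values]


loop_column = ["1.234", "?", ".", "0.5"]
print(clean_loop_values(loop_column))  # ['1.234', None, None, '0.5']
```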

Minor new features

Bug fixes

  • “se” to represent aromatic selenium (as in selenophene) is now supported by the SMILES and SMARTS parsers.

  • ccdc.descriptors.MolecularDescriptors.point_group_analysis() failed if called twice on the same molecule.

  • ccdc.descriptors.MolecularDescriptors.point_group_analysis() now deduces correct point group symmetry for BROFRM02.

1.5.0

Major new features

Minor new features

1.4.0

Backwards incompatible changes

  • some of the names of ccdc.interaction.InteractionLibrary.CentralGroup have been changed to correct a bug in the handling of groups with distinct geometries, for example, ‘planar uncharged aromatic amino’ is now distinguishable from ‘pyramidal uncharged aromatic amino’.

Deprecations

  • The attribute ccdc.conformer.ConformerSettings.normalised_score_threshold is deprecated.

Major new features

Minor new features

Bug fixes

  • Writing a protein containing atoms with unknown element types to PDB formatted-files now generates valid PDB format.

  • Reading proteins from Mol2 files now retains residue information.

  • There was a bug where ccdc.search.SubstructureSearch.SubstructureHit.match_atoms() could return atoms not in the same molecule. This has been fixed.

  • The Python NumPy module is no longer a prerequisite for the CSD Python API, leading to an easier installation experience for pip users.

1.3.0

Backwards incompatible changes

  • The ccdc.entry.Citation previously had a member, ccdc.entry.Citation.journal_name which has been superseded by an instance of the new ccdc.entry.Journal.

  • The ccdc.descriptors.PowderPattern has been removed, after deprecation in the previous release. Please use ccdc.descriptors.CrystalDescriptors.PowderPattern instead.

  • Several methods and classes from ccdc.cavity have been removed following deprecation in the previous release.

  • When reading a CIF file, bond information, even if present, will be ignored. This is for consistency with other CCDC programs.

Major new features

Minor new features

Bug fixes

1.2.0

Deprecations

  • ccdc.descriptors.PowderPattern has been moved into a new namespace, and now appears as ccdc.descriptors.CrystalDescriptors.PowderPattern. It remains available under its old location for this release, for backwards compatibility, but will be available only in its new location from the 1.3 release.

Major new features

  • There is now an implementation of Graph Sets in the API. See graph-sets for details.

  • ccdc.docking.Docker now allows GOLD to be invoked for rescoring. See Rescoring for details.

Minor new features

Bug fixes

  • ccdc.protein.Protein.Residue.__eq__() method now compares chain ID as well as residue sequence number.

  • ccdc.protein.Protein.Residue.__lt__() method now sorts on chain ID as well as residue sequence number.
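
The effect of including the chain ID in residue comparisons can be seen with a small stand-in class. ResidueKey below is hypothetical and unrelated to the real Residue class; it simply compares and sorts on the pair (chain ID, sequence number), which is one plausible reading of the two fixes above.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class ResidueKey:
    """Stand-in residue compared on chain ID first, then sequence number."""
    chain_id: str
    sequence_number: int


# Same sequence number on different chains: no longer considered equal.
print(ResidueKey("A", 10) == ResidueKey("B", 10))  # False

residues = [ResidueKey("B", 1), ResidueKey("A", 20), ResidueKey("A", 3)]
ordered = sorted(residues)
print([(r.chain_id, r.sequence_number) for r in ordered])
# [('A', 3), ('A', 20), ('B', 1)]
```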

1.1.1

Deprecations

  • ccdc.cavity.Cavity.RapmadPocket has been deprecated in favour of ccdc.cavity.Cavity.PocketDistanceHistograms.

  • ccdc.cavity.Cavity.PocketDistanceHistograms.identifier, ccdc.cavity.Cavity.PocketDistanceHistograms.nfeatures and ccdc.cavity.Cavity.PocketDistanceHistograms.feature_coordinates have been deprecated.

  • ccdc.cavity.Cavity.rapmad_pocket() has been deprecated in favour of ccdc.cavity.Cavity.pocket_distance_histograms().

  • ccdc.cavity.CavityDB.rapmad_pocket() and ccdc.cavity.CavityDB.rapmad_pockets() have been deprecated in favour of ccdc.cavity.CavityDB.pocket_distance_histograms() and ccdc.cavity.CavityDB.pocket_distance_histogram_sets().

  • ccdc.cavity.CavityDB.search_rapmad() and ccdc.cavity.CavityDB.search_cavbase() have been deprecated in favour of ccdc.cavity.CavityDB.pocket_search() and ccdc.cavity.CavityDB.cavbase_search().

Minor new features

  • updated naming in experimental interface to protein cavities. See ccdc.cavity.

1.1.0

Backwards incompatible changes

  • molecules read from a database no longer raise an exception if there are no atoms.

  • molecules with no atoms can be written and converted to strings.

  • entries and crystals can be created with no underlying atoms

  • translating molecules no longer raises an exception if there are siteless atoms.

Major new features

Molecule
  • ccdc.molecule.Molecule now provides a method to calculate partial charges for organic molecules.

  • ccdc.molecule.Atom has a property, partial_charge, to get or set the partial charge of an atom. All partial charges of a molecule will be reset if an atom’s formal charge is changed.
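
The reset behaviour described above can be modelled with a property setter that invalidates every partial charge in the parent molecule. The classes below are hypothetical stand-ins, not the ccdc classes, and treating "reset" as "set back to None" is an assumption.

```python
class SketchAtom:
    """Stand-in atom whose formal charge setter resets partial charges."""

    def __init__(self, molecule, label, formal_charge=0):
        self._molecule = molecule
        self.label = label
        self._formal_charge = formal_charge
        self.partial_charge = None

    @property
    def formal_charge(self):
        return self._formal_charge

    @formal_charge.setter
    def formal_charge(self, value):
        self._formal_charge = value
        # Previously assigned partial charges are no longer consistent.
        self._molecule.reset_partial_charges()


class SketchMolecule:
    """Stand-in molecule that owns a list of SketchAtom objects."""

    def __init__(self, labels):
        self.atoms = [SketchAtom(self, label) for label in labels]

    def reset_partial_charges(self):
        for atom in self.atoms:
            atom.partial_charge = None


molecule = SketchMolecule(["N1", "H1"])
for atom in molecule.atoms:
    atom.partial_charge = 0.1       # illustrative values only
molecule.atoms[0].formal_charge = 1  # triggers the reset
print([atom.partial_charge for atom in molecule.atoms])  # [None, None]
```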

Crystal

Minor new features

1.0.0

Major new features

  • updated for CSDS 2016

  • there is now an API for docking ligands into proteins, using GOLD. This is currently available only to collaborators.

  • ccdc.molecule.Molecule can now determine intramolecular hydrogen bonds and close contacts.

  • ccdc.molecule.Atom.is_chiral and ccdc.molecule.Atom.chirality have been extensively revised to give more accurate determination of R/S chirality including the determination of para-chiral centres (whose chirality is determined solely by the chirality of other atoms). Note that structures with pi-bonds will not support the determination of chirality.

  • ccdc.descriptors.MolecularDescriptors has new methods to define geometric objects from atom and ring coordinates.

  • ccdc.descriptors.GeometricDescriptors is new and provides methods to define vectors and planes from points, and to calculate geometric relationships between them. See Molecular geometry for details.

Minor new features

Entry
  • ccdc.entry.Entry.cross_references gives a tuple of ccdc.entry.Entry.CrossReference instances. These provide cross-references between entries of the CSD.

  • a ccdc.io.EntryReader of a mol2 format entry will now extract SDFile-like tags from the Mol2Comments and place them in an attributes dictionary. The EntryWriter will write these attributes.

  • a ccdc.io.EntryReader of a mol2 format entry will now extract Mol2 format atom sets and place them in a dictionary attribute ccdc.entry.Entry.atom_sets where found. The EntryWriter will write atom sets if the above attribute is set.

Crystal
Molecule
IO
  • GCD files may now use an arbitrary database as the source of entries. Use the form io.EntryReader(gcd_file, source_database). This will also work with lists of identifiers.

0.7.0

Deprecations

  • The property ccdc.entry.jds_deposition_number is deprecated. This is a historical journal deposition number and has since been superseded by CCDC numbers. The method ccdc.search.TextNumericSearch.add_jds_deposition_number() is similarly deprecated.

Major new features

Minor new features

Entry
Crystal
IO
  • res format (SHELX) files may now be read and written through the standard io classes.

Search
  • There is a method for writing ConQuest to Mercury interchange files using ccdc.search.SubstructureSearch.SubstructureHitList.write_c2m_file(). This allows substructure search results to be visualised in the data analysis module of Mercury.

  • A substructure search hit has a new method, ccdc.search.SubstructureSearch.SubstructureHit.match_symmetry_operators() which will return the symmetry operators applied to atoms to perform the match.

  • There is a new method, ccdc.search.Search.Settings.test() which will determine whether a ccdc.entry.Entry, ccdc.crystal.Crystal or ccdc.molecule.Molecule satisfies the requirements of the Settings class.

  • ccdc.search.Search.Settings.no_disorder now may take any of three values: None (or anything which evaluates to False) to indicate no filtering of any disordered structures; ‘all’ to indicate filtering of structures with any disordered atoms; ‘Non-hydrogen’ or any string apart from ‘all’ to indicate filtering of structures with heavy atom disorder. This last is compatible with the ConQuest ‘no disorder’ selector.
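
The three-way interpretation of no_disorder described above can be written out as a small function. This is a reading of the text rather than the API's code: disorder_filter_mode and the returned descriptions are hypothetical, the comparison with 'all' is assumed to be exact, and truthy non-string values (which the notes do not cover) are treated like other strings here.

```python
def disorder_filter_mode(no_disorder):
    """Describe which structures a given no_disorder value would filter out."""
    if not no_disorder:
        # None, False, '' and other falsy values: no filtering.
        return "keep all structures"
    if no_disorder == "all":
        return "exclude structures with any disordered atoms"
    # 'Non-hydrogen' or any other string apart from 'all'.
    return "exclude structures with heavy-atom disorder"


for value in (None, "all", "Non-hydrogen"):
    print(repr(value), "->", disorder_filter_mode(value))
```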

GeometryAnalyser
Screener
Examples
  • maximum_common_substructure.py shows a similarity search followed by maximum common substructure search.

  • filter_csd.py shows how an iteration over the entries of the CSD can omit entries on a number of criteria.

  • simple_report.py shows how python’s format() method may be used to write HTML reports on a CSD entry.
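
The general approach of filling an HTML template with format(), as mentioned for simple_report.py above, looks roughly like this. It is not the contents of that script: the template, the render_report function and the values passed to it are placeholders invented for illustration.

```python
from html import escape

TEMPLATE = """<html>
<head><title>{identifier}</title></head>
<body>
<h1>{identifier}</h1>
<p>Formula: {formula}</p>
<p>R-factor: {r_factor:.2f}%</p>
</body>
</html>"""


def render_report(identifier, formula, r_factor):
    """Fill the template, escaping text so it is safe to embed in HTML."""
    return TEMPLATE.format(
        identifier=escape(identifier),
        formula=escape(formula),
        r_factor=r_factor,
    )


# Placeholder values, not taken from any real entry.
report = render_report("EXAMPLE01", "C9 H8 O4", 4.567)
print(report)
```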

0.6.0

Backwards incompatible changes

Deprecations

Major new features

Minor new features

Geometry Analyser
Diagrams
Search
  • ccdc.search.TextNumericSearch has a new property, queries, which will give a human-readable representation of the queries added to the search instance.

Entry
  • ccdc.entry.Entry has the property radiation_source to express the experimental radiation probe used in the determination of the crystal.

  • ccdc.entry.Entry has the property is_polymeric to allow filtering of the CSD on polymeric structures.

  • ccdc.entry.Entry and ccdc.crystal.Crystal may now be compared for equality (based on their identifier) and hashed, allowing use as a key in a dictionary or set.
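
Identifier-based equality and hashing, as described in the last item above, is what lets objects be de-duplicated in a set or used as dictionary keys. The stand-in class below shows the pattern; SketchEntry is hypothetical and the identifiers are example values.

```python
class SketchEntry:
    """Stand-in entry that is equal to, and hashes like, its identifier."""

    def __init__(self, identifier):
        self.identifier = identifier

    def __eq__(self, other):
        if not isinstance(other, SketchEntry):
            return NotImplemented
        return self.identifier == other.identifier

    def __hash__(self):
        # Equal objects must have equal hashes, so hash the same field.
        return hash(self.identifier)


seen = {SketchEntry("AABHTZ"), SketchEntry("AABHTZ"), SketchEntry("ABEBUF")}
print(len(seen))  # 2

notes = {SketchEntry("AABHTZ"): "first example"}
print(notes[SketchEntry("AABHTZ")])  # first example
```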

Molecule
Crystal
  • ccdc.crystal.Crystal now has a spacegroup_number_and_setting attribute to provide details of the crystal’s space group.

  • ccdc.crystal.Crystal has a new method to calculate a packing shell of a given number of molecules.

  • ccdc.crystal.Crystal now reports its symmetry operations as a tuple of strings.

  • ccdc.crystal.Crystal will report rotational and translational components of its symmetry operators.

  • ccdc.crystal.Crystal now allows the generation of molecules produced by the crystal’s symmetry operations.
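
To show what the rotational and translational components of a symmetry operator are, the sketch below splits an operator string into a 3x3 rotation matrix and a translation vector. It is independent of the API: the function name is hypothetical, the 'x,1/2+y,-z' string style is an assumption about how operators are written, and only integer or fractional translations with unit axis coefficients are handled.

```python
import re
from fractions import Fraction

AXES = {"x": 0, "y": 1, "z": 2}
# An optional sign followed by a fraction, an integer, or an axis letter.
TERM = re.compile(r"([+-]?)(\d+/\d+|\d+|[xyz])")


def split_symmetry_operator(operator):
    """Split e.g. '-x,1/2+y,1/2-z' into (rotation, translation)."""
    components = operator.replace(" ", "").lower().split(",")
    if len(components) != 3:
        raise ValueError("expected three comma-separated components")
    rotation = [[0, 0, 0] for _ in range(3)]
    translation = [Fraction(0)] * 3
    for row, component in enumerate(components):
        for sign, token in TERM.findall(component):
            factor = -1 if sign == "-" else 1
            if token in AXES:
                rotation[row][AXES[token]] += factor
            else:
                translation[row] += factor * Fraction(token)
    return rotation, translation


rotation, translation = split_symmetry_operator("-x,1/2+y,1/2-z")
print(rotation)                       # [[-1, 0, 0], [0, 1, 0], [0, 0, -1]]
print([str(t) for t in translation])  # ['0', '1/2', '1/2']
```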

0.5.0

Backwards incompatible changes

Deprecations

Major new features

Minor new features

Bug fixes

0.4.0

Major new features

  • Access to the CCDC conformer generator and molecular minimiser. See the ccdc.conformer module. Feature under development - currently available only to associated collaborators.

  • Access to the CCDC diagram generation functionality. See the ccdc.diagram module.

  • Mogul analysis of individual fragments:

    • ccdc.mogul.Mogul.analyse_bond()

    • ccdc.mogul.Mogul.analyse_angle()

    • ccdc.mogul.Mogul.analyse_torsion()

    • ccdc.mogul.Mogul.analyse_ring()

  • The entire CSD Python API is now unicode compatible

Minor new features

Bug fixes

  • Fixed a bug which meant that the ccdc.mogul.MogulResult.histogram() function was not returning correct data.

0.3.1

Backwards incompatible changes

Major new features

  • Support for 64-bit Python on Linux

  • ccdc.search.SubstructureSearch has expanded functionality

    • Ability to measure distances, angles and torsion angles in hit structures

    • Ability to constrain distances, angles and torsion angles in hit structures

    • It now provides the ability to add more than one substructure, which can be used to set up inter-molecular contact searches

  • A number of IsoStar classes have been implemented. See ccdc.isostar for details.

Minor new features

Bug fixes

  • A much larger number of open ASER database instances is now supported, and a memory leak of open ASER database instances has been fixed.

  • ccdc.search.ConnserSubstructure will raise an exception for a missing or empty file name parameter.