Similarity searching

Introduction

To set up searches we need to import the ccdc.search module. We also import the ccdc.io module so that we can read and write molecules.

>>> import ccdc.search
>>> import ccdc.io

As a preamble, we also create a temporary directory and define the file path to a testosterone molecule.

>>> import tempfile
>>> tempdir = tempfile.mkdtemp()
>>> filepath = 'testosterone.mol2'

To access the molecule in the testosterone mol2 file we use a ccdc.io.MoleculeReader.

>>> reader = ccdc.io.MoleculeReader(filepath)
>>> testosterone = reader[0]

Similarity search

To run a similarity search we first create a ccdc.search.SimilaritySearch, whose initialiser takes a ccdc.molecule.Molecule and a similarity threshold between 0.0 and 1.0. The threshold defaults to 0.7.

>>> similarity_query = ccdc.search.SimilaritySearch(testosterone)

The similarity search can then be run by calling the query's search() method.

>>> sim_hits = similarity_query.search()
>>> print(len(sim_hits))
837

To reduce the number of hits we can increase the similarity threshold.

>>> similarity_query.threshold = 0.9
>>> sim_hits = similarity_query.search()
>>> print(len(sim_hits))
84
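
The effect of the threshold can be illustrated without the CSD. This section does not say how the similarity score is computed, so the sketch below assumes a Tanimoto coefficient over sets of fingerprint bits, a common choice in cheminformatics. The fingerprints, candidate names and helper functions are all hypothetical and do not describe ccdc internals.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two sets of 'on' fingerprint bits."""
    bits_a, bits_b = set(bits_a), set(bits_b)
    if not bits_a and not bits_b:
        return 1.0  # two empty fingerprints are treated as identical
    return len(bits_a & bits_b) / len(bits_a | bits_b)


def filter_by_threshold(query_bits, candidates, threshold=0.7):
    """Return (name, score) pairs scoring at or above threshold, best first."""
    hits = []
    for name, bits in candidates.items():
        score = tanimoto(query_bits, bits)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical fingerprints: each integer stands for one structural feature.
query = set(range(1, 11))
candidates = {
    'A': set(range(1, 11)),          # identical to the query
    'B': set(range(1, 12)),          # one extra feature
    'C': set(range(1, 9)) | {11},    # two features missing, one extra
    'D': {1, 2, 3},                  # only a small overlap
}

for threshold in (0.7, 0.9):
    names = [name for name, _ in filter_by_threshold(query, candidates, threshold)]
    print(threshold, names)
```

Under these assumptions a stricter threshold can only remove candidates, never add them, which is why the hit count falls as the threshold rises.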

Alternatively, we can limit the number of hits that the search returns.

>>> sim_hits = similarity_query.search(max_hit_structures=10)
>>> print(len(sim_hits))
10

Let us find out what these structures are and what their similarity to the query is.

>>> for hit in sim_hits:
...     print('%9s: %.2f' % (hit.identifier, hit.similarity))
   BAWMAN: 1.00
   BEJVAN: 1.00
   BOKVUS: 1.00
   CERVAX: 1.00
   DIGRIV: 1.00
   EFEJAD: 1.00
   EPITES: 1.00
   GIXVIW: 1.00
   GIXVOC: 1.00
   HANSTO: 1.00
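
The temporary directory created in the preamble is a convenient place to store results. As a minimal sketch, the hits could be written to a CSV file. So that it runs without a CSD installation, the block uses a hypothetical Hit stand-in exposing only the identifier and similarity attributes used above; the write_hits_csv helper and the file name are illustrative and not part of the ccdc API.

```python
import csv
import os
import tempfile
from collections import namedtuple

# Hypothetical stand-in for a search hit; only the two attributes
# printed in the loop above are modelled here.
Hit = namedtuple('Hit', ['identifier', 'similarity'])


def write_hits_csv(hits, path):
    """Write one row per hit: identifier and similarity to two decimals."""
    with open(path, 'w', newline='') as handle:
        writer = csv.writer(handle)
        writer.writerow(['identifier', 'similarity'])
        for hit in hits:
            writer.writerow([hit.identifier, '%.2f' % hit.similarity])
    return path


# With a real search, sim_hits would be passed instead of example_hits.
example_hits = [Hit('BAWMAN', 1.0), Hit('BEJVAN', 1.0)]
tempdir = tempfile.mkdtemp()
csv_path = write_hits_csv(example_hits, os.path.join(tempdir, 'sim_hits.csv'))
```

Any object with identifier and similarity attributes should work with this helper, so the same function could in principle be given the sim_hits collection directly.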

Similarity searches allow all forms of search filter given by ccdc.search.Search.Settings. See Search filters for examples of use.

Similarity queries support all the forms of search that a ccdc.search.SubstructureSearch does.
