How do you define chemical structure similarity?

Similarity is calculated by an algorithm that uses Symyx structural ‘keys’ as variables.

A compound (structure) is said to be similar to another if certain parts of the structure can be replaced and the behavior of the new structure remains close to that of the starting structure. Chemical structure searching tools use different techniques to identify similar structures, and the information below is meant to help you understand the method and its limitations.

Structure Similarity Searching in PharmaPendium:

The chemical structures of drug compounds are registered by Elsevier MDL as so-called 'keys'. Each key represents one of the following:

  • Structural feature, e.g. a chlorine atom
  • Six-membered ring
  • Combination of rings or other functional groups
  • Part of a graph represented in that particular structure (sometimes structures are represented in a mathematical way as graphs)

The entire structure can be represented by a set of keys (Elsevier MDL has identified up to 960 structural keys), which serves as a bitmap, or fingerprint, of the structure.
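As a minimal sketch of the bitmap idea, the Python below packs a set of keys into a single integer, one bit per key. The five-entry key catalogue, the bit positions, and the function names are hypothetical and chosen for illustration only; they are not the actual Elsevier MDL key definitions.

```python
# Hypothetical key catalogue (NOT the real MDL keys): bit position -> feature.
KEY_CATALOGUE = {
    0: "chlorine atom",
    1: "six-membered ring",
    2: "fused ring system",
    3: "carbonyl group",
    4: "nitrile (CN) group",
}


def to_bitmap(present_keys, n_keys=len(KEY_CATALOGUE)):
    """Pack the positions of the keys found in a structure into an integer bitmap."""
    bitmap = 0
    for key in present_keys:
        if not 0 <= key < n_keys:
            raise ValueError(f"unknown key position: {key}")
        bitmap |= 1 << key  # switch on the bit for this key
    return bitmap


def to_bit_string(bitmap, n_keys=len(KEY_CATALOGUE)):
    """Render the bitmap with key position 0 as the leftmost character."""
    return format(bitmap, f"0{n_keys}b")[::-1]


# Assumed example: a structure in which only keys 0 and 1 are present.
chlorobenzene_keys = {0, 1}
fingerprint = to_bitmap(chlorobenzene_keys)
print(to_bit_string(fingerprint))
```

With a real key set the principle is the same, only the bitmap is longer (up to 960 positions according to the text above).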

When a similarity search is performed, the system translates the query structure into keys and compares this set of keys with the key sets of the structures stored in the database. A structure in the database is reported as a hit if the comparison yields a degree of identity equal to or greater than the percentage given in the query. For example, if you ask for 80% similarity, 80% or more of all keys of the query structure and the selected database structure have to be identical for that structure to be marked as a hit.
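The comparison step can be sketched as follows. This is an illustration under stated assumptions, not the PharmaPendium implementation: it assumes the score is the Tanimoto coefficient (shared keys divided by all keys present in either structure), which is a common choice for key-based fingerprints but is not named in the text. The function names, compound names, and key numbers are made up.

```python
def tanimoto(query_keys, candidate_keys):
    """Shared keys divided by all keys present in either structure (assumed metric)."""
    query_keys, candidate_keys = set(query_keys), set(candidate_keys)
    union = query_keys | candidate_keys
    if not union:
        return 0.0  # two empty key sets: treat as no similarity
    return len(query_keys & candidate_keys) / len(union)


def similarity_search(query_keys, database, min_percent=80):
    """Return (name, score in %) for every structure at or above the threshold."""
    hits = []
    for name, keys in database.items():
        score = 100 * tanimoto(query_keys, keys)
        if score >= min_percent:
            hits.append((name, score))
    # Best matches first.
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical database: structure name -> set of key numbers.
database = {
    "compound_a": {1, 2, 3, 4, 5},
    "compound_b": {1, 2, 3, 4, 6, 7},
    "compound_c": {8, 9},
}
query = {1, 2, 3, 4, 5}

for name, score in similarity_search(query, database, min_percent=80):
    print(f"{name}: {score:.0f}%")
```

In this toy data, compound_b shares four of seven distinct keys with the query (about 57%), so it is returned only if the threshold is lowered.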

While some of the keys describe very common structural entities, such as a benzene ring, others cover rare occurrences, such as the CN group. To make similarity searches more efficient, Elsevier MDL applied a weighting to the keys that reflects how frequently each key occurs. The weighting was derived from Elsevier MDL’s Available Chemicals Directory (ACD) database, which contains several hundred thousand chemical structures. Key comparison and key weighting together make the Elsevier MDL similarity search a useful tool for identifying chemical structures with a high probability of similar properties.
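The effect of frequency weighting can be illustrated with a small sketch. The actual Elsevier MDL weighting scheme is not described here, so the formula below (weight equals the negative logarithm of the key's frequency, so that rare keys count more) is purely an assumption, as are the frequencies, key names, and function names.

```python
import math


def key_weights(key_frequencies):
    """Assumed scheme: rarer keys get larger weights (frequency must be in (0, 1])."""
    return {key: -math.log(freq) for key, freq in key_frequencies.items()}


def weighted_similarity(query_keys, candidate_keys, weights):
    """Weight of the shared keys divided by the weight of all keys in either structure."""
    query_keys, candidate_keys = set(query_keys), set(candidate_keys)
    shared = sum(weights[k] for k in query_keys & candidate_keys)
    total = sum(weights[k] for k in query_keys | candidate_keys)
    return shared / total if total else 0.0


# Hypothetical fraction of reference structures that contain each key.
frequencies = {"benzene_ring": 0.5, "carbonyl": 0.25, "nitrile": 0.01}
weights = key_weights(frequencies)

query = {"benzene_ring", "nitrile"}
shares_common_key = {"benzene_ring", "carbonyl"}  # shares only the common key
shares_rare_key = {"nitrile", "carbonyl"}         # shares only the rare key

print(weighted_similarity(query, shares_common_key, weights))
print(weighted_similarity(query, shares_rare_key, weights))
```

Without weighting, both candidates share one of three keys with the query and would score the same. With weighting, the candidate that shares the rare key scores higher, which is the behavior the weighting is intended to produce.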

Nevertheless, it is important to realize that although a structure might be similar based on the number of shared (chemical) keys, this does not mean that its biological activity is the same. That depends on whether the keys found in the similar structures not only reflect chemical properties but are also related to the biological activity.

In general, it is recommended not to lower the similarity search threshold below 70%.

Literature for Elsevier MDL keys: Re-optimization of MDL Keys for Use in Drug Discovery, Joseph L. Durant, Burton A. Leland, Douglas R. Henry and James G. Nourse, MDL Information Systems
