Topic: canonical SMILES?  (Read 11472 times)

Offline Borek

  • Mr. pH
  • Administrator
  • Deity Member
  • *
  • Posts: 27885
  • Mole Snacks: +1815/-412
  • Gender: Male
  • I am known to be occasionally wrong.
    • Chembuddy
canonical SMILES?
« on: February 03, 2006, 12:23:57 PM »
Anybody proficient in SMILES?

Is there something like a canonical form of SMILES?

CC(CC)CCC, CCCC(C)CC and CC(CCC)CC are all correct and depict the same compound (3-methylhexane). Is one preferred over another, or not? Chmoogle prefers CCCC(C)CC, probably because it follows the longest chain.

InChI uses some canonicalization of the structure to generate a unique and unambiguous string for any compound.
ChemBuddy chemical calculators - stoichiometry, pH, concentration, buffer preparation, titrations.info

cjames53

  • Guest
Re:canonical SMILES?
« Reply #1 on: February 03, 2006, 12:50:56 PM »
There is, but it has never been fully published. Weininger et al. published a paper on canonicalization, but it wasn't sufficient for two people to implement exactly the same algorithm, and Daylight has had to make a number of improvements since the paper was published. OpenEye has a SMILES canonicalizer in its OEChem toolkit. I believe several others do, too.

There's no way to canonicalize SMILES "by hand."  It's a computationally difficult problem.

Craig
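
To make the question concrete, here is a minimal Python sketch of a toy canonical key, restricted to the easy case raised in the thread: acyclic skeletons written only with C and parentheses. It is not the Daylight or OpenEye algorithm. It uses the standard tree-canonicalization trick of rooting the carbon skeleton at its center and sorting the branch encodings. The function names (parse_alkane_smiles, tree_centers, encode, canonical_key) are made up for this illustration. Rings, heteroatoms, charges and stereochemistry are not handled, and that is where the real difficulty lies.

```python
def parse_alkane_smiles(smiles):
    """Parse a SMILES made only of 'C' and parentheses into an adjacency dict."""
    adjacency = {}
    branch_points = []   # atoms to return to when a branch closes
    previous = None      # atom that the next 'C' bonds to
    for ch in smiles:
        if ch == "C":
            atom = len(adjacency)
            adjacency[atom] = []
            if previous is not None:
                adjacency[previous].append(atom)
                adjacency[atom].append(previous)
            previous = atom
        elif ch == "(":
            branch_points.append(previous)
        elif ch == ")":
            previous = branch_points.pop()
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    if not adjacency:
        raise ValueError("empty SMILES")
    return adjacency


def tree_centers(adjacency):
    """Return the one or two central atoms by peeling off leaves layer by layer."""
    degree = {atom: len(nbs) for atom, nbs in adjacency.items()}
    remaining = len(adjacency)
    leaves = [atom for atom, d in degree.items() if d <= 1]
    while remaining > 2:
        remaining -= len(leaves)
        next_leaves = []
        for leaf in leaves:
            for nb in adjacency[leaf]:
                degree[nb] -= 1
                if degree[nb] == 1:
                    next_leaves.append(nb)
            degree[leaf] = 0  # mark the leaf as removed
        leaves = next_leaves
    return leaves


def encode(adjacency, atom, parent):
    """Encode the subtree below `atom`; sorting makes branch order irrelevant."""
    children = sorted(
        encode(adjacency, nb, atom) for nb in adjacency[atom] if nb != parent
    )
    return "(" + "".join(children) + ")"


def canonical_key(smiles):
    """Return a string that is identical for every spelling of the same skeleton."""
    adjacency = parse_alkane_smiles(smiles)
    return min(encode(adjacency, c, None) for c in tree_centers(adjacency))


if __name__ == "__main__":
    for s in ("CC(CC)CCC", "CCCC(C)CC", "CC(CCC)CC", "CCCCCCC"):
        print(s, canonical_key(s))
```

The three spellings from the first post produce the same key, while heptane (CCCCCCC) produces a different one. The key is a nested-parentheses string rather than a SMILES, so it only answers "same compound or not?" for this restricted class of inputs.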

   

Offline Borek

Re:canonical SMILES?
« Reply #2 on: February 03, 2006, 01:16:39 PM »
Thanks.

Quote from: cjames53
There's no way to canonicalize SMILES "by hand."  It's a computationally difficult problem.

Never planned to :)

I am thinking about a dissociation/complexation constants database for my programs, and I need some form of compound identification for the user interface. For inorganic compounds that's not a problem; standard notation is good enough in most cases. But the organic part is a horror.

Note that I am a heavily underfunded one-person business :)

cjames53

  • Guest
Re:canonical SMILES?
« Reply #3 on: February 03, 2006, 05:18:09 PM »
Try InChI instead. It's the next-generation SMILES (developed by IUPAC together with NIST), and there is free public-domain software that does canonicalization. It serves the same purpose for which canonical SMILES was conceived by Dave Weininger, but they've addressed some of the problems that plague SMILES, such as standardizing the representation of tautomers, nitro groups, and many other cases.

You can get InChI from NIST, or as part of the recent  OpenBabel 2.0 release (on SourceForge).

Craig

Offline Borek

Re:canonical SMILES?
« Reply #4 on: February 03, 2006, 06:06:46 PM »
Quote from: cjames53
Try InChI instead.

What I don't like about the idea is that to enter a correct InChI the user must know it from some other source, while a correct (although not canonicalized) SMILES can be entered by hand - I learnt enough of it in about half an hour to be able to search Chmoogle. So with InChI my database will be very simple to implement, at the cost of some additional burden on the user. With SMILES it will be easy for users but very heavy on me (and the amount of work necessary will probably never pay off).

Tough decision, as I always wanted to write software as easy for users as possible.

cjames53

  • Guest
Re:canonical SMILES?
« Reply #5 on: February 03, 2006, 09:13:37 PM »
Entering SMILES by hand is a heroic effort and gets "out of hand" for larger molecules, which is why we use JME.

Here's what I'd do. Use JME (see http://www.molinspiration.com/jme/); the examples are excellent. Then get OpenBabel 2.0, which can translate just about anything to anything. Use it to convert the SMILES to InChI "behind the covers", and use the InChIs instead of canonical SMILES as the key to your database. You can also store the SMILES in the database, so the user never has to see the InChI strings.

Craig
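
The database layout suggested above could be sketched roughly as follows, in Python with the standard sqlite3 module. Everything named here is an assumption made for illustration: the table and column names, the pka column standing in for a dissociation constant, and the to_inchi parameter. The converter is passed in as a function so the storage logic does not depend on any particular toolkit. In practice it might wrap Open Babel (for example pybel.readstring("smi", s).write("inchi").strip() in recent Python bindings), but that depends on the installed version.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a store keyed by InChI; the SMILES is kept for display."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS compounds ("
        " inchi TEXT PRIMARY KEY,"   # canonical identifier, never shown to the user
        " smiles TEXT NOT NULL,"     # the spelling the user originally entered
        " pka REAL)"                 # hypothetical payload column
    )
    return conn


def add_compound(conn, smiles, pka, to_inchi):
    """Insert a compound; a second spelling of the same compound is ignored."""
    inchi = to_inchi(smiles)  # conversion happens "behind the covers"
    conn.execute(
        "INSERT OR IGNORE INTO compounds (inchi, smiles, pka) VALUES (?, ?, ?)",
        (inchi, smiles, pka),
    )
    conn.commit()
    return inchi


def find_by_smiles(conn, smiles, to_inchi):
    """Look up by any valid SMILES; return (stored_smiles, pka) or None."""
    return conn.execute(
        "SELECT smiles, pka FROM compounds WHERE inchi = ?",
        (to_inchi(smiles),),
    ).fetchone()
```

With this layout, a user who types OCC finds the record that was stored as CCO, because both spellings convert to the same InChI key.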

Offline Borek

Re:canonical SMILES?
« Reply #6 on: February 22, 2006, 03:19:01 PM »
I did some research and found that the free version of ChemSketch works perfectly too. As there are two free and independent tools that can be used for easy InChI generation, it seems that anyone can do it in a reasonably short time. Thus it will hardly be a problem for anybody.

Offline Bronwen Dekker

  • Regular Member
  • ***
  • Posts: 54
  • Mole Snacks: +6/-0
  • Gender: Female
    • Nature Protocols
Re: canonical SMILES?
« Reply #7 on: February 15, 2007, 03:34:44 PM »
I realise that the last post on this topic was about a year ago, so I might be adding information that is common knowledge! I noticed (only) recently that you can 'get' the SMILES/SMARTS or InChI, by drawing the structure in the 'sketch' function of PubChem's structure search.

http://pubchem.ncbi.nlm.nih.gov/search/

For some reason this pleased me...
I work at Nature Protocols.
