I would like to know if there is a proper way to get the 3D information from the SMILES string of a molecule.

  1. Is there a standard way to embed a SMILES string in 3D space?
  2. Are there other representations of compounds which include their spatial information too?
0x90

1 Answer

SMILES is insufficient

SMILES strings do not encode 3D coordinates. They convey only atom types, connectivity, bond types and, optionally, stereochemical labels (such as the [C@H] in the example further down). InChI is like SMILES in this regard.
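
As a quick check of this point, here is a minimal sketch using RDKit (ethanol is just an arbitrary example molecule). A molecule parsed from SMILES carries atoms and bonds but no conformer, i.e. no coordinates at all.

```python
from rdkit import Chem

# Ethanol, chosen only as a small illustrative molecule
mol = Chem.MolFromSmiles("CCO")

# Atoms and bonds are known from the SMILES string...
print("heavy atoms:", mol.GetNumAtoms())
print("bonds:", mol.GetNumBonds())

# ...but no coordinates are stored until a conformer is generated
print("conformers:", mol.GetNumConformers())
```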

Thus, you will need either (a) an algorithm to infer or guess a plausible 3D conformation of a molecule or (b) a file type that has already specified the 3D arrangement of the molecule.

File types for storing, reading, and showing 3D conformations

Probably the most standard way to represent the 3D conformation of a molecule is with an MDL molfile (*.mol). It stores one set of x, y, z coordinates per atom, followed by a bond table. There are many tools to read such files. You can read more about the format on Wikipedia.
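
As a rough sketch of what working with such a file looks like, the snippet below writes a small molecule with 3D coordinates to a *.mol file and reads it back with RDKit. The molecule (ethanol), the file name and the random seed are arbitrary choices for illustration.

```python
import os
import tempfile

from rdkit import Chem
from rdkit.Chem import AllChem

# Build a small molecule with explicit hydrogens and 3D coordinates
mol = Chem.AddHs(Chem.MolFromSmiles("CCO"))
if AllChem.EmbedMolecule(mol, randomSeed=42) < 0:
    raise RuntimeError("3D embedding failed")

# Write the molecule to a temporary .mol file (file name is arbitrary)
path = os.path.join(tempfile.mkdtemp(), "ethanol.mol")
Chem.MolToMolFile(mol, path)

# Read it back, keeping the hydrogens that were written
loaded = Chem.MolFromMolFile(path, removeHs=False)
conf = loaded.GetConformer()

# Print the stored coordinates for every atom
for atom in loaded.GetAtoms():
    pos = conf.GetAtomPosition(atom.GetIdx())
    print(f"{atom.GetSymbol():2s} {pos.x:8.3f} {pos.y:8.3f} {pos.z:8.3f}")
```

The coordinates read back are whatever was stored in the file; the reader does not recompute or validate the geometry.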

Estimating a conformation from SMILES

You can also use computational tools to estimate a 3D conformation from a SMILES string. Note I say a conformation rather than the conformation; molecules can in general have many valid conformations. Also, tools for generating conformations rely on approximate models such as molecular force fields. These carry many implicit assumptions; there is no guarantee that a computationally generated conformation will be the real conformation of a real molecule in the real world.
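
To make the "a conformation, not the conformation" point concrete, here is a small sketch that asks RDKit for several conformers of one molecule and reports a force-field energy for each. The molecule (1-butanol), the number of conformers and the random seed are arbitrary illustrative choices, and the energies are MMFF estimates rather than experimental values.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

# 1-Butanol: small but flexible, so several conformers are plausible
mol = Chem.AddHs(Chem.MolFromSmiles("CCCCO"))

# Generate several candidate 3D conformers for the same molecule
conf_ids = list(AllChem.EmbedMultipleConfs(mol, numConfs=5, randomSeed=42))

# Relax each conformer with MMFF; each result is (not_converged, energy)
results = AllChem.MMFFOptimizeMoleculeConfs(mol, maxIters=500)

for conf_id, (not_converged, energy) in zip(conf_ids, results):
    status = "converged" if not_converged == 0 else "not converged"
    print(f"conformer {conf_id}: {energy:.2f} kcal/mol ({status})")
```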

Here is some code for generating a plausible conformation from a SMILES string using rdkit.

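from rdkit import Chem
from rdkit.Chem import AllChem
from rdkit.Chem import Draw
from rdkit.Chem.Draw import IPythonConsole  # enables inline rendering in Jupyter

# Parse the SMILES string; the molecule has no 3D coordinates at this point
my_mol = Chem.MolFromSmiles('NC(=N)N1CCC[C@H]1Cc2onc(n2)c3ccc(Nc4nc(cs4)c5ccc(Br)cc5)cc3')

my_mol  # in a notebook, this displays the 2D depiction

# Add explicit hydrogens so that embedding and optimization account for them
my_mol_with_H = Chem.AddHs(my_mol)

# Generate 3D coordinates (EmbedMolecule returns -1 if embedding fails),
# then relax the geometry with the MMFF force field
AllChem.EmbedMolecule(my_mol_with_H)
AllChem.MMFFOptimizeMolecule(my_mol_with_H)

# Remove the hydrogens again to keep the output compact
my_embedded_mol = Chem.RemoveHs(my_mol_with_H)

my_embedded_mol  # in a notebook, this displays the embedded molecule

# Print the molecule, including its 3D coordinates, in molfile format
print(Chem.MolToMolBlock(my_embedded_mol))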

The printed result is:

    RDKit          3D

 33 37  0  0  0  0  0  0  0  0999 V2000
   -8.0789   -0.7261   -1.9565 N   0  0  0  0  0  0  0  0  0  0  0  0
   -8.3618   -0.9375   -0.6556 C   0  0  0  0  0  0  0  0  0  0  0  0
   -9.4453   -1.5737   -0.3799 N   0  0  0  0  0  0  0  0  0  0  0  0
   -7.4690   -0.4468    0.2422 N   0  0  0  0  0  0  0  0  0  0  0  0
   -7.8136   -0.1283    1.6244 C   0  0  0  0  0  0  0  0  0  0  0  0
   -6.7632    0.8908    2.0392 C   0  0  0  0  0  0  0  0  0  0  0  0
   -5.5246    0.3855    1.3227 C   0  0  0  0  0  0  0  0  0  0  0  0
   -6.0688   -0.0733   -0.0461 C   0  0  1  0  0  0  0  0  0  0  0  0
   -5.2554   -1.2432   -0.6177 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.8658   -0.8320   -0.9216 C   0  0  0  0  0  0  0  0  0  0  0  0
   -3.6647   -0.1417   -2.0770 O   0  0  0  0  0  0  0  0  0  0  0  0
   -2.3059    0.1587   -2.1237 N   0  0  0  0  0  0  0  0  0  0  0  0
   -1.8139   -0.3885   -1.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -2.7692   -1.0082   -0.2227 N   0  0  0  0  0  0  0  0  0  0  0  0
   -0.4078   -0.3427   -0.6136 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0488   -1.0902    0.4772 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.3984   -1.0569    0.8486 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.3307   -0.2688    0.1543 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.6731   -0.3282    0.5615 N   0  0  0  0  0  0  0  0  0  0  0  0
    4.8291   -0.0477   -0.0843 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.9334    0.1757    0.5968 N   0  0  0  0  0  0  0  0  0  0  0  0
    7.0123    0.4129   -0.2413 C   0  0  0  0  0  0  0  0  0  0  0  0
    6.7153    0.3213   -1.5854 C   0  0  0  0  0  0  0  0  0  0  0  0
    5.0623   -0.0682   -1.7942 S   0  0  0  0  0  0  0  0  0  0  0  0
    8.3378    0.7031    0.3040 C   0  0  0  0  0  0  0  0  0  0  0  0
    9.3324    1.3464   -0.4485 C   0  0  0  0  0  0  0  0  0  0  0  0
   10.5913    1.6060    0.1057 C   0  0  0  0  0  0  0  0  0  0  0  0
   10.8633    1.2259    1.4171 C   0  0  0  0  0  0  0  0  0  0  0  0
   12.5638    1.5736    2.1593 Br  0  0  0  0  0  0  0  0  0  0  0  0
    9.8883    0.5951    2.1844 C   0  0  0  0  0  0  0  0  0  0  0  0
    8.6313    0.3380    1.6284 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.8659    0.4742   -0.9343 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.5160    0.4406   -1.3141 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  2  0
  2  4  1  0
  4  5  1  0
  5  6  1  0
  6  7  1  0
  7  8  1  0
  8  9  1  1
  9 10  1  0
 10 11  1  0
 11 12  1  0
 12 13  2  0
 13 14  1  0
 13 15  1  0
 15 16  2  0
 16 17  1  0
 17 18  2  0
 18 19  1  0
 19 20  1  0
 20 21  2  0
 21 22  1  0
 22 23  2  0
 23 24  1  0
 22 25  1  0
 25 26  2  0
 26 27  1  0
 27 28  2  0
 28 29  1  0
 28 30  1  0
 30 31  2  0
 18 32  1  0
 32 33  2  0
  8  4  1  0
 14 10  2  0
 33 15  1  0
 24 20  1  0
 31 25  1  0
M  END

A semi-interpretable 2D image of this 3D conformation, also generated by rdkit, is shown below. For comparison, the "un-embedded" molecule, laid out to look nice on a 2D display, is also shown.

rdkit images

From the admittedly not-great 2D depiction of the embedded molecule, you can at least tell that the various aromatic rings are not coplanar. For better visualization of 3D conformations, you would want to use a tool like py3Dmol.
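
Ring coplanarity can also be checked numerically rather than by eye. The sketch below measures the torsion angle across a ring-ring bond with RDKit. It uses biphenyl as a simpler stand-in for the molecule above, and the atom indices are specific to that SMILES string, so they would need to be adapted for any other molecule.

```python
from rdkit import Chem
from rdkit.Chem import AllChem, rdMolTransforms

# Biphenyl as a stand-in: atoms 0-5 form one ring, atoms 6-11 the other,
# and the bond between atoms 5 and 6 joins the two rings
mol = Chem.AddHs(Chem.MolFromSmiles("c1ccccc1-c1ccccc1"))

if AllChem.EmbedMolecule(mol, randomSeed=42) < 0:
    raise RuntimeError("3D embedding failed")
AllChem.MMFFOptimizeMolecule(mol)

# Dihedral angle around the ring-ring bond (atoms 4-5-6-7), in degrees
conf = mol.GetConformer()
angle = rdMolTransforms.GetDihedralDeg(conf, 4, 5, 6, 7)
print(f"inter-ring dihedral: {angle:.1f} degrees")
```

A value near 0 or 180 degrees would indicate coplanar rings; anything clearly in between indicates a twist. As with the rest of the workflow, the number reflects the force-field model, not a measurement.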

RDKit is just one example of a software package with these kinds of capabilities. I used it here because it's the one I know. OpenBabel is another one, as Martin rightly mentions in the comments.

Curt F.