SYBYL line notation

Filename extension: .sln
Type of format: chemical file format

The SYBYL line notation, or SLN, is a specification for unambiguously describing the structure of chemical molecules using short ASCII strings. SLN differs from SMILES in several significant ways. SLN can specify molecules, molecular queries, and reactions in a single line notation, whereas SMILES handles these through language extensions. SLN supports relative stereochemistry and can distinguish mixtures of enantiomers from pure substances whose stereochemistry is unresolved. In SMILES, aromaticity is considered a property of both atoms and bonds, whereas in SLN it is a property of bonds only.

Description

Like SMILES, SLN is a linear language for describing molecules. The two notations therefore share many features despite their differences, and this description frequently compares SLN with SMILES and its extensions.

Attributes

Attributes, bracketed strings carrying additional data such as [key1=value1, key2...], are a core feature of SLN. They can be applied to atoms and bonds. Attributes that are not officially defined are available to users for private extensions.

When searching for molecules, comparison operators can be used in place of the usual equals sign, as in fcharge>-0.125. A ! preceding a key/value group inverts the result of the comparison.

Entire molecules or reactions can also have attributes. In that case, the square brackets are replaced by a pair of angle brackets (<>).
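The attribute syntax can be illustrated with a small Python sketch. It is not part of any SLN toolkit: the function names parse_attributes and matches are hypothetical, the set of comparison operators (=, <, >, <=, >=) is an assumption beyond the single > example given above, and treating a missing key as a failed comparison is a design choice made only for this illustration.

```python
import operator
import re

# Assumed operator set; the text above only shows '=' and '>'.
_OPS = {">=": operator.ge, "<=": operator.le,
        ">": operator.gt, "<": operator.lt, "=": operator.eq}
_TERM = re.compile(r"(!?)\s*([A-Za-z_]\w*)\s*(>=|<=|>|<|=)\s*(.+)")


def parse_attributes(text):
    """Turn '[key1=value1, key2]' into {'key1': 'value1', 'key2': True}."""
    inner = text.strip()[1:-1]  # drop the surrounding brackets
    attrs = {}
    for item in inner.split(","):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        attrs[key.strip()] = value.strip() if sep else True
    return attrs


def matches(term, attrs):
    """Evaluate one query term such as 'fcharge>-0.125' or '!I=0'."""
    m = _TERM.fullmatch(term.strip())
    if m is None:
        raise ValueError(f"unsupported term: {term!r}")
    negate, key, op, raw = m.groups()
    if key not in attrs:
        result = False  # assumption: a missing key never satisfies a comparison
    else:
        try:
            result = _OPS[op](float(attrs[key]), float(raw))
        except (TypeError, ValueError):
            # Fall back to string comparison for non-numeric values.
            result = _OPS[op](str(attrs[key]), raw)
    return not result if negate else result


atom = parse_attributes("[fcharge=-0.1, I=13]")
print(matches("fcharge>-0.125", atom))   # True
print(matches("!fcharge>-0.125", atom))  # False
```

The leading ! is handled by computing the plain comparison first and inverting it afterwards, which mirrors the description of ! as inverting the result of the comparison.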

Atoms

Anything that starts with an uppercase letter identifies an atom in SLN. Hydrogens are not added automatically, but single bonds to hydrogen can be abbreviated for organic compounds, giving CH4 instead of C(H)(H)(H)H for methane. The authors argue that explicit hydrogens allow for more robust parsing.
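The relationship between the abbreviated and the fully branched hydrogen forms can be sketched in Python. This is only an illustration under stated assumptions: expand_hydrogens is a hypothetical helper, it relies on a regular expression rather than a real SLN grammar, and it only recognizes an optional bracketed block placed directly after the element symbol.

```python
import re

# Element symbol, optional [attributes], then an H count such as H, H2, H3.
# The lookahead keeps symbols like "Hg" from being read as a hydrogen count.
_ATOM_WITH_H = re.compile(r"([A-Z][a-z]?)(\[[^\]]*\])?H(\d*)(?![a-z])")


def expand_hydrogens(sln):
    """Rewrite abbreviated hydrogens (CH4) as explicit atoms (C(H)(H)(H)H)."""
    def repl(m):
        element, attrs, count = m.group(1), m.group(2) or "", m.group(3)
        n = int(count) if count else 1
        if n == 0:
            return element + attrs
        rest = m.string[m.end():]
        if rest == "" or rest.startswith(")"):
            # Nothing else follows on this chain: the last H needs no branch.
            tail = "(H)" * (n - 1) + "H"
        else:
            # More atoms follow, so every hydrogen becomes its own branch.
            tail = "(H)" * n
        return element + attrs + tail
    return _ATOM_WITH_H.sub(repl, sln)


print(expand_hydrogens("CH4"))     # C(H)(H)(H)H
print(expand_hydrogens("CH3CH3"))  # C(H)(H)(H)C(H)(H)H
```

Strings that are already fully explicit, such as C(H)(H)(H)H, pass through unchanged because no atom in them is directly followed by an H count.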

Attributes defined for atoms include I= for isotope mass number, charge= for formal charge, fcharge= for partial charge, s= for stereochemistry, and spin= for radicals (s, d, and t for singlet, doublet, and triplet, respectively). A formal charge such as charge=2 can be abbreviated as +2, and likewise for negative charges; a bare + or - is recognized as a charge of +1 or −1. * is shorthand for spin=d. Stereochemistry on atoms is mostly tetrahedral, with R/S and D/L designations available among others; it can be explicit (E), relative (R), or a mixture (M) of stereoisomers at the atom. A normal/inverted (N/I) notation, equivalent to @@ and @ in SMILES, is also provided. Many additional attributes are provided for searching.
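The charge and radical shorthands can be normalized to their long forms with a few lines of Python. This is a sketch only: normalize_atom_attribute is a hypothetical name, and returning (key, value) tuples with integer charges is a choice made for the illustration rather than anything prescribed by SLN.

```python
import re


def normalize_atom_attribute(item):
    """Map one atom attribute, including shorthands, to a (key, value) pair."""
    item = item.strip()
    if item == "*":
        return ("spin", "d")  # '*' is shorthand for spin=d
    m = re.fullmatch(r"([+-])(\d*)", item)
    if m:
        sign = 1 if m.group(1) == "+" else -1
        magnitude = int(m.group(2)) if m.group(2) else 1  # bare '+' or '-' means 1
        return ("charge", sign * magnitude)
    key, sep, value = item.partition("=")
    key, value = key.strip(), value.strip()
    if key == "charge" and sep:
        return ("charge", int(value))
    return (key, value if sep else True)


print(normalize_atom_attribute("+2"))        # ('charge', 2)
print(normalize_atom_attribute("charge=2"))  # ('charge', 2)
print(normalize_atom_attribute("-"))         # ('charge', -1)
print(normalize_atom_attribute("*"))         # ('spin', 'd')
```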

In addition to elemental atoms SLN supports the specification of wild card atoms: Any (match any atom), and Hev (match any heavy atom). It also has an extensive Markush syntax for specifying combinatorial libraries and RGROUP queries. SLN has several query atom types for matching groups of atoms. Each type has the group name, followed by an optional positive integer.

Group   Description
R       Used to match a side chain. Matched atoms must not have any connection to the core.
X       Used to match side chains and rings. Atoms matching an X group can match side chains and rings.
Rx      Matches side chains and rings; a ring closure must match a second Rx group.

A mass number of 0 denotes the usual isotope of an element, so N[I=0] is equivalent to N[I=14] and matches 14N, while N[!I=0] matches every other isotope.
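The special meaning of the 0 mass number can be written out as a small Python check. The table USUAL_MASS and the function names are hypothetical and cover only a few elements; the sketch assumes that "usual isotope" means the most abundant one.

```python
# Assumed lookup of the usual (most abundant) mass number for a few elements.
USUAL_MASS = {"H": 1, "C": 12, "N": 14, "O": 16}


def resolve_isotope(element, value):
    """Replace the special value 0 with the element's usual mass number."""
    return USUAL_MASS[element] if value == 0 else value


def isotope_matches(element, atom_mass, query_value, negate=False):
    """Test an atom's mass number against I=query_value, or !I=... if negate."""
    hit = atom_mass == resolve_isotope(element, query_value)
    return not hit if negate else hit


print(isotope_matches("N", 14, 0))               # True:  N[I=0] matches 14N
print(isotope_matches("N", 15, 0))               # False
print(isotope_matches("N", 15, 0, negate=True))  # True:  N[!I=0] matches 15N
```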

Bonds

SLN uses largely the same bonding notation as SMILES, with -, =, #, and : for single, double, triple, and aromatic bonds. A period (.) is used for zero-order bonds, similar to reaction SMILES, although + is preferred for separating distinct molecules.

Most single bonds are implicit, so CH3CH3 can be used instead of CH3-CH3 for ethane. Explicit single bonds are useful for three-center bonds.

The s= attribute is defined for double bonds to convey stereochemistry in E/Z or cis/trans (c/t) notation. N/I is also available and refers to the "main" chains on either side of the bond, which are trans or cis to each other.

Rings

SLN writes rings more explicitly than SMILES, with benzene specified as C[1]H:CH:CH:CH:CH:CH:@1. An atom is tagged as a ring anchor with a single numeric attribute, and @ followed by that number (here @1) then specifies a bond back to the tagged atom.

Branching

SLN branches are identical to SMILES branches, with parentheses specifying them. Propionic acid is CH3CH2C(=O)OH.
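The bond symbols, ring anchors, and parenthesized branches described so far are enough to sketch a toy reader in Python. This is a minimal sketch, not a conforming SLN parser: parse_sln and its return format are hypothetical, and it ignores wildcard atoms, zero-order bonds, reactions, and every attribute except the numeric ring tag.

```python
import re

BOND_TYPES = {"-": "single", "=": "double", "#": "triple", ":": "aromatic"}

_TOKEN = re.compile(
    r"(?P<atom>[A-Z][a-z]?)(?:\[(?P<attrs>[^\]]*)\])?(?:H(?P<h>\d*)(?![a-z]))?"
    r"|(?P<bond>[-=#:])"
    r"|@(?P<close>\d+)"
    r"|(?P<open>\()"
    r"|(?P<shut>\))"
)


def parse_sln(sln):
    """Return (atoms, bonds) for a simple SLN string."""
    atoms = []          # one dict per heavy atom: element and hydrogen count
    bonds = []          # (from_index, to_index, bond_type)
    labels = {}         # ring tag -> index of the tagged atom
    stack = []          # atoms to return to when a branch closes
    prev = None         # atom that the next atom bonds to
    pending = "single"  # implicit bonds are single
    pos = 0
    while pos < len(sln):
        m = _TOKEN.match(sln, pos)
        if m is None:
            raise ValueError(f"unexpected character {sln[pos]!r} at {pos}")
        pos = m.end()
        if m.group("atom"):
            h = m.group("h")
            hydrogens = 0 if h is None else int(h or 1)
            atoms.append({"element": m.group("atom"), "hydrogens": hydrogens})
            idx = len(atoms) - 1
            for item in (m.group("attrs") or "").split(","):
                if item.strip().isdigit():  # numeric tag marks a ring anchor
                    labels[int(item)] = idx
            if prev is not None:
                bonds.append((prev, idx, pending))
            prev, pending = idx, "single"
        elif m.group("bond"):
            pending = BOND_TYPES[m.group("bond")]
        elif m.group("close"):
            # '@n' bonds the current atom back to the atom tagged [n].
            bonds.append((prev, labels[int(m.group("close"))], pending))
            pending = "single"
        elif m.group("open"):
            stack.append(prev)
        else:  # closing parenthesis: resume from the branch point
            prev, pending = stack.pop(), "single"
    return atoms, bonds


atoms, bonds = parse_sln("C[1]H:CH:CH:CH:CH:CH:@1")
print(len(atoms), len(bonds))  # 6 6
print(bonds[-1])               # (5, 0, 'aromatic')
```

For propionic acid, parse_sln("CH3CH2C(=O)OH") yields five heavy atoms and four bonds, with the double bond to the branched oxygen recorded as (2, 3, 'double').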

Reactions

SLN supports reactions with -> connecting the reactants and the products. Atom mapping is possible with the use of [#num] attributes. The reaction center (rc) attribute can be added to bonds, and the chiral conversion (cc) attribute to atoms.
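Splitting a reaction string into its parts can be sketched in Python as follows. The helper names are hypothetical, the example reaction string is illustrative and has not been checked against a SYBYL parser, and the sketch assumes that molecules are separated by + and that molecule-level <> attributes are absent. Separators inside square brackets are skipped so that a charge attribute such as [+] does not split a molecule.

```python
import re


def split_top_level(text, sep):
    """Split on sep, ignoring occurrences inside [...] attribute blocks."""
    parts, depth, start, i = [], 0, 0, 0
    while i < len(text):
        if text[i] == "[":
            depth += 1
        elif text[i] == "]":
            depth -= 1
        elif depth == 0 and text.startswith(sep, i):
            parts.append(text[start:i].strip())
            i += len(sep)
            start = i
            continue
        i += 1
    parts.append(text[start:].strip())
    return parts


def parse_reaction(sln):
    """Return (reactants, products) as lists of SLN strings."""
    sides = split_top_level(sln, "->")
    if len(sides) != 2:
        raise ValueError("expected exactly one '->'")
    return split_top_level(sides[0], "+"), split_top_level(sides[1], "+")


def atom_map_numbers(sln):
    """Collect the numbers used in [#num] atom-mapping attributes."""
    numbers = set()
    for block in re.findall(r"\[([^\]]*)\]", sln):
        for item in block.split(","):
            m = re.fullmatch(r"#(\d+)", item.strip())
            if m:
                numbers.add(int(m.group(1)))
    return numbers


# Illustrative string only: methanol + HCl -> chloromethane + water.
reaction = "C[#1]H3OH+HCl->C[#1]H3Cl+HOH"
print(parse_reaction(reaction))
# (['C[#1]H3OH', 'HCl'], ['C[#1]H3Cl', 'HOH'])
print(atom_map_numbers(reaction))  # {1}
```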

Misc.

Multiple lines can be merged into a syntactical line by writing a \ (backslash) at the end of each line. This allows for breaking a long line into multiple lines, for example in a reaction with each molecule on its own line.
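The continuation rule can be sketched in Python. merge_continuations is a hypothetical helper, and stripping whitespace around each physical line is an assumption made for this illustration; the description above does not say how surrounding whitespace is treated.

```python
def merge_continuations(text):
    """Join physical lines ending in a backslash into syntactical lines."""
    merged, current = [], ""
    for line in text.splitlines():
        stripped = line.rstrip()
        if stripped.endswith("\\"):
            # Drop the backslash and keep accumulating.
            current += stripped[:-1].strip()
        else:
            current += stripped.strip()
            if current:
                merged.append(current)
            current = ""
    if current:  # input ended on a continuation line
        merged.append(current)
    return merged


source = "CH3OH+\\\nHCl->\\\nCH3Cl+HOH\nCH4"
print(merge_continuations(source))
# ['CH3OH+HCl->CH3Cl+HOH', 'CH4']
```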

References

  • Ash, Sheila; Cline, Malcolm A.; Homer, R. Webster; Hurst, Tad; Smith, Gregory B. (1997). "SYBYL Line Notation (SLN): A Versatile Language for Chemical Structure Representation". J. Chem. Inf. Comput. Sci. 37: 71–79. doi:10.1021/ci960109j.
  • Homer, R. Webster; Swanson, Jon; Jilek, Robert J.; Hurst, Tad; Clark, Robert D. (2008). "SYBYL Line Notation (SLN): A Single Notation To Represent Chemical Structures, Queries, Reactions, and Virtual Libraries". J. Chem. Inf. Model. 48 (12): 2294–2307. doi:10.1021/ci7004687.