ChemDraw Display
2/5/01

ChemDraw display is a new feature in ChemFinder 7 that lets users view desktop-quality images of their structures in ChemFinder, in forms though not (currently) in tables.  Moreover, all the graphical doodads and flourishes of the original drawing are retained, even though these features are stripped, and the molecules normalized, when ChemFinder uses them in searching (Figure 1).  Although users have asked for this feature, it presents a serious danger: the user does not see the version ChemFinder actually searches, and that version could differ from what they expect.

This feature is implemented by storing the original ChemDraw structure, as CDX, in the database alongside the interpreted ChemFinder version.  In the case of older databases lacking this field, or newly added structures that do not emanate from ChemDraw, the CDX is synthesized from the ChemFinder version.

To see ChemDraw-style display, go to the Box Properties dialog for the form box.  Under "Box Style", where there used to be only "Structure", there are now two choices, "ChemFinder style" and "ChemDraw style".

Hit highlight coloring should be evident in the ChemDraw style, but elements are not colored.

Testing Suggestions

Add an additional structure box to the forms you work with, and configure it with the ChemDraw-style display style.  Operate as you normally would on the database, using the ChemDraw-style box, while keeping an eye on the ChemFinder-style box.

Pitfalls

Various graphical and textual flourishes in ChemDraw do not have any meaning to ChemFinder.  Viewing the ChemDraw version may mislead the user into thinking ChemFinder has "understood" these features.  Examples are:

A minor shortcoming of the ChemDraw style is that reaction layouts are not recomputed as the box is resized when the CDX structure is actually stored, as opposed to synthesized on the fly.  (There is no easy way to tell if the image is stored or synthesized.)

Figure 1.  A structure imported from ChemDraw displayed in ChemFinder style (left) and the original ChemDraw style (right).

 

Technical

The remaining notes are a more technical description of the feature.

Why CDX?

CDX is a natural choice to store the original pictorial information because most of the pictures are received in that format, it is a rich graphical language for chemistry, and it is already our main interchange format.

Architecture

Hitherto, the database "Structure" field has been a mock-up.  Along with the molecular weight and molecular formula fields, it is conjured up by ChemFinder without any correspondence to a real database field.  (The Mol_ID field does have an actual database correspondent; the MF and MW fields do not.)  This Structure field drew its data not from the RDB, but from an MST server.  The simplest way to store CDX structural representations is to introduce an actual "Structure" field in the RDB, and stick them there.  Now when a database is opened, the Structure field will be bound to this real field if it exists.  Data exchange of structures becomes two-pronged:  as before, the ccMolecule portion is transmitted to/from the MST driver, and now additionally the CDX part is transmitted to/from the RDB.

The MST becomes derivable from the CDX Structure

Given that we will store CDX in the relational part of the database ("RDB"), and that the ChemFinder version (a ccMolecule) can be derived from the CDX, the role of the MST file becomes that of a peripheral index.   It can be regenerated from the RDB at any time.  This constitutes a major strategic gain.  On the other hand, in situations where several RDB's share a common MST file, as is the case with ChemACX, the new architecture unnecessarily (redundantly) uses large amounts of memory.  We should omit the CDX field from such databases.

CdxMol

Whereas before a simple ccMolecule was passed around, now a CdxMol object is used in those places where the dual pictorial / informational meaning of a molecule is required.   These places are mainly the form box (CFormBox), mol control (CMolCtl), and mol field (MolField).  A CdxMol is an encapsulation of both a ccMolecule and a CDXDocument, with methods to transparently interconvert between them, so as to serve up whichever version is required.  In other words, a CdxMol can be seeded with either a ccMolecule or CDXDocument, and it is able to synthesize one from the other on demand.   The new object carries practically no overhead.   There are no new ccMolecule copy constructors executed, nor are there any unnecessary format interconversions.  In order to support hit highlighting of the CDX picture, the CdxMol class also contains map information relating the individual atoms of the ccMolecule and CDXDocument.  This map is tucked inside the CDX as a user attribute, rather than alongside it in the CDX field, so that a third party (such as Louis using VB) can work with the stored CDX normally.
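As a rough illustration of the "seed with either, synthesize the other on demand" idea, here is a minimal sketch.  It is not the actual CdxMol class: MoleculeSketch, CdxSketch, the two converter functions, and hasOriginalCdx() are hypothetical stand-ins invented for this example, and the atom/bond maps are left out entirely.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-ins for ccMolecule and CDXDocument (not the real types).
struct MoleculeSketch { std::string atoms; };
struct CdxSketch { std::string atoms; std::string styling; };

// Assumed converters; the real ones would also build the atom/bond maps.
inline MoleculeSketch moleculeFromCdx(const CdxSketch& d) {
    return MoleculeSketch{d.atoms};               // styling is dropped
}
inline CdxSketch cdxFromMolecule(const MoleculeSketch& m) {
    return CdxSketch{m.atoms, std::string()};     // no styling to recover
}

class CdxMolSketch {
public:
    // Seed with either representation.
    explicit CdxMolSketch(MoleculeSketch m)
        : mol_(std::move(m)), haveMol_(true) {}
    explicit CdxMolSketch(CdxSketch d)
        : cdx_(std::move(d)), haveCdx_(true), cdxIsOriginal_(true) {}

    // True only if a CDX was supplied rather than synthesized.
    bool hasOriginalCdx() const { return cdxIsOriginal_; }

    // Each accessor converts at most once, and only when first asked.
    const MoleculeSketch& molecule() {
        if (!haveMol_) {
            assert(haveCdx_);
            mol_ = moleculeFromCdx(cdx_);
            haveMol_ = true;
        }
        return mol_;
    }
    const CdxSketch& cdx() {
        if (!haveCdx_) {
            assert(haveMol_);
            cdx_ = cdxFromMolecule(mol_);
            haveCdx_ = true;
        }
        return cdx_;
    }

private:
    MoleculeSketch mol_;
    CdxSketch cdx_;
    bool haveMol_ = false;
    bool haveCdx_ = false;
    bool cdxIsOriginal_ = false;
};
```

In this sketch a caller such as a form box would ask for cdx() or molecule() without caring which one was supplied, and the conversion cost is paid only on the first request for the missing half.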

When an (old) DB is opened that lacks a Structure field, one could be created if the database is writable.  However, this seems extreme and unwarranted.  In the field's absence, any newly entered CDX will not be stored.  If the Structure field were to be created in the middle of a database's lifetime, it would receive data whenever a record was committed, but older records would have no CDX; for those, the CDX would be synthesized on the fly.  As it stands, the CDX field is only created when a database is created.

Whenever a record is committed, if there is no CDX in the CdxMol, nothing is stored in this field.  This saves database space.  The ccMolecule copy is always stored.  Fresh new CDX, that is, CDX that is graphically richer than the ccMolecule representation, finds its way into the DB in the following ways:

Display

There is still only one type of mol control, but it has a new style flag indicating whether it should display the CDX (using a ChemDraw metafile), or the traditional native rendition.  Which style is used is selectable on the Box Properties dialog.    The native display shows the structure as ChemFinder internally uses it, which differs from the source structure in the following two ways:

ChemDraw style could be the normal, default state.  Native display would then be consulted only to determine how the structure has been normalized, or by the developers to see what is occurring beneath the floorboards.  However, due to the pitfalls discussed, it is essential to give users access to the ChemFinder version.  (And in any case, we cannot do away with the native display, because it is essential to development.)

Hit Highlighting and Element Coloring

Hitherto, native-display coloring has consisted of coloring atoms and bonds in the hit region after a search, coloring query atoms and bonds, and coloring remaining atoms according to the periodic table.  Optionally, atoms and bonds in reaction centers and/or atom-to-atom maps may also be colored.  Hit highlighting is triggered by setting a bit in the atoms and bonds of the ccMolecule housed in the CMolCtl.   This is harmless, and moreover, this bit is stripped away during registration.   The other effects are applied at the time of drawing, in part by consulting a periodic table.

Structure depiction is either for internal (browse mode) or external (Copy-to-clipboard; Edit-in-ChemDraw) consumption.  Hit highlighting is suppressed for the latter group, but otherwise they are colored identically.  Significantly, the latter group is the source of structures that come back into ChemFinder.
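The native-display scheme described above (a per-atom bit set after a search, stripped at registration, and consulted only for internal display) can be sketched as follows.  This is an illustration under assumptions, not the real code: AtomSketch, kHitBit, Paint, and the three function names are hypothetical, and query-atom, reaction-center and atom-map coloring are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flag bit; the real ccMolecule bit layout is not shown in this note.
const std::uint32_t kHitBit = 1u << 0;

struct AtomSketch {
    int atomicNumber;
    std::uint32_t flags;
};

enum class Paint { Hit, ByElement };

// After a search: mark the atoms in the hit region.
void markHits(std::vector<AtomSketch>& atoms,
              const std::vector<std::size_t>& hits) {
    for (std::size_t i : hits) {
        assert(i < atoms.size());
        atoms[i].flags |= kHitBit;
    }
}

// At registration: drop the highlight bit so it never reaches storage.
void stripForRegistration(std::vector<AtomSketch>& atoms) {
    for (AtomSketch& a : atoms) a.flags &= ~kHitBit;
}

// At draw time: the hit color applies to internal (browse) display only;
// external output (clipboard, Edit-in-ChemDraw) falls through to the
// periodic-table coloring.
Paint paintFor(const AtomSketch& a, bool forExport) {
    if (!forExport && (a.flags & kHitBit) != 0) return Paint::Hit;
    return Paint::ByElement;
}
```

Keeping the highlight as a transient bit, and deciding the actual color only at draw time, is what lets the same molecule be shown highlighted in browse mode yet exported without the highlight.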

It is desirable to provide the same coloring for CDX pictures as for native display; however, CDX pictures are a little different.  Unlike with ccMolecules, we aim to permit the user to style their CDX picture in any graphical way, and retain that image, and this should extend to coloring.  For example, the user might want to color part of a molecule for emphasis.  How then do we distinguish between the colors we have imparted, and those the user has assigned?  I don't have an answer.  For the time being, we'll strip the colors from incoming CDX atoms and bonds.

File I/O

It was tempting to adapt CFilter_Mol and all its subclasses to deal in CdxMol's rather than ccMolecule's.  However, the CDX file type (CFilter_CDX) would have been the only type improved, and the amount of code touched would have been great.  As a consequence, CFilter_CDX is no longer used.  Instead, global functions are used to read CDX disk files and memory blocks directly.

Storage of Maps inside the CDX

When a CDXDocument is converted to a ccMolecule, or vice versa, maps are simultaneously generated that track the correspondences between atoms and bonds in the two.  The maps are used to highlight and otherwise annotate the CDX display.  The "active" maps used inside the program map ccMolecule types, ATOMNO and BONDNO,   to CDX types, CDXNode*'s and CDXBond*'s:

    typedef map<ATOMNO,CDXNode*>     atomMap;
    typedef map<BONDNO,CDXBond*>     bondMap;

(Actually, CDXGraphicObject* is used in place of CDXNode*, because ATOMNO's can map to isolated text labels.)  However, because the pointers are transient values, a more permanent type of marker must be substituted when inserting the maps into the CDX for storage.  The CDXObjectID's of the atoms and bonds would be ideal, except that they are not guaranteed unique within a CDX document.  Instead, a list of the ObjectID's encountered during the hierarchical trip down the CDX tree to the object of interest is used.  If it happens that all the ID's in the CDX are unique, then a shortcut is taken, and the scalar ID's are stored instead of lists.
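To make the stored form of the maps concrete, here is a minimal sketch of turning a transient pointer into a list of IDs, with the scalar shortcut when every ID in the document is unique.  NodeSketch, ObjectId, IdPath and all function names are hypothetical stand-ins, not the CDX library's types.  The sketch also assumes that sibling objects have distinct IDs, so that a path resolves to one object; the note above does not say whether that holds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

typedef std::uint32_t ObjectId;          // stand-in for CDXObjectID
typedef std::vector<ObjectId> IdPath;    // IDs from the root down to an object

struct NodeSketch {                      // stand-in for an object in the CDX tree
    ObjectId id;
    std::vector<NodeSketch> children;
};

// Depth-first search; on success, path holds the IDs met on the way down.
bool findPath(const NodeSketch& node, const NodeSketch* target, IdPath& path) {
    path.push_back(node.id);
    if (&node == target) return true;
    for (const NodeSketch& child : node.children)
        if (findPath(child, target, path)) return true;
    path.pop_back();
    return false;
}

bool allIdsUnique(const NodeSketch& node, std::set<ObjectId>& seen) {
    if (!seen.insert(node.id).second) return false;
    for (const NodeSketch& child : node.children)
        if (!allIdsUnique(child, seen)) return false;
    return true;
}

// Key to store in place of the pointer: a single ID if all IDs are unique,
// otherwise the full path.  Returns an empty key if target is not in the tree.
IdPath storedKey(const NodeSketch& root, const NodeSketch* target) {
    assert(target != nullptr);
    IdPath path;
    if (!findPath(root, target, path)) return IdPath();
    std::set<ObjectId> seen;
    if (allIdsUnique(root, seen)) return IdPath(1, path.back());
    return path;
}

// Turn a full path back into a pointer after the document is reloaded.
const NodeSketch* resolvePath(const NodeSketch& root, const IdPath& path) {
    if (path.empty() || path[0] != root.id) return nullptr;
    const NodeSketch* cur = &root;
    for (std::size_t i = 1; i < path.size(); ++i) {
        const NodeSketch* next = nullptr;
        for (const NodeSketch& child : cur->children)
            if (child.id == path[i]) { next = &child; break; }
        if (next == nullptr) return nullptr;
        cur = next;
    }
    return cur;
}
```

In this sketch resolvePath handles only the list form; a scalar key would instead be looked up by a document-wide search for that one ID.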

Upgrading Databases (speculative)

There are two types of database upgrades possible: the MST index, and the fields inside the RDB.  Hitherto, we have only needed the former.  That the RDB was not involved simplified the problem.  The RDB stands by unaffected as the records in the MST upgrade themselves.  We should maintain this facility, but in a future (CFW 8?) release the normal type of upgrade should unfold as follows:

  1. Create a "Structure" field in the RDB, if missing.
  2. Create empty MST/MSI files under a temporary name.
  3. Loop through the RDB.
    a.  If the CDX structure is lacking, one is created from the old MST entry.
    b.  A native structure, expressed in the latest (or other) MST format,  is generated from the CDX and stored in the new MST/MSI.   Optionally, new sequential Mol_ID's can be assigned at this point.  The RDB receives the new Mol_ID.
  4. Rename away the old MST/MSI files, as a functional equivalence to deleting them.
  5. Rename the new MST/MSI files to the normal name.

If Mol_ID's are resequenced and other RDB's share the structures, i.e. also rely on the Mol_ID's, undesired consequences will ensue.
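Step 3 of the procedure might look roughly like the following.  This is a speculative sketch against in-memory stand-ins, not real database code: RdbRecord, MstIndex, the two converter functions, and upgradeRecords are all hypothetical, and steps 1, 4 and 5 (creating the field and renaming the files) are left to the caller.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins: an RDB row and an MST index keyed by Mol_ID.
struct RdbRecord {
    int molId;
    std::string cdx;                     // empty means no CDX stored yet
};
typedef std::map<int, std::string> MstIndex;

// Assumed converters; placeholders for the real CDX/native interconversion.
std::string cdxFromNative(const std::string& native) { return "cdx(" + native + ")"; }
std::string nativeFromCdx(const std::string& cdx) { return "native[" + cdx + "]"; }

// Builds the new index (step 2) by looping over the RDB (step 3).
MstIndex upgradeRecords(std::vector<RdbRecord>& rdb, const MstIndex& oldMst,
                        bool resequence) {
    MstIndex newMst;
    int nextId = 1;
    for (RdbRecord& rec : rdb) {
        if (rec.cdx.empty()) {
            // Step 3a: no CDX stored, so create one from the old MST entry.
            MstIndex::const_iterator it = oldMst.find(rec.molId);
            if (it == oldMst.end()) continue;    // nothing to upgrade from
            rec.cdx = cdxFromNative(it->second);
        }
        assert(!rec.cdx.empty());
        // Step 3b: regenerate the native structure from the CDX, optionally
        // assigning a new sequential Mol_ID, and write the ID back to the RDB.
        const int id = resequence ? nextId++ : rec.molId;
        newMst[id] = nativeFromCdx(rec.cdx);
        rec.molId = id;
    }
    return newMst;   // steps 4 and 5: the caller swaps this in for the old index
}
```

Because every native structure in the new index is regenerated from CDX, the old index is read only for records that lack CDX, which is why the old MST/MSI files can be renamed away afterwards.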

This upgrade procedure will take much longer than the present one, which does not reference the RDB.  Once a DB is assured of having CDX data in every record, we might revert to the old type of upgrade.  On the other hand, normalization is more accurately performed from the original CDX than from the current native structure, so it is sometimes better to upgrade from the RDB.  We can play this by ear.  However, it would be strategically useful to have a mechanism to tell whether a given RDB has a full complement of CDX structures.  One way to achieve this would be to require generation of the Structure field at the next upgrade.  Except for special purpose files, then, all databases of a certain MST version or later can be assumed to have CDX, and therefore give us license to discard their MST/MSI files on a whim.  (Okay, a big whim.)  Designers of irregular databases must beware!