5.1     SD/RDFile Features


19 May 99

This page describes changes in SDFile and RDFile I/O developed in 5.1.


Derwent Support

The Derwent World Drug Alert (WDA), and perhaps other Derwent SD files, have an unusual format.  Most structures contain two molecules rather than one:  a generic structure and an example structure.  The generic structure is characterized by atoms labelled "R" and variable attachment atoms (which we do not presently interpret).  The example structure lacks generic features.  Additionally, there are counterions and textual annotations referring to one or the other of the principal structures.  Some records contain only one structure or the other.  Our support of Derwent SD files is three-fold:

  1. Recognize which structure is the generic and which is the example, and assign the remaining textual annotations and small molecules to one or the other of those two.
  2. Expand S-groups (see below) for the MST file, and contract them in CDX.
  3. Store the structures in several places:
    1. The example structure goes in the MST, where it can be searched.
    2. The generic goes in the relational part of the database in a CDX-formatted field called "CDX_generic".
    3. The original structure, i.e. both example and generic, is written to another CDX-formatted field called "CDX_pair".
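
The recognition step (item 1 above) can be pictured with a minimal sketch.  It is not the ChemFinder implementation:  it assumes each record has already been split into fragments, each given as a list of atom labels, and it assumes that the largest fragment without an "R" label is the example structure.  The names R_LABEL, is_generic and split_record are hypothetical.

```python
import re

# Assumption: "R", "R1", "R#" and similar count as generic labels.
# Real elements such as "Ru" or "Rh" do not match this pattern.
R_LABEL = re.compile(r"^R(\d*|#)$")


def is_generic(symbols):
    """True if any atom label in the fragment looks like an R label."""
    return any(R_LABEL.match(s) for s in symbols)


def split_record(fragments):
    """Return (generic, example, others) for one record.

    generic: first fragment carrying an R label, or None.
    example: largest remaining fragment (an assumption), or None.
    others:  everything else, e.g. counterions.
    """
    generic = next((f for f in fragments if is_generic(f)), None)
    rest = [f for f in fragments if f is not generic]
    example = max(rest, key=len, default=None)
    others = [f for f in rest if f is not example]
    return generic, example, others
```

A record with only one of the two principal structures simply yields None for the missing one.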

To Use:

  1. If this is the first time, edit your .INI file and add the following to the "OPTIONS" section:

    EXPORT_PARALLEL_CDX=1

    This adds a new checkbox to the "Position..." dialog, which is reached from the SD file import dialog.
  2. Close any open forms, and choose <File + Import SD file>.  Provide a file such as Derwent_1.sdf.
  3. For simplicity, select the radio button to overwrite any preexisting database.  Using the File Manager, delete any preexisting form file.   (Otherwise, the form generator won't be put through its paces.)
  4. Click the <Position...> button on the SD import dialog.   Ensure the CDX box is checked.
  5. Click <Go> to import the sample file.
  6. Examine the resulting form.  Because CDX fields cannot be displayed in CFW 5.1, no boxes should have been created on the form for the two CDX fields.  However, if you look at the database fields (i.e., in Box Properties), both fields should be there.  There is no way to test the integrity of the two CDX fields at this time.
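
For repeated test setups, step 1 above could be scripted rather than done by hand.  The following is a sketch only, using Python's standard configparser on the text of an INI file.  It assumes the file is a conventional INI that configparser can read; note that configparser discards comments when writing.  The function name enable_parallel_cdx and the SOME_FLAG entry in the usage line are hypothetical.

```python
import configparser
import io


def enable_parallel_cdx(ini_text):
    """Return ini_text with EXPORT_PARALLEL_CDX=1 set in [OPTIONS]."""
    parser = configparser.ConfigParser()
    # By default configparser lower-cases option names; keep them as written.
    parser.optionxform = str
    parser.read_string(ini_text)
    if not parser.has_section("OPTIONS"):
        parser.add_section("OPTIONS")
    parser.set("OPTIONS", "EXPORT_PARALLEL_CDX", "1")
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


# Usage with a made-up INI fragment:
updated = enable_parallel_cdx("[OPTIONS]\nSOME_FLAG=0\n")
```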

Status:  ready to test in 5.1d9.  Louis has solved the earlier problem of the mysterious "12-byte chunk."  For testing and verification instructions, see him.


Import S-Groups

Molfiles (and consequently SD files) may contain S-groups, which, in their most common form, are equivalent to ChemDraw's nicknames and ChemFinder's superatoms.  In a molfile the S-groups are provided in expanded form, together with data about how to contract them.  For the Derwent application only, we contract the superatoms that will appear in CDX, but do not contract those in structures headed for the MST file, since we can't search (or even store!) them there.  The contracted form cannot be inspected in CFW 5.1, because we are unable to view the CDX created from Derwent files.
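
To make "data about how to contract them" concrete, here is a minimal sketch that collects superatom S-groups from the property block of a V2000 molfile.  It relies on the standard "M  STY" (S-group type), "M  SAL" (atom list) and "M  SMT" (label) lines, reads only those three, and splits on whitespace rather than on fixed columns.  It is an illustration, not the ChemFinder reader, and the function name read_superatoms is hypothetical.

```python
def read_superatoms(molfile_text):
    """Return {sgroup_index: {"label": str, "atoms": [int, ...]}}
    for S-groups of type SUP (superatom)."""
    types, atoms, labels = {}, {}, {}
    for line in molfile_text.splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[0] != "M":
            continue
        tag, rest = fields[1], fields[2:]
        if tag == "STY":
            # rest = [count, index, type, index, type, ...]
            for idx, kind in zip(rest[1::2], rest[2::2]):
                types[int(idx)] = kind
        elif tag == "SAL":
            # rest = [index, count, atom, atom, ...]; long lists may
            # continue on further SAL lines for the same index.
            atoms.setdefault(int(rest[0]), []).extend(int(a) for a in rest[2:])
        elif tag == "SMT":
            # rest = [index, label words...]
            labels[int(rest[0])] = " ".join(rest[1:])
    return {
        i: {"label": labels.get(i, ""), "atoms": atoms.get(i, [])}
        for i, kind in types.items()
        if kind == "SUP"
    }
```

Contracting a superatom then amounts to replacing the listed atoms with a single node carrying the label; expanding leaves the atoms as they appear in the molfile.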

Status:  testable via Derwent import.  See Irwin or Louis for information about how to build a database from Derwent files.  This feature will not be accessible to the average user, so broad testing is not required.


Auto-Integer ID Fields

When ChemFinder reads an RDFile containing a hierarchical data structure, it creates tables with common columns, converting the hierarchy to a relational scheme.  Details are in the RDFile description written for CFW 5.0.

Previously, ID columns in these tables were configured as standard numeric fields.  This caused problems when a load was interrupted and restarted, because the values were renumbered from 1 after the restart.  To solve this problem in 5.1, we configure these columns as "auto-integer" fields, so that they are automatically numbered as records are added.
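
The difference between the two approaches can be shown with a small analogy.  The sketch below uses SQLite through Python's sqlite3 module, which is an assumption made for illustration only and not the database ChemFinder uses.  The table names manual_ids and auto_ids and the function demo are hypothetical.  Each batch stands for one run of the loader:  the first run is interrupted after three records and the second run resumes.

```python
import sqlite3


def demo():
    conn = sqlite3.connect(":memory:")
    # Old approach: plain numeric column, numbered by the loader itself.
    conn.execute("CREATE TABLE manual_ids (id INTEGER, name TEXT)")
    # New approach: the database assigns the next ID on every insert.
    conn.execute(
        "CREATE TABLE auto_ids (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
    )
    for batch in (["a", "b", "c"], ["d", "e"]):
        # The loader's own counter starts again at 1 on each run.
        conn.executemany(
            "INSERT INTO manual_ids (id, name) VALUES (?, ?)",
            list(enumerate(batch, start=1)),
        )
        conn.executemany(
            "INSERT INTO auto_ids (name) VALUES (?)", [(n,) for n in batch]
        )
    manual = [r[0] for r in conn.execute("SELECT id FROM manual_ids ORDER BY rowid")]
    auto = [r[0] for r in conn.execute("SELECT id FROM auto_ids ORDER BY id")]
    conn.close()
    return manual, auto
```

Here the loader-numbered table ends up with duplicate IDs after the restart, while the auto-numbered table continues the sequence.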

Status:  ready to test in 5.1d8.  Choose an RDFile with a lot of complex data.  Load it and verify the created tables.  Try loading, interrupting the load partway through, then restarting; ensure the generated tables have sequential IDs without duplicates.