PDB/MSD: the making of an object-structured, fully integrated structure database

Jaime Prilusky and Enrique Abola

A relational database that meets the needs of the various user communities of the Protein Data Bank (PDB) was built using Victor Markowitz's Object Protocol Model (OPM). The database was implemented on a SYBASE engine. In addition to all coordinate entries found in the PDB, the PDB/MSD database includes semantic links to entries found in other biological databases. This work also makes possible the first steps toward a federation of biological databases. All PDB/MSD bibliographic citations are stored and maintained in GDB's citation database (CitDB), which was also built using OPM.

The Protein Data Bank is an archive of experimentally determined three-dimensional structures of proteins, nucleic acids, and other biological macromolecules. The Data Bank contains atomic coordinates, bibliographic citations, primary sequence and secondary structure information, as well as crystallographic structure factors and 2D-NMR experimental data. The Data Bank includes data on naturally occurring and engineered macromolecules. The common interest shared by the community of PDB users is the need to access information that can relate the biological functions of these macromolecules to their three-dimensional structures.

The recent explosive growth of structural information on biological macromolecules has been accompanied by increased demand for such information. The challenge facing the PDB is to keep abreast of the increasing flow of data, to keep the archive as error-free as possible, and to organize and present this information in ways that will facilitate data retrieval, knowledge exploration, and hypothesis testing, all without interrupting current services. The new database is expected to meet these challenges and to contribute significantly to improving the quality of the data available from the PDB.

Primary access to the database is via WWW clients such as Mosaic and Netscape. An HTML-based browser has been constructed that gives casual users easy access to the SYBASE tables. Direct access to the tables via SQL is also available. The database is currently available in beta-test form and will be released as a regular product in late October 1995.

Whitehead Institute/MIT Center for Genome Research