Report of the Workshop on Integration of Microbial Databases
The first system took a relational database approach based on a modified version of Sybase. The prototype included a subtree of the Ribosomal Database Project (RDP) phylogenetic tree; a subset of fatty acid (FAME) data from Microbial ID, Inc. (MIDI), Newark, DE; and RKC-encoded phenotypic data and a microbial taxonomy, both provided by Bergey's Manual Trust. So that queries could be asked within a phylogenetic framework, the phylogenetic tree served as the interface to the other data. Queries about a particular organism or subtree were asked using a "Query Tree" menu, with the results displayed on the tree. The queries incorporated into this prototype were: 1) show on the tree the phylogenetic distribution of a specified trait; 2) list all known traits of a specified organism; and 3) list all common traits of a specified subtree. The advantage of the relational database approach is that it is a mature technology. The disadvantages include the need for a Sybase license, the lack of inherent web support, and the need to recreate 100-200 schemas for existing databases.
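The three prototype queries can be illustrated with a small sketch. This is not the Sybase implementation described above, which the report does not detail; it is a minimal in-memory version in Python, and the `Node` class, the function names, and the organism and trait values are all hypothetical placeholders chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of a phylogenetic tree; leaves stand for organisms."""
    name: str
    traits: set = field(default_factory=set)
    children: list = field(default_factory=list)


def leaves(node):
    """Yield every leaf (organism) at or below the given node."""
    if not node.children:
        yield node
    else:
        for child in node.children:
            yield from leaves(child)


def find(node, name):
    """Return the node with the given name, or None if it is absent."""
    if node.name == name:
        return node
    for child in node.children:
        hit = find(child, name)
        if hit is not None:
            return hit
    return None


def trait_distribution(root, trait):
    """Query 1: organisms on the tree that carry the specified trait."""
    return [leaf.name for leaf in leaves(root) if trait in leaf.traits]


def traits_of(root, organism):
    """Query 2: all known traits of the specified organism."""
    node = find(root, organism)
    return sorted(node.traits) if node else []


def common_traits(root, subtree_name):
    """Query 3: traits shared by every organism in the specified subtree."""
    subtree = find(root, subtree_name)
    if subtree is None:
        return []
    trait_sets = [leaf.traits for leaf in leaves(subtree)]
    return sorted(set.intersection(*trait_sets))


# Hypothetical placeholder data, not taken from any of the databases named above.
tree = Node("root", children=[
    Node("clade_1", children=[
        Node("organism_A", {"motile", "aerobic"}),
        Node("organism_B", {"motile", "spore-forming"}),
    ]),
    Node("organism_C", {"aerobic"}),
])

print(trait_distribution(tree, "motile"))
print(traits_of(tree, "organism_A"))
print(common_traits(tree, "clade_1"))
```

Keeping the traits on the leaves and computing subtree answers by traversal mirrors the idea of using the tree as the access path to the other data; a relational version would instead join a tree table against trait tables.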
The second system was a proposal to use SRS 5, an improved version of the EMBL-supported Sequence Retrieval System (SRS) by Thure Etzold. This World Wide Web-based database network already contains more than 100 molecular biology databases and can, without modification, accommodate most data relevant to microbiology. The advantages of this approach include 1) a large set (over 120) of molecular biology databases already connected, 2) a fast query engine that can follow links, 3) a flexible data definition language (DDL), 4) free availability with source code, and 5) a responsive development team. Disadvantages include a weak user interface (no phylogenetic interface) and no support for non-textual data. (The development team has recently shown a willingness to support taxonomies and to integrate the phylogenetic/taxonomic interfaces currently being developed by Oliver Strunk and Niels Larsen.)
The third system was a WWW-based prototype that organized the same data used in the relational database prototype within a phylogenetic framework, using a subtree of the RDP phylogenetic tree. Written in Perl 5, it implemented a mechanism for navigating the tree, included a method for calculating rRNA signatures, and could link the data to the outside world. At the time of the workshop, a general query mechanism was not yet complete.
4.0 Recommended Activities
4.2 System Design and Implementation
4.3.3 Databases Needing Development
5.0 Federation Membership and Responsibilities
More information about the integrated database project is available in insights, the CME Newsletter.