
Currently there are 100-200 databases, each covering some aspect of microbial life: nomenclature, phylogeny and taxonomy, biochemistry, physiology, phenotypic information, ecology, genetics, and alignment and sequence data (including the important recent availability of completely sequenced genomes). This growing mass of information is geographically scattered and organized in almost as many ways as there are databases.
CME's ultimate goal is a database, accessible on the World Wide Web, that provides quick answers to broad questions involving all known aspects of microbial life. Such a database should offer a phylogenetic interface and be able to accommodate all types of data, from images to a complete genomic DNA sequence. It will include the 16S rRNA phylogeny supplied by the Ribosomal Database Project (RDP), metabolic reconstructions, phenotypic data from Bergey's Manual, and many other sources.
As a first step toward this goal, CME hosted an international workshop in August 1995, where participants discussed ways to initiate the integration and compared different structural approaches. Among those present were representatives from the German Collection of Microorganisms and Cell Cultures, Technical University of Munich, University of Ghent in Belgium, RIKEN Institute of Japan, and the World Data Center for Microorganisms. The workshop concluded that the two most serious obstacles to integration of existing databases are 1) the lack of a freely available, up-to-date list of procaryotic organism names with synonyms, and 2) the fact that much of the needed data is incomplete, proprietary or organized in ways unsuitable for integration. It was generally agreed that an integrated database should not impose restrictions, and should require only that submitted data be accurate, well described and consistently formatted.
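To make the first obstacle concrete, the following is a minimal sketch, in Python, of how a shared list of names with synonyms could let separate databases agree on which organism a record refers to. It is an illustration only, not a description of any system discussed at the workshop: the function names, the table layout and the two example entries are assumptions chosen for this sketch.

```python
# Illustrative table: accepted name -> older or alternative names.
# The layout and the entries are assumptions made for this sketch.
ACCEPTED_TO_SYNONYMS = {
    "Burkholderia cepacia": ["Pseudomonas cepacia"],
    "Escherichia coli": ["Bacterium coli"],
}


def normalize(name):
    """Collapse runs of whitespace and ignore case so near-identical spellings match."""
    return " ".join(name.split()).lower()


def build_index(accepted_to_synonyms):
    """Map every normalized name (accepted or synonym) to its accepted name."""
    index = {}
    for accepted, synonyms in accepted_to_synonyms.items():
        for name in [accepted, *synonyms]:
            index[normalize(name)] = accepted
    return index


def resolve(name, index):
    """Return the accepted name for any known spelling, or None if unknown."""
    return index.get(normalize(name))


INDEX = build_index(ACCEPTED_TO_SYNONYMS)
```

With such an index, a record filed under "Pseudomonas cepacia" in one database and a record filed under "Burkholderia cepacia" in another resolve to the same key, which is the precondition for merging them.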
During December 1995, Oliver Strunk (Munich) and Niels Larsen (CME) developed a Java-based Web interface that can display phylogenies and taxonomies, fetch objects elsewhere on the Web, and interact with an underlying query mechanism. The figure on page one comes from this interface and shows a phylogenetic tree with highlighted organism annotations. The interface can be used both to select organisms (input) and to display their characteristics (output). With an improved query system, this framework could allow microbiologists to explore characteristics of genetically close relatives, finding ways in which one group is similar to or different from another. During 1996, development will continue toward a working version of this interface that includes all available data. CME will provide the Web site.
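The kind of group comparison described above can be sketched in a few lines of Python. This is a hypothetical illustration, not the query mechanism behind the interface: the organism labels, the trait names and the function names are all placeholders invented for this sketch, and each group is assumed to be non-empty.

```python
# Placeholder data: organism label -> set of recorded traits.
# All labels and trait names are invented for illustration.
TRAITS = {
    "organism_1": {"aerobic", "motile", "gram_negative"},
    "organism_2": {"aerobic", "gram_negative"},
    "organism_3": {"anaerobic", "gram_negative", "spore_forming"},
    "organism_4": {"anaerobic", "gram_negative"},
}


def shared_traits(organisms, traits):
    """Traits recorded for every organism in the group."""
    sets = [traits[o] for o in organisms]
    return set.intersection(*sets) if sets else set()


def any_traits(organisms, traits):
    """Traits recorded for at least one organism in the group."""
    return set().union(*(traits[o] for o in organisms))


def compare_groups(group_a, group_b, traits):
    """Summarize how two groups of organisms are similar and how they differ."""
    a_shared = shared_traits(group_a, traits)
    b_shared = shared_traits(group_b, traits)
    return {
        # held by every member of both groups
        "common": a_shared & b_shared,
        # held by all of group A and by no member of group B
        "only_a": a_shared - any_traits(group_b, traits),
        # held by all of group B and by no member of group A
        "only_b": b_shared - any_traits(group_a, traits),
    }


result = compare_groups(
    ["organism_1", "organism_2"],
    ["organism_3", "organism_4"],
    TRAITS,
)
```

In an interface like the one described, the two groups would be chosen by selecting branches of the phylogenetic tree rather than by typing labels.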
A number of important applications of an integrated microbial database can already be predicted. For example, researchers will no longer have to wade through an ocean of literature to find the characteristics of the genetic relatives of an organism of interest. New questions can be asked routinely, based on the easy availability of the phylogenetic distribution of traits.
Complete metabolic maps will emerge from sequenced genomes, with connections to participating enzymes and their aligned sequences (now being developed by Evgeni Selkov and Ross Overbeek at Argonne National Laboratory). "One could imagine new automated services where, for example, submission of a 16S rRNA fragment derived from an environmental sample would return a conservative list of characteristics of the unknown organism that the rRNA came from," said Larsen, "or probes that support environmental sampling of genes that encode enzymes with desired properties."
"Some of these complex queries have an impact that cannot even be imagined until we see the results," said Dr. Jim Tiedje, Director of CME. Tiedje compares the microbial database integration to the human and plant genome projects because it will draw international attention, both for funding and for a better understanding of microbiology's needs. "Removing the obstacles will require the interest and experience of many international microbiologists," said Tiedje. "Coordinating and providing the initiative for the project has been with the Center, but it is a project that belongs to the international community."