Report of the Workshop on Integration of Microbial Databases




CME is interested in any comments you may have about the contents of this report.
Please contact niels@vitro.cme.msu.edu with any questions or comments you may have.

Table of Contents

1.0 Introduction

2.0 Goals

3.0 IMD Prototypes Demonstration

4.0 Recommended Activities

4.1 Organization and Administration

4.2 System Design and Implementation

4.3 Data to be Integrated

4.3.1 First Priority Data

a. Nomenclatural Database
b. Phylogenetic Trees
c. Phenotypic Data

4.3.2 Existing Databases

a. Publicly Accessible On-Line Databases
b. Independently Curated, Specific Databases

4.3.3 Databases Needing Development

a.) ARDRA
ARDRA (Amplified Ribosomal DNA Restriction Analysis) is a fast and easy screening procedure frequently used in environmental studies for comparing the similarity of small subunit ribosomal RNA genes. It would be useful for researchers to have a tool for comparing restriction patterns from new isolates with those predicted from sequences already in the databases. To that end, it is proposed that the IMD develop a database of restriction patterns of the rRNAs residing in the RDP database.
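As a rough illustration of how such restriction patterns could be derived and compared, the sketch below digests sequences in silico and matches an observed pattern against predicted ones. It is not a design from the workshop: the function names, the two enzymes chosen, the toy sequences, and the band-matching rule (same number of bands, each within a tolerance in base pairs) are all assumptions made for this example. It also ignores practical limits such as small fragments that a gel would not resolve.

```python
# In silico ARDRA sketch (illustrative; names and matching rule are assumptions).

# Recognition site and cut offset (bases from the start of the site) for two
# common four-base cutters. The enzyme selection is for illustration only.
ENZYMES = {
    "HaeIII": ("GGCC", 2),  # cuts GG^CC
    "MspI": ("CCGG", 1),    # cuts C^CGG
}


def digest(sequence, enzyme):
    """Return fragment lengths (longest first) for a linear sequence cut by one enzyme."""
    site, offset = ENZYMES[enzyme]
    seq = sequence.upper().replace("U", "T")  # accept entries stored in the RNA alphabet
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


def patterns_match(observed, reference, tolerance=5):
    """True if both patterns have the same band count and each band agrees within tolerance (bp)."""
    if len(observed) != len(reference):
        return False
    return all(abs(o - r) <= tolerance
               for o, r in zip(sorted(observed), sorted(reference)))


def find_matches(observed, database, enzyme, tolerance=5):
    """Return names of database sequences whose predicted pattern matches the observed one."""
    return [name for name, seq in database.items()
            if patterns_match(observed, digest(seq, enzyme), tolerance)]


# Toy sequences standing in for database entries (not real rRNA genes).
database = {
    "entry_A": "AAGGCCTTTTCCGGAA",
    "entry_B": "AAAAAAAAGGCCTTTT",
}
print(digest(database["entry_A"], "HaeIII"))                   # [12, 4]
print(find_matches([10, 6], database, "HaeIII", tolerance=1))  # ['entry_B']
```

Precomputing `digest` for every stored sequence and each enzyme of interest would give the proposed table of restriction patterns; a lookup such as `find_matches` then only has to compare band lists.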

b.) Habitat
Workshop participants identified the need for habitat information to be part of any IMD. At present, no specific microbial habitat database exists. Habitat data is collected and recorded erratically (and usually incompletely). Venues where one would expect to find rather complete habitat descriptions, for example, the International Journal of Systematic Bacteriology, often do not require more than cursory descriptions.

Habitat description is deemed essential to refine the collection location for future collections, draw inferences about the physiological/phenotypic characteristics of an organism, perform comparative ecological studies between sites, and locate sites in which to look for similar organisms. It is also important to attempt to harmonize the types of data with those collected by macroecologists and systematists.

It was recommended that the data considered absolutely essential for individual investigators to record include:

  1. Latitude/Longitude/Elevation (A Global Positioning System [GPS] receiver should become standard equipment for microbiologists collecting samples from nature. Vertical location relative to the local water or ground surface should also be recorded.)
  2. Time/Date
  3. Stratigraphic Description
  4. Temperature
  5. Textural Comments
  6. Habitat Type
    a. Terrestrial
    b. Aquatic
    c. Laboratory
    d. Aerial
    e. Biological Host-Location

Within these, subtypes are possible; for example, under terrestrial:

-- Northern Deciduous Forest
-- Savanna

The second level of information would be physical characteristics associated with a habitat type, for example:

-- pH
-- Redox
-- Chemistry
-- Moisture Content
-- Organic Matter Quantity and Quality
-- Mineralogical Composition
-- Salinity

The third level would be biological characteristics associated with a habitat type:

-- Plant/Animal Community
-- Other Microbes

The above data, if collected, would be attached to the data associated with the major focus of the study (for example, the taxonomic ID of a new species) and would be carried into the IMD with the primary data (for example, species name). However, we recognize that many (if not most) microbiologists may not collect more than the minimum habitat dataset of latitude/longitude/vertical location, etc. Therefore, the IMD should locate databases containing habitat information and link these to the IMD.
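One way to picture the habitat dataset described above is as a single record with the essential items as required fields and the second- and third-level characteristics as optional attachments. The sketch below is only an illustration under that assumption: the class name, field names, units, and validation rules are hypothetical and were not specified by the workshop, and the values in the usage example are invented.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional

# Top-level habitat types taken from the list of essential items.
HABITAT_TYPES = {
    "terrestrial", "aquatic", "laboratory", "aerial", "biological host-location",
}


@dataclass
class HabitatRecord:
    """Hypothetical habitat record; field names and units are assumptions."""
    # Essential items (minimum dataset)
    latitude_deg: float
    longitude_deg: float
    elevation_m: float
    vertical_location_m: float          # relative to local water/ground surface
    collected_at: datetime
    stratigraphic_description: str
    temperature_c: float
    textural_comments: str
    habitat_type: str
    habitat_subtype: Optional[str] = None   # e.g. "savanna"
    # Second level: physical characteristics (pH, redox, salinity, ...)
    physical: Dict[str, str] = field(default_factory=dict)
    # Third level: biological characteristics (plant/animal community, other microbes)
    biological: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        # Reject values that cannot be valid before the record is linked to primary data.
        if not -90.0 <= self.latitude_deg <= 90.0:
            raise ValueError("latitude must be between -90 and 90 degrees")
        if not -180.0 <= self.longitude_deg <= 180.0:
            raise ValueError("longitude must be between -180 and 180 degrees")
        if self.habitat_type not in HABITAT_TYPES:
            raise ValueError(f"unknown habitat type: {self.habitat_type!r}")


# Invented example values, for illustration only.
record = HabitatRecord(
    latitude_deg=42.7, longitude_deg=-84.5, elevation_m=260.0,
    vertical_location_m=-0.1, collected_at=datetime(1995, 6, 1, 9, 30),
    stratigraphic_description="A horizon", temperature_c=18.5,
    textural_comments="sandy loam", habitat_type="terrestrial",
    habitat_subtype="northern deciduous forest", physical={"pH": "6.2"},
)
```

Keeping the second and third levels as open key/value maps reflects the expectation that many investigators will supply only the minimum dataset, while still leaving room for richer descriptions where they exist.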

c.) Databases Obtained with Commercial Test Kits or Systems
A number of such systems exist and are widely used by clinical microbiologists, ecologists, culture collections, and those working with the isolation and characterization of bacteria from environmental samples. Examples of specific commercial products include Biolog and API strips. Data obtained using these commercial products can be found in three general categories of databases: those kept by the commercial firm that produces the test kit/system; those amassed by culture collections; and those obtained by individual users of the products. The databases kept by the commercial firm are likely to be large (built from a large group of organisms) and collected under a standard set of operating conditions. The availability of such databases for an IMD is uncertain at this time.

The databases amassed by culture collections (for example, ATCC) are likely to contain information on as many organisms as those of the commercial firms, or more, and also to be of high quality, having been run under a well-documented standard set of experimental conditions. The availability and form of this data is unknown, but it is presumed to be more available than that from commercial firms.

The databases amassed by individuals are likely to be small for any one researcher (user) but perhaps huge for the collective research enterprise. However, operating conditions are also likely to vary widely from person to person, as users modify run conditions to suit their particular systems/organisms. The distributed nature of this data and its variable quality may make it impossible to collect or validate for use in the IMD.

The workshop participants agreed that various types of phenotypic data would be an extremely valuable part of any IMD. Given the relative paucity of phenotypic databases and the potential of these commercial metabolic test kits/systems to provide such data, the steering committee of the IMD or their designee should pursue the availability of data obtained from using such systems first from culture collections participating in the IMD and then from the commercial firms that supply the units/systems.

d.) Images
It is proposed that, ultimately, microbial images be included in the IMD. Among the reasons for this proposal is the fact that an image of the microbe itself (i.e. its morphology), either as an individual cell or a multicellular arrangement, is one of the first attributes of a microbe to be recorded and quantified. In some cases, morphology alone is so distinctive as to afford an identification to the genus or group level (e.g. Caulobacter, Gallionella, spirochetes).

Initial efforts should be to compile in the database light and electron micrographs of cells and their distinctive morphological features, for example: (i) appendages, sheaths, intracellular inclusions, and distinctive membrane arrangements; (ii) resting/dormant stages (spores/cysts) and other morphogenetic forms (swarmer cells); and (iii) distinctive multicellular assemblages (colonies, swarms, fruiting bodies). In addition, since many microbes induce characteristic lesions or other morphological changes in association with plant and animal hosts, images of such processes should also be included (e.g. pustules, tubercles, galls, nodules). Images derived from more sophisticated, spectroscopic analyses (e.g. FTIR spectra of cell envelopes) should also be included, although it is recognized that some potential users may have neither such data nor the means to acquire it readily.

A longer-term effort should be directed to establishing within the database the capacity for image analysis, with the goal of identifying microbes by computerized comparisons to images held in the database.

4.3.4 Other Groups Which Have Microbial Strain Data

5.0 Federation Membership and Responsibilities

6.0 Workshop Participants

7.0 Summary


More information about the integrated database project is available in insights, the CME Newsletter.

