Visualization of ribosomal RNA operon copy number distribution
© Rastogi et al; licensee BioMed Central Ltd. 2009
Received: 23 April 2009
Accepted: 25 September 2009
Published: 25 September 2009
Results of microbial ecology studies using 16S rRNA sequence information can be deceiving due to differences in rRNA operon copy number and genome size of the detected organisms. It therefore will be useful for investigators to have a better understanding of how these two parameters differ in various organism types. In this study, the number of ribosomal operons and genome size were separately mapped onto a Bacterial phylogenetic tree.
A representative Bacterial tree was constructed using 31 marker genes found in 578 bacterial genome sequences. Organism names are displayed on the trees using gradations of color such that similar colors indicate similar numbers of operons or similar genome sizes. The resulting images provide an intuitive understanding of how copy number and genome size vary in different Bacterial phyla.
Once the phylogenetic position of a novel organism is known, the number of rRNA operons, and to a lesser extent the genome size, can be estimated by examination of the colored maps. Further detail can then be obtained for members of relevant taxa from the rrnDB database.
The ribosomal RNA (rRNA) genes of Bacteria and Archaea are typically found in operons. Although many organisms have a single rRNA operon, the actual number is known to vary between 1 and 15. The operons themselves do not always have identical sequences but instead differ at a modest number of positions, typically fewer than 15 in the case of 16S rRNAs. Nevertheless, there are exceptions. For example, one of the three 16S rRNA genes in Haloarcula marismortui differs from the others in over 70 positions. Such microheterogeneity has been studied in detail in a modest number of cases. For example, it has recently been shown in Streptomyces coelicolor that all the operons are expressed and their RNAs incorporated into ribosomes, but the relative expression level may vary over the growth cycle [3, 4]. In the case of H. marismortui, the aberrant operon responds to temperature differently. Efforts to evaluate the extent of rRNA operon microheterogeneity should likely be handled cautiously. An examination of complete genome sequences revealed many examples where all the 16S rRNA genes in an organism with multiple rRNA operons are reported to be identical. There certainly are cases where multiple rRNAs exist with the same sequence. However, in the case of the rapidly accumulating bacterial genomes, one must remember that long, nearly exactly repeated regions are difficult to sequence. Thus, one must consider the possibility that at least some, and perhaps many, of the assembled genomes are reporting multiple copies of what are actually consensus rRNA sequences.
Although the true extent of microheterogeneity may be underestimated in the published genomes, the number of operons present is likely reliable. Since 2001 the number of ribosomal operons has been curated in the rrnDB (Ribosomal RNA Operon Copy Number Database) [7, 8] for all instances where it is known. The number of rRNA operons is believed to be correlated, in part, with the ecological strategy of the organism [9–11]. Operon number is of special interest when 16S rRNA sequence information is used to study the composition of microbial ecosystems, because organisms with larger numbers of copies of the rRNA operon will be disproportionately represented in the resulting profiles. Therefore, when attempting to quantify relative numbers in environmental populations, it is appropriate to correct the data by taking into account both the genome size and the number of operons. However, this is potentially problematic because many of the strains that are encountered have no exact match in the database, and it is therefore not immediately apparent how many operons are likely to be present or what the genome size is likely to be. Herein, we examine this issue by mapping these two traits onto a phylogenetic tree. Once one determines the approximate phylogenetic position of an organism, one can use these maps to make a reasonable assessment of genome size and, especially, rRNA operon copy number.
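The copy number correction described above can be illustrated with a minimal sketch. It assumes read counts per taxon and a known (or estimated) 16S rRNA gene copy number for each; the function name, taxon labels, and numbers are hypothetical and not taken from the study. Only the operon-number correction is shown; a genome size correction would require additional assumptions about how abundance is measured.

```python
def correct_for_copy_number(read_counts, copies_per_genome):
    """Convert 16S read counts into relative genome (cell) abundances.

    read_counts: dict taxon -> number of 16S reads observed (hypothetical data)
    copies_per_genome: dict taxon -> 16S rRNA gene copies per genome
    """
    # Divide each count by its copy number to get genome equivalents.
    genome_equivalents = {
        taxon: count / copies_per_genome[taxon]
        for taxon, count in read_counts.items()
    }
    total = sum(genome_equivalents.values())
    # Normalize so that the corrected abundances sum to 1.
    return {taxon: value / total for taxon, value in genome_equivalents.items()}


# Illustrative values only: taxon_A carries 7 copies, taxon_B carries 1.
reads = {"taxon_A": 700, "taxon_B": 100}
copies = {"taxon_A": 7, "taxon_B": 1}
corrected = correct_for_copy_number(reads, copies)
print(corrected)
```

In this made-up example the raw reads suggest a 7:1 ratio, whereas the corrected abundances are equal, which is the kind of distortion the correction is meant to remove.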
Homologs of each of the 31 phylogenetic marker genes (dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf) were identified from the 578 bacterial genomes that were complete at the time of the study. The corresponding protein sequences were retrieved, aligned, trimmed, and then concatenated by species into a mega-alignment. A maximum likelihood tree was then constructed from the mega-alignment using PHYML. The model selected on the basis of the likelihood ratio test was the Whelan and Goldman (WAG) model of amino acid substitution with gamma-distributed rate variation (5 categories) and a proportion of invariable sites. The shape of the gamma distribution and the proportion of invariable sites were estimated by the program.
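The concatenation step can be sketched as follows. This is an illustration only, not the pipeline used in the study: it assumes each per-gene alignment is already trimmed and held as a dictionary from species name to aligned sequence, and it pads species missing from a gene with gap characters (an assumption; the source does not say how missing genes were handled).

```python
def concatenate_alignments(alignments, gap="-"):
    """Join per-gene alignments into one mega-alignment keyed by species.

    alignments: list of dicts, each mapping species -> aligned sequence.
    All sequences within one dict must have the same length.
    """
    # Collect every species seen in any gene alignment.
    species = sorted(set().union(*(aln.keys() for aln in alignments)))
    merged = {name: [] for name in species}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for name in species:
            # Pad with gaps when a species lacks this gene (assumed policy).
            merged[name].append(aln.get(name, gap * width))
    return {name: "".join(parts) for name, parts in merged.items()}


# Toy example with two "genes" and two species (made-up sequences).
genes = [{"sp1": "MK-", "sp2": "MRA"}, {"sp1": "LL"}]
mega = concatenate_alignments(genes)
print(mega)
```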
The number of ribosomal operons in each genome and the size of the genome were obtained from the NCBI website http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi. In a small number of instances bacteria are considered to have multiple chromosomes. In these cases, the total number of operons in all the chromosomes was used, and the combined size of the multiple chromosomes was used for genome size. In addition, in some instances the number of copies of each rRNA is different. This is most frequent for 5S rRNA, which may be present in an extra copy. In these cases, the number of 16S rRNA genes was used as the number of operons, because in most practical applications it is 16S rRNA that is being examined. The tree was combined with the operon and genome size information and written in Newick format (http://en.wikipedia.org/wiki/Newick) such that each node is specified by "species-name*genome-size*rRNA-operon-count". The organism names on the tree were colored according to either operon number or genome size. In each case, as the parameter increases the color generally becomes darker. For the operons, 14 colors were used. For 0 to 6 operons, shades of yellow, orange or red were used, with darker colors indicating larger numbers of operons. For 7 to 10 operons shades of blue were used, and greens were used for 11 or more. In the case of genome size, 12 colors were used to depict various size ranges. The first range was 0-1 MB with subsequent increments of 0.5 MB. The final range was for genomes greater than 6 MB in size. The final tree was created in the .esp format using ATV.
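The node labels and color ranges described above could be handled along the following lines. This is a sketch under stated assumptions: the genome size in the label is taken to be in MB, range boundaries are assigned to the lower bin, and only the color family is returned for operon number because the exact shades are not given in the text. All function names and the example label are hypothetical.

```python
import math


def parse_label(label):
    """Split 'species-name*genome-size*rRNA-operon-count' into its parts."""
    # rsplit keeps any '*' that might occur inside the species name.
    name, size, operons = label.rsplit("*", 2)
    return name, float(size), int(operons)


def operon_color_family(operons):
    """Color family for an operon count, following the ranges in the text."""
    if operons <= 6:
        return "yellow/orange/red"
    if operons <= 10:
        return "blue"
    return "green"


def genome_size_bin(size_mb):
    """Index (0-11) of the genome size range: 0-1 MB, then 0.5 MB steps, then >6 MB."""
    if size_mb > 6:
        return 11
    if size_mb <= 1:
        return 0
    # Bin k (1..10) covers sizes above 1 + 0.5*(k-1) up to 1 + 0.5*k MB.
    return math.ceil((size_mb - 1) / 0.5)


# Illustrative label only.
name, size_mb, operons = parse_label("Escherichia coli K12*4.6*7")
print(name, operon_color_family(operons), genome_size_bin(size_mb))
```

Twelve bins result from one 0-1 MB range, ten 0.5 MB ranges between 1 and 6 MB, and one range above 6 MB, which matches the 12 colors mentioned for genome size.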
The fact that members of the same species generally have essentially the same number of rRNA operons has been pointed out previously. However, in the absence of the type of mapping shown here, the phylogenetic extent to which this is true is not readily recognized. Initial mapping efforts were not fully informative in this regard due to the modest number of species for which the requisite information was available at the time. Prior work has shown that rRNA copy number impacts organism life history [7, 10]. This suggests that gain or loss of rRNA operons could be a means of adapting to different environments, and one might envision individual organisms within a population having different numbers of rRNA operons. Although rRNA operon copy number has typically not been examined in multiple individuals within a population, the high conservation of numbers within similar species from different sources argues against this.
The maps provided here will be especially useful to those seeking to quantitatively characterize microbial ecosystems using 16S rRNA sequence characterizations. The number of times an organism is encountered must be adjusted for the size of its genome and especially the number of copies of the 16S rRNA gene it carries. Once 16S rRNA sequence data are available, the approximate phylogenetic position of each organism can be estimated. The mappings can then be examined to obtain initial estimates of rRNA operon number and genome size from the neighboring phylogenetic groupings. With the relevant phylogenetic groupings identified, one can then use the rrnDB database to obtain the values for all organisms belonging to those groups.
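One simple way to turn the values of neighboring organisms into an initial estimate is sketched below. The use of the median is an assumption made for illustration; the text only says that neighboring groupings are examined. The function name and the neighbor values are hypothetical.

```python
from statistics import median


def estimate_from_neighbors(neighbor_values):
    """Summarize operon counts (or genome sizes) of phylogenetic neighbors.

    Returns the median as a point estimate along with the observed range,
    so that the spread among neighbors remains visible.
    """
    if not neighbor_values:
        raise ValueError("at least one neighbor value is required")
    return {
        "estimate": median(neighbor_values),
        "min": min(neighbor_values),
        "max": max(neighbor_values),
    }


# Made-up operon counts for four organisms near a novel isolate on the tree.
summary = estimate_from_neighbors([5, 6, 6, 7])
print(summary)
```

A wide range among neighbors would indicate that the estimate should be treated with more caution and checked against the rrnDB entries for that group.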
This research was supported in part by grants to GEF from the Robert A. Welch Foundation (E-1451), the Texas Advanced Research Program, the NASA Exobiology program (NNG05GN75G), and the Institute of Space Systems Operations.
- Rainey FA, Ward-Rainey NL, Janssen PH, Hippe H: Clostridium paradoxum DSM 7308(T) contains multiple 16S rRNA genes with heterogeneous intervening sequences. Microbiology. 1996, 142: 2087-2095. 10.1099/13500872-142-8-2087.
- Mylvaganam S, Dennis PP: Sequence heterogeneity between the two genes encoding 16S rRNA from the halophilic archaebacterium Haloarcula marismortui. Genetics. 1992, 130: 399-410.
- Kim HL, Shin E, Kim HM, Go H, Roh J, Bae J, Lee K: Heterogeneous rRNA molecules encoded by Streptomyces coelicolor M145 genome are all expressed and assembled into ribosomes. J Microbiol Biotechnol. 2007, 17: 1708-1711.
- Kim HL, Shin EK, Kim HM, Ryou SM, Kim S, Cha CJ, Bae J, Lee K: Heterogeneous rRNAs are differentially expressed during the morphological development of Streptomyces coelicolor. FEMS Microbiol Lett. 2007, 275: 146-152. 10.1111/j.1574-6968.2007.00872.x.
- López-López A, Benlloch S, Bonfá M, Rodríguez-Valera F, Mira A: Intragenomic 16S rRNA divergence in Haloarcula marismortui is an adaptation to different temperatures. J Mol Evol. 2007, 65: 687-696. 10.1007/s00239-007-9047-3.
- Acinas SG, Marcelino LA, Klepac-Ceraj V, Polz MF: Divergence and redundancy of 16S rRNA sequences in genomes with multiple rrn operons. J Bacteriol. 2004, 186: 2629-2635. 10.1128/JB.186.9.2629-2635.2004.
- Klappenbach JA, Saxman PR, Cole JR, Schmidt TM: rrndb: the ribosomal RNA operon copy number database. Nucl Acids Res. 2000, 29: 181-184. 10.1093/nar/29.1.181.
- Lee ZM, Bussema C, Schmidt TM: rrnDB: documenting the number of rRNA and tRNA genes in bacteria and archaea. Nucleic Acids Res. 2009, 37: D489-493. 10.1093/nar/gkn689.
- Dethlefsen L, Schmidt TM: The performance of the translational apparatus varies with the ecological strategies of bacteria. J Bacteriol. 2007, 189: 3237-3245. 10.1128/JB.01686-06.
- Stevenson BS, Schmidt TM: Life history implications of ribosomal RNA gene copy number in Escherichia coli. Appl Environ Microbiol. 2004, 70: 6670-6677. 10.1128/AEM.70.11.6670-6677.2004.
- Klappenbach J, Dunbar JM, Schmidt TM: rRNA gene copy number predicts ecological strategies in bacteria. Appl Environ Microbiol. 2000, 66: 1328-1333. 10.1128/AEM.66.4.1328-1333.2000.
- Tourova TP: Copy number of ribosomal operons in prokaryotes and its effect on phylogenetic analyses. Mikrobiologia. 2003, 72: 437-452.
- Einen J, Thorseth IH, Ovreås L: Enumeration of Archaea and Bacteria in seafloor basalt using real-time quantitative PCR and fluorescence microscopy. FEMS Microbiol Lett. 2008, 282: 182-187. 10.1111/j.1574-6968.2008.01119.x.
- Siefert JL, Fox GE: Phylogenetic mapping of bacterial morphology. Microbiology SGM. 1998, 144: 2803-2808. 10.1099/00221287-144-10-2803.
- Wu M, Eisen JA: A simple, fast, and accurate method of phylogenomic inference. Genome Biol. 2008, 9: R151. 10.1186/gb-2008-9-10-r151.
- Zmasek CM, Eddy SR: ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics. 2001, 17: 383-384. 10.1093/bioinformatics/17.4.383.