Correspondence | Open
Visualization of ribosomal RNA operon copy number distribution
BMC Microbiology, volume 9, Article number: 208 (2009)
Results of microbial ecology studies that use 16S rRNA sequence information can be misleading because the detected organisms differ in rRNA operon copy number and genome size. It will therefore be useful for investigators to understand better how these two parameters vary among different types of organisms. In this study, the number of ribosomal RNA operons and the genome size were separately mapped onto a Bacterial phylogenetic tree.
A representative Bacterial tree was constructed using 31 marker genes found in 578 bacterial genome sequences. Organism names are displayed on the trees using gradations of color, so that similar colors indicate similar numbers of operons or similar genome sizes. The resulting images provide an intuitive understanding of how copy number and genome size vary across Bacterial phyla.
Once the phylogenetic position of a novel organism is known, the number of rRNA operons, and to a lesser extent the genome size, can be estimated by examining the colored maps. Further detail on members of the relevant taxa can then be obtained from the rrnDB database.
The ribosomal RNA (rRNA) genes of Bacteria and Archaea are typically found in operons. Although many organisms have a single rRNA operon, the actual number is known to vary between 1 and 15. The operons do not always share the same sequence. Instead, they may differ at a modest number of positions, typically fewer than 15 in the case of 16S rRNAs. There are, however, exceptions. For example, one of the three 16S rRNA genes in Haloarcula marismortui differs from the others at over 70 positions. Such microheterogeneity has been studied in detail in only a modest number of cases. For example, it has recently been shown in Streptomyces coelicolor that all of the operons are expressed and their RNAs are incorporated into ribosomes, but that their relative expression levels may vary over the growth cycle [3, 4]. In the case of H. marismortui, the aberrant operon responds differently to temperature. Estimates of the extent of rRNA operon microheterogeneity should nonetheless be treated with caution. An examination of complete genome sequences revealed many examples in which all of the 16S rRNA genes in an organism with multiple rRNA operons are reported to be identical. There certainly are cases where multiple rRNA genes have the same sequence. However, in the case of the rapidly accumulating bacterial genomes, one must remember that long, nearly exact repeats are difficult to sequence and assemble. Thus, one must consider the possibility that at least some, and perhaps many, of the assembled genomes report multiple copies of what are actually consensus rRNA sequences.
Although the true extent of microheterogeneity may be underestimated in the published genomes, the reported number of operons is likely reliable. Since 2001, the number of ribosomal operons has been curated in the rrnDB (Ribosomal RNA Operon Copy Number Database) [7, 8] for all organisms in which it is known. The number of rRNA operons is believed to be correlated, in part, with an organism's ecological strategy [9–11]. Operon number is of special interest when 16S rRNA sequence information is used to study the composition of microbial ecosystems, because organisms with more copies of the rRNA operon will be disproportionately represented in the resulting profiles. Therefore, when attempting to quantify the relative numbers of organisms in environmental populations, it is appropriate to correct the data by taking into account both the genome size and the number of operons. However, this is potentially problematic. Many of the strains encountered have no exact match in the database, so it is not immediately apparent how many operons they are likely to have or what their genome size is likely to be. Herein, we examine this issue by mapping these two traits onto a phylogenetic tree. Once the approximate phylogenetic position of an organism has been determined, these maps can be used to make a reasonable assessment of its genome size and, especially, its rRNA operon copy number.
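The copy-number correction described above can be sketched as follows. Each taxon's 16S read count is divided by its estimated number of 16S rRNA genes, and the results are renormalized. This is a minimal illustration: the function name, the taxon names and the numbers are hypothetical. The text does not say exactly how genome size should enter the correction, so genome size is left out here.

```python
def correct_for_copy_number(read_counts, copy_numbers):
    """Convert raw 16S read counts into copy-number-corrected relative abundances.

    read_counts:  dict mapping taxon -> number of 16S reads observed
    copy_numbers: dict mapping taxon -> estimated number of 16S rRNA genes per genome
    Returns a dict mapping taxon -> relative abundance (values sum to 1).
    """
    genome_equivalents = {}
    for taxon, count in read_counts.items():
        copies = copy_numbers[taxon]
        if copies < 1:
            raise ValueError(f"copy number for {taxon} must be >= 1, got {copies}")
        # Dividing by the copy number approximates the number of genomes (cells).
        genome_equivalents[taxon] = count / copies

    # Renormalize so the corrected abundances sum to 1.
    total = sum(genome_equivalents.values())
    return {taxon: value / total for taxon, value in genome_equivalents.items()}


# Hypothetical example: taxon_A carries seven operons, taxon_B carries one.
reads = {"taxon_A": 700, "taxon_B": 100}
copies = {"taxon_A": 7, "taxon_B": 1}
print(correct_for_copy_number(reads, copies))
# {'taxon_A': 0.5, 'taxon_B': 0.5}
```

In this example, taxon_A contributes seven times as many reads only because it carries seven operons. After correction, the two taxa are equally abundant.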
Homologs of each of the 31 phylogenetic marker genes (dnaG, frr, infC, nusA, pgk, pyrG, rplA, rplB, rplC, rplD, rplE, rplF, rplK, rplL, rplM, rplN, rplP, rplS, rplT, rpmA, rpoB, rpsB, rpsC, rpsE, rpsI, rpsJ, rpsK, rpsM, rpsS, smpB, tsf) were identified in the 578 bacterial genomes that were complete at the time of the study. The corresponding protein sequences were retrieved, aligned, trimmed, and then concatenated by species into a mega-alignment. A maximum likelihood tree was then constructed from the mega-alignment using PHYML. The model selected on the basis of the likelihood ratio test was the Whelan and Goldman (WAG) model of amino acid substitution, with gamma-distributed rate variation (5 categories) and a proportion of invariable sites. The shape parameter of the gamma distribution and the proportion of invariable sites were estimated by the program.
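The concatenation step can be illustrated with a short sketch that builds a per-species mega-alignment from separate per-gene protein alignments. The text does not say how species lacking a marker gene were handled. Filling those positions with gap characters is an assumption made for this example. Retrieval, alignment and trimming are not shown, and the function name and toy sequences are hypothetical.

```python
def concatenate_alignments(gene_alignments, gap="-"):
    """Concatenate per-gene protein alignments into one mega-alignment per species.

    gene_alignments: dict mapping gene name -> dict mapping species -> aligned sequence.
    All sequences within one gene alignment must have the same length.
    A species missing a gene receives a run of gap characters of that gene's width
    (an assumption; the original pipeline may have handled this differently).
    """
    species = sorted({sp for aln in gene_alignments.values() for sp in aln})
    parts = {sp: [] for sp in species}

    # Process genes in a fixed order so that columns line up across species.
    for gene in sorted(gene_alignments):
        aln = gene_alignments[gene]
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError(f"alignment for {gene} does not have one uniform width: {widths}")
        width = widths.pop()
        for sp in species:
            parts[sp].append(aln.get(sp, gap * width))

    return {sp: "".join(chunks) for sp, chunks in parts.items()}


# Hypothetical toy alignments for two marker genes: sp3 lacks rplB and sp2 lacks rpsC.
genes = {
    "rplB": {"sp1": "MKV-L", "sp2": "MRVAL"},
    "rpsC": {"sp1": "GQK", "sp3": "GEK"},
}
for name, seq in concatenate_alignments(genes).items():
    print(f">{name}\n{seq}")
# sp1: MKV-LGQK, sp2: MRVAL---, sp3: -----GEK
```

The output above is in FASTA format, which most maximum likelihood programs, including PHYML, can read after conversion to their preferred input format.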
The number of ribosomal operons in each genome and the size of the genome were obtained from the NCBI website http://www.ncbi.nlm.nih.gov/genomes/lproks.cgi. A small number of bacteria are considered to have multiple chromosomes. For these organisms, the total number of operons on all chromosomes was used as the operon number, and the combined size of all chromosomes was used as the genome size. In addition, in some genomes the copy numbers of the individual rRNA genes differ. This is most frequent for 5S rRNA, which may be present in an extra copy. In these cases, the number of 16S rRNA genes was used as the number of operons, since in most practical applications it is 16S rRNA that is being examined. The tree was combined with the operon and genome size information and written in Newick format (http://en.wikipedia.org/wiki/Newick), with each node labeled "species-name*genome-size*rRNA-operon-count". The organism names on the tree were then colored according to either operon number or genome size. In each case, the color generally becomes darker as the parameter increases. For operon number, 14 colors were used: shades of yellow, orange or red for 0 to 6 operons, with darker colors indicating larger numbers of operons; shades of blue for 7 to 10 operons; and shades of green for 11 or more. For genome size, 12 colors were used to depict different size ranges. The first range was 0-1 MB, and subsequent ranges were in increments of 0.5 MB. The final range was for genomes larger than 6 MB. The final tree was created in the .esp format using ATV.
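The bookkeeping described above can be sketched in a few steps. The code combines multiple chromosomes, builds and parses the "species-name*genome-size*rRNA-operon-count" labels, and assigns each organism to a color range. The `Chromosome` class and all function names are hypothetical. Two details are also assumptions, because the text does not specify them: whether each range's upper bound is inclusive, and the exact shade used within each hue family.

```python
import math
from dataclasses import dataclass


@dataclass
class Chromosome:
    size_mb: float  # chromosome length in megabases
    n_16s: int      # number of 16S rRNA genes annotated on this chromosome


def genome_summary(chromosomes):
    """Sum sizes and 16S gene counts over all chromosomes of one genome.

    The 16S gene count stands in for the operon count, as described in the text.
    """
    total_size = sum(c.size_mb for c in chromosomes)
    total_operons = sum(c.n_16s for c in chromosomes)
    return total_size, total_operons


def newick_label(species, size_mb, operons):
    """Build a label of the form species-name*genome-size*rRNA-operon-count."""
    # Spaces are not allowed in unquoted Newick labels; underscores are the usual substitute.
    return f"{species.replace(' ', '_')}*{size_mb:.2f}*{operons}"


def parse_label(label):
    """Split a label built by newick_label back into its three fields."""
    species, size, operons = label.rsplit("*", 2)
    return species, float(size), int(operons)


def operon_color_family(operons):
    """Hue family for an operon count, using the ranges given in the text."""
    if operons <= 6:
        return "yellow-orange-red"  # 0-6 operons
    if operons <= 10:
        return "blue"               # 7-10 operons
    return "green"                  # 11 or more operons


def genome_size_bin(size_mb):
    """Index (0-11) of the genome-size color range.

    Bin 0 covers up to 1 MB, bins 1-10 cover successive 0.5 MB steps up to 6 MB,
    and bin 11 covers genomes larger than 6 MB. Treating each upper bound as
    inclusive is an assumption; the text does not say.
    """
    if size_mb <= 1.0:
        return 0
    if size_mb > 6.0:
        return 11
    return math.ceil((size_mb - 1.0) / 0.5)


# Hypothetical genome with two chromosomes; all 16S genes are on the first one.
size, ops = genome_summary([Chromosome(3.0, 8), Chromosome(1.0, 0)])
print(newick_label("Hypothetical species", size, ops))  # Hypothetical_species*4.00*8
print(operon_color_family(ops), genome_size_bin(size))  # blue 6
```

Using `rsplit("*", 2)` keeps parsing correct even if a species name happens to contain an asterisk. Only the last two fields are treated as numbers.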
Bacterial rRNA operon copy number was mapped onto a phylogenetic tree by coloring the organism name on each branch according to its number of operons (Figure 1 and Additional file 1). Genome size was mapped separately in a similar manner (Figure 2 and Additional file 2). These maps make it easy to see the extent to which these properties are conserved over phylogenetic distance. In both cases, the values are conserved within species and frequently within genera as well. In the case of operon number, similar values are frequently found in neighboring groups too. Overall, rRNA operon number typically exceeds six in only two regions of the tree: the γ-Proteobacteria and the Firmicutes (e.g., Bacillus, Staphylococcus and Streptococcus). Thus, if one knows the approximate phylogenetic position of an organism, one can make a reasonable prediction of how many rRNA operons it will have. As previously noted, genome size and operon number are largely uncorrelated, with one exception: organisms with genome sizes below 1.5 MB almost never have more than one rRNA operon.
These observations are illustrated in Figure 3, which is excerpted from Figure 1 and shows a portion of the γ-Proteobacteria. For a large number of enterics (Escherichia, Salmonella, Yersinia, etc.), the operon number is typically seven, with only occasional strains having six or eight operons. Related genera such as Mannheimia and Haemophilus typically have five or six operons. However, Candidatus Blochmannia and Buchnera strains have only one operon. The difference here is genome size: these organisms all have genomes smaller than 1 MB. The predictions are, of course, not perfect, and occasional exceptions will be seen. For example, in Figure 1 one Actinobacillus strain has only three operons, while all of its close neighbors have six.
The fact that members of the same species generally have essentially the same number of rRNA operons has been pointed out previously. However, without the type of mapping shown here, the phylogenetic extent to which this holds is not readily apparent. Initial mapping efforts were not fully informative in this regard, because the requisite information was available for only a modest number of species at the time. Prior work has shown that rRNA copy number affects organism life history [7, 10]. This suggests that gaining or losing rRNA operons could be a means of adapting to different environments. One might therefore expect individual organisms within a population to carry different numbers of rRNA operons. Although rRNA operon copy number has typically not been examined in multiple individuals within a population, the high conservation of copy number among members of the same species from different sources argues against this.
The maps provided here will be especially useful to those seeking to characterize microbial ecosystems quantitatively using 16S rRNA sequence data. The number of times an organism is encountered must be adjusted for the size of its genome and, especially, for the number of copies of the 16S rRNA gene it carries. Once 16S rRNA sequence data are available, the approximate phylogenetic position of each organism can be estimated. The maps can then be examined to obtain initial estimates of rRNA operon number and genome size from the neighboring phylogenetic groups. With the relevant phylogenetic groups identified, one can then use the rrnDB database to obtain the values for all organisms belonging to those groups.
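The workflow above can be approximated programmatically. The following is a minimal sketch that uses a swapped-in technique: instead of visually inspecting the colored tree, it looks up reference organisms with known operon counts that share the novel organism's taxonomy. It takes the median count at the most specific rank for which any reference data exist. The rank list, function name and all reference records are hypothetical. In practice the reference counts would come from a source such as rrnDB.

```python
from statistics import median

# Ranks from most general to most specific (an assumption for this sketch).
RANKS = ("phylum", "class", "order", "family", "genus")


def estimate_operons(query_lineage, references):
    """Estimate rRNA operon number from relatives sharing the most specific possible rank.

    query_lineage: dict mapping rank -> taxon name for the novel organism (ranks may be missing)
    references:    list of (lineage_dict, operon_count) for organisms with known counts
    Returns (rank_used, median_count, n_relatives), or None if no reference shares any rank.
    """
    # Walk from the most specific rank toward the most general one.
    for rank in reversed(RANKS):
        name = query_lineage.get(rank)
        if name is None:
            continue
        counts = [n for lineage, n in references if lineage.get(rank) == name]
        if counts:
            return rank, median(counts), len(counts)
    return None


# Hypothetical reference records (placeholder taxon names, illustrative counts).
refs = [
    ({"phylum": "P1", "family": "F1", "genus": "G1"}, 7),
    ({"phylum": "P1", "family": "F1", "genus": "G1"}, 7),
    ({"phylum": "P1", "family": "F1", "genus": "G2"}, 6),
    ({"phylum": "P1", "family": "F2", "genus": "G3"}, 1),
]
print(estimate_operons({"phylum": "P1", "family": "F1", "genus": "G9"}, refs))
# ('family', 7, 3)
```

As the Candidatus Blochmannia and Buchnera example shows, close relatives can differ sharply, especially when genome size differs. An estimate like this is therefore only a starting point to be checked against rrnDB entries for the relevant group.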
Rainey FA, Ward-Rainey NL, Janssen PH, Hippe H: Clostridium paradoxum DSM 7308(T) contains multiple 16S rRNA genes with heterogeneous intervening sequences. Microbiology. 1996, 142: 2087-2095. 10.1099/13500872-142-8-2087.
Mylvaganam S, Dennis PP: Sequence heterogeneity between the two genes encoding 16S rRNA from the halophilic archaebacterium Haloarcula marismortui. Genetics. 1992, 130: 399-410.
Kim HL, Shin E, Kim HM, Go H, Roh J, Bae J, Lee K: Heterogeneous rRNA molecules encoded by Streptomyces coelicolor M145 genome are all expressed and assembled into ribosomes. J Microbiol Biotechnol. 2007, 17: 1708-1711.
Kim HL, Shin EK, Kim HM, Ryou SM, Kim S, Cha CJ, Bae J, Lee K: Heterogeneous rRNAs are differentially expressed during the morphological development of Streptomyces coelicolor. FEMS Microbiol Lett. 2007, 275: 146-152. 10.1111/j.1574-6968.2007.00872.x.
López-López A, Benlloch S, Bonfá M, Rodríguez-Valera F, Mira A: Intragenomic 16S rRNA divergence in Haloarcula marismortui is an adaptation to different temperatures. J Mol Evol. 2007, 65: 687-696. 10.1007/s00239-007-9047-3.
Acinas SG, Marcelino LA, Klepac-Ceraj V, Polz MF: Divergence and redundancy of 16S rRNA sequences in genomes with multiple rrn operons. J Bacteriol. 2004, 186: 2629-2635. 10.1128/JB.186.9.2629-2635.2004.
Klappenbach JA, Saxman PR, Cole JR, Schmidt TM: rrndb: the ribosomal RNA operon copy number database. Nucleic Acids Res. 2001, 29: 181-184. 10.1093/nar/29.1.181.
Lee ZM, Bussema C, Schmidt TM: rrnDB: documenting the number of rRNA and tRNA genes in bacteria and archaea. Nucleic Acids Res. 2009, 37: D489-493. 10.1093/nar/gkn689.
Dethlefsen L, Schmidt TM: The performance of the translational apparatus varies with the ecological strategies of bacteria. J Bacteriol. 2007, 189: 3237-3245. 10.1128/JB.01686-06.
Stevenson BS, Schmidt TM: Life history implications of ribosomal RNA gene copy number in Escherichia coli. Appl Environ Microbiol. 2004, 70: 6670-6677. 10.1128/AEM.70.11.6670-6677.2004.
Klappenbach J, Dunbar JM, Schmidt TM: rRNA gene copy number predicts ecological strategies in bacteria. Appl Environ Microbiol. 2000, 66: 1328-1333. 10.1128/AEM.66.4.1328-1333.2000.
Tourova TP: Copy number of ribosomal operons in prokaryotes and its effect on phylogenetic analyses. Mikrobiologiia. 2003, 72: 437-452.
Einen J, Thorseth IH, Ovreås L: Enumeration of Archaea and Bacteria in seafloor basalt using real-time quantitative PCR and fluorescence microscopy. FEMS Microbiol Lett. 2008, 282: 182-187. 10.1111/j.1574-6968.2008.01119.x.
Siefert JL, Fox GE: Phylogenetic mapping of bacterial morphology. Microbiology. 1998, 144: 2803-2808. 10.1099/00221287-144-10-2803.
Wu M, Eisen JA: A simple, fast, and accurate method of phylogenomic inference. Genome Biol. 2008, 9: R151. 10.1186/gb-2008-9-10-r151.
Zmasek CM, Eddy SR: ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics. 2001, 17: 383-384. 10.1093/bioinformatics/17.4.383.
This research was supported in part by grants to GEF from the Robert A. Welch Foundation (E-1451), the Texas Advanced Research Program, the NASA Exobiology program (NNG05GN75G), and the Institute of Space Systems Operations.
GEF conceived of the study and wrote the paper. MW constructed the tree. ID and GEF tabulated the genome sizes and operon copy number data. RR drew the trees, devised and implemented the coloring schemes.