Environmental rRNA inventories miss over half of protistan diversity
© Jeon et al; licensee BioMed Central Ltd. 2008
Received: 03 February 2008
Accepted: 16 December 2008
Published: 16 December 2008
The main tool to discover novel microbial eukaryotes is the rRNA approach. This approach has important biases, including PCR discrimination against certain rRNA gene species, which makes molecular inventories skewed relative to the source communities. The degree of this bias has not been quantified, and it remains unclear whether species missed from clone libraries could be recovered by increasing sequencing efforts, or whether they cannot be detected in principle. Here we attempt to discriminate between these possibilities by statistically analysing four protistan inventories obtained using different general eukaryotic PCR primers.
We show that each PCR primer set-specific clone library is not a sample from the community diversity but rather from a fraction of this diversity. Therefore, even sequencing such clone libraries to saturation would only recover that fraction, which, according to the parametric models, varies from 17 ± 4% to 49 ± 10%, depending on the set of primers. The pooled data is thus qualitatively richer than the individual libraries, even when normalized to the same sequencing effort.
The use of a single pair of primers leads to significant underestimation of the true community richness at all levels of taxonomic hierarchy. The majority of available protistan rRNA gene surveys likely sampled less than half of the target diversity, and might have completely missed the rest. The use of multiple PCR primers reduces this bias but does not necessarily eliminate it.
Over the past several years there has been a surge of studies applying the rRNA approach to discover and inventory microbial eukaryotes in many environments [cf. 2–4]. These studies have documented an unprecedented diversity of novel protists at all levels of taxonomic hierarchy, and made an important contribution to the study of microeukaryotic richness, biogeography, and evolution. The cultivation-independent approach that employs cloning and sequencing of 18S rRNA gene fragments that are PCR-amplified from environmental genomic DNA will most likely continue to play a unique role in microbial discovery, especially since metagenomics approaches, so successful in bacterial and archaeal research [5, 6], are less practical for microbial eukaryotes owing to their large genome size. It is therefore important to know whether such an approach misses some eukaryotes, and if so, how many, and how to minimize the bias.
It is well known that PCR primers discriminate for and against certain sequences, and that the distribution of rRNA gene amplification products is markedly different from that in the original DNA extract (and target community) [7–15]. It is also known that rRNA gene libraries of typical size (dozens to hundreds of clones) overlap little in their species lists, and that the multiple PCR primer approach appears to detect greater protistan diversity than the use of a single primer set [16, 17]. What is not known is whether different clone libraries made from a single DNA source recover species from the same diversity pool, or from a smaller, PCR primers-specific pool of species uniquely amplifiable with these primers. In the former scenario, sequencing such clone libraries to saturation would result in the same species lists (albeit with different species frequency distributions), which would be a faithful representation of the target community composition. This possibility is interesting because massive sequencing is quickly becoming practical with the advance of high-throughput pyrosequencing technology [18–20]. An alternative scenario is that the PCR primer biases are so powerful that complete coverage of sample diversity is impossible using any of the developed general eukaryotic primers, a situation that needs to be considered when assessing protistan richness in a sample, community, or biosphere. Here we address this possibility by analyzing 4 reported 18S rRNA gene clone libraries [16, 21] obtained by applying 4 different general eukaryotic primer sets to a single extract of genomic DNA from a stratified water column in the Cariaco Basin off the coast of Venezuela. 
We apply a combined, multifaceted statistical approach that we developed to estimate microbial richness on the basis of a small sample of this richness [22, 23], and we compare the pools of diversity recoverable with each of the four single PCR primer sets to the diversity recoverable from the pooled data.
Results and discussion
Several aspects of the rRNA approach are widely recognized as biased, most notably the PCR primer bias leading to preferential amplification of some, but not other, gene sequences [7–15]. The degree of selectivity is not known, and cannot be readily assessed by simply comparing the composition of clone libraries obtained using different primer sets. This is because protistan communities appear to be very diverse and rich in species, and no study has even come close to sequencing rRNA gene clone libraries to saturation. Each reported inventory is therefore only a subset of the target community's complete species list.
While the low overlap between inventories obtained using different primers suggests a strong primer bias, it is also possible that it is due to significant undersampling. Considering that there may be hundreds of protistan species in a typical aquatic sample, it has been difficult to differentiate between the two explanations. Here we extend a statistical strategy we developed earlier [22, 23] to resolve this problem. The test data are four previously published inventories of protistan species, which were obtained by applying four different eukaryotic primer sets to the same source of community genomic DNA. We statistically estimate the sizes of the four diversity pools as they appear from the four individual inventories, and compare those with the fifth estimate obtained from the pooled data. Our logic is simple: if the only difference among these five inventories (four individual and one pooled) is the species frequency distribution, such that what is detected in one library is also in principle detectable in any other library, then all five estimates of total protistan richness should converge on a single value. If on the other hand these estimates are statistically different, it will mean that PCR primer biases are so substantial that some species' DNA is simply not amplified, and such species will be practically undetectable with some, but not other, primer sets. In this case, the pooled data should produce an estimate of total richness significantly exceeding any estimate obtained from individual clone libraries.
The statistical analysis proceeded in three stages. First, we estimated total OTU richness (observed + unobserved) separately for each primer set, plus pooled, at each % similarity cutoff. For comparison we computed both parametric and nonparametric estimates of total richness; the former are probably more reliable for high-diversity microbial data, while the latter are typically biased downward in this setting, but the final results from both methods were in reasonable agreement. For example, for the pooled data at the 99% similarity level, the parametric estimate of total richness (based on a mixture of two exponential abundance distributions) was 319 (standard error 85), and the corresponding nonparametric estimate (based on Chao's ACE1) was 311 (SE 84).
To overcome some of the statistical uncertainties inherent in analyzing all four primer sets vs. the pooled data, we also aggregated the four separate datasets and compared this aggregate to the pooled data. (Essentially this amounts to averaging the four individual primer sets' results and comparing this average to the pooled results). The results are shown in Figure 4. Again the nonparametric version gives higher results, but the confidence intervals overlap considerably. The most optimistic interpretation of Figure 4 is that, on average, the four primer sets can be expected to recover at most 60% of the total microbial diversity recoverable using the pooled approach.
The above statistical analyses showed that estimates of the protistan richness of the sample based on single PCR primer data sets do not significantly differ, and varied between 43 (SE 17) and 107 (SE 34) species (defined as OTUs grouping sequences that share at least 99% identity) (Figure 2). These analyses also showed that once the four clone libraries' data are pooled, and the new species frequency distribution is modeled, the estimate of the sample's richness grows to 319 (SE 85) (Figure 2). By modeling all of the estimates across all % similarity levels (hence taxonomic rank), we obtained sufficient statistical precision to conclude that each of the PCR-specific primer sets recovers a specific set of species, does not recover other species, and cannot in principle detect all the species in the sample. Each clone library would thus undersample the community even if sequenced to saturation, and the degree of such undersampling varies among the primer sets from 83% (17% recoverable) to 51% (49% recoverable). This means that the pooled data set is richer than individual libraries not because it is a larger collection of sequences but because it is less biased. One practical implication of this is that the microbial richness of a sample is better assessed by two clone libraries (created using different PCR primers) sequenced with X effort each rather than by one clone library sequenced with 2X effort. Increasing the diversity of the primers used on each DNA extract will lead to a more complete inventory of the extract, whereas even an unlimited amount of sequencing applied to a single clone library will only recover a portion of the DNA extract's richness.
An in silico investigation of primer specificity points to the importance of primer mismatch in determining the overall recovery. For example, the 528F primer has at least one mismatch with 39% of 18S rRNA gene sequences in the SILVA 18S rRNA gene sequence database, and at least two mismatches with 27% of such sequences. The figures for the 1391R primer are 30% and 25%, respectively. Assuming that any mismatch prevents efficient primer binding, and that the overall efficiency is the product of the individual primer efficiencies, the 528F/1391R PCR primer set would only amplify 61% * 70% = 43% of SILVA sequences, explaining one half of what our analyses predict this primer set will miss in environmental studies.
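The arithmetic behind this estimate can be sketched in a few lines. The function name below is hypothetical, and the calculation rests on the same two assumptions stated in the text: any mismatch prevents amplification, and mismatches in the forward and reverse primers occur independently.

```python
def fraction_amplified(mismatch_fwd: float, mismatch_rev: float) -> float:
    """Fraction of database sequences matched perfectly by both primers.

    Assumes (as in the text) that any mismatch blocks amplification and
    that forward and reverse mismatches are independent.
    """
    return (1.0 - mismatch_fwd) * (1.0 - mismatch_rev)


# Mismatch frequencies quoted in the text for 528F and 1391R
# (at least one mismatch against SILVA 18S rRNA gene sequences).
amplified = fraction_amplified(0.39, 0.30)
print(f"amplified: {amplified:.0%}, missed: {1 - amplified:.0%}")
```

Under these assumptions the primer pair would miss about 57% of database sequences, which is the quantity compared above with the statistically predicted loss.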
We were surprised to see that the same holds at other levels of OTU grouping. It is presently impossible to determine whether a specific value of 18S rRNA gene sequence similarity could point to the organism's position within the α-taxonomy hierarchy. In other words, there is no clear correspondence between the degree of molecular divergence of the OTU and its taxonomic rank. However, since identity above 98–99% most likely indicates a very close relationship, and at 70% the protistan diversity addressed here collapses into one OTU, values in between must cover life forms differing at the kingdom, class, family, and genus levels. Interestingly, at 80% gene sequence identity, the lowest threshold tested at which the sequences in question fell into more than one OTU, the pooled data still appeared qualitatively richer than individual clone libraries. This means that the single PCR primer approach is not only unlikely to recover all species in a sample, but it misses a substantial number of higher taxa as well.
We note that some PCR primer sets appear to be better than others in recovering target diversity. Both parametric and nonparametric modeling suggest that, out of the four combinations tested, primer set IV (360F/1492R) recovers the largest, and primer set I (528F/1391R) the smallest, part of the sample's richness (Figure 4). The use of multiple sets to amplify the sample's rRNA gene is clearly advantageous because this minimizes the PCR bias. Because the latter may only be reduced but not completely eliminated, the multiple PCR primer approach would still sample sequences from a portion of community diversity. Combined with high throughput sequencing technologies, this approach may detect all recoverable taxa, but still miss species that escape amplification with the given PCR primer sets. We note that the degree of bias we estimated is characteristic of the target (anaerobic) communities. These likely comprise less well-known species, increasing the probability of primer mismatches. It is possible that the biases in question are less pronounced for aerobic (e.g., water column) protists.
We demonstrated that standard rRNA inventories of protistan diversity, which typically employ a single PCR primer set to amplify protistan rRNA genes, are not samples from the entire community, but only from a fraction thereof. This seems to be the case at all taxonomic levels, from species to the highest taxonomic ranks. Increasing sequencing effort alone is unlikely to increase this fraction as it is grounded in PCR primer selectivity. Here we advocate coupling an increase in sequence coverage with the use of multiple PCR primer sets, because the four such sets used here allow access in principle to larger diversity than a single set. The pooled data predicts that there were 319 species in the test sample. This estimate, while significantly larger than those obtained using individual clone libraries, may still be an underestimate because the use of multiple PCR primers is likely to minimize, rather than eliminate, the sampling biases.
The rRNA survey data used here were obtained from [16, 21]. These studies used the same DNA extract from a single 2.3 L sample collected just below the oxic/anoxic interface at a water depth of 340 m in the stratified water column of the Cariaco Basin, off the coast of Venezuela. DNA was extracted as described previously, followed by PCR-aided amplification of ≈ 1,000- to 1,300-bp fragments of the 18S rRNA gene using four different primer sets: (Library I) E528F 5'-CGGTAATTCCAGCTCC-3'-Univ1391RE 5'-GGGCGGTGTGTACAARGRG-3', (Library II) E528F-Univ1492RE 5'-ACCTTGTTACGRCTT-3', (Library III) Euk A 5'-AACCTGGTTGATCCTGCCAGT-3'-Euk B 5'-TGATCCTTCTGCAGGTTCACCTAC-3' followed by a nested reaction with E528F-Univ1517 5'-ACGGCTACCTTGTTACGAACTT-3', and (Library IV) Euk A-Euk B followed by a nested reaction with 360FE 5'-CGGAGARGGMGCMTGAGA-3'-U1492R. The PCR protocol employed HotStart Taq DNA polymerase (QIAGEN, Valencia, CA) in all cases. The PCR products were cloned, separately for each primer set, and commercially sequenced, and the inventories were checked for chimeric sequences using the Check_Chimera command of the Ribosomal Database Project (RDP), as well as neighbor-joining trees with partial sequences (partial treeing analyses). The 18S rRNA gene sequences were grouped into OTUs based on 99, 98, 97, 96, 95, 90, 80, 70, 60, and 50% sequence similarity cutoff values. This was achieved by first making all possible pairwise sequence alignments using ClustalW at default settings and calculating percent sequence similarities, followed by clustering of the sequences into OTUs using the mean unweighted-pair group method using average linkages as implemented in the OC clustering program (http://www.compbio.dundee.ac.uk/Software/OC/oc.html). The OTU grouping was checked manually to verify that all OTUs were assembled at the desired cutoff level.
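To illustrate the OTU grouping step, the following is a minimal sketch of average-linkage clustering applied to a matrix of pairwise percent similarities. It does not reproduce the OC program; the function name and the four-sequence similarity matrix are hypothetical and serve only to show how the number of OTUs depends on the cutoff.

```python
from itertools import combinations


def average_linkage_otus(similarity, cutoff):
    """Group sequences into OTUs by average-linkage clustering.

    similarity: symmetric matrix (list of lists) of percent identities.
    cutoff: clusters are merged while their mean pairwise identity
            is at least this value.
    """
    clusters = [[i] for i in range(len(similarity))]

    def mean_identity(a, b):
        # Mean of all between-cluster pairwise identities.
        return sum(similarity[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Pick the pair of clusters with the highest mean identity.
        x, y = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: mean_identity(clusters[p[0]], clusters[p[1]]),
        )
        if mean_identity(clusters[x], clusters[y]) < cutoff:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # safe because x < y
    return [sorted(c) for c in clusters]


# Hypothetical percent identities among four sequences.
sim = [
    [100.0, 99.5, 97.0, 80.0],
    [99.5, 100.0, 96.5, 81.0],
    [97.0, 96.5, 100.0, 79.0],
    [80.0, 81.0, 79.0, 100.0],
]
for cutoff in (99, 95, 70):
    print(cutoff, average_linkage_otus(sim, cutoff))
```

In this toy example the four sequences form three OTUs at the 99% cutoff, two at 95%, and one at 70%, mirroring the collapse of OTU counts at lower similarity levels.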
The statistical analyses operated on 35 datasets: (four primer sets plus pooled data) × (seven % similarity levels: 80, 90, 95, 96, 97, 98, 99) = 5 × 7 = 35. The analyses proceeded in three stages. First, we estimated the total OTU richness (observed + unobserved) based on each dataset separately. This can be done using two main families of methods, parametric and nonparametric. The former is probably more reliable for highly diverse microbial data, while the latter tends to be biased downward in such cases, but we carried out our complete study using both methods. Here the nonparametric total richness estimates were generally lower than the parametric estimates, and slightly more regular in terms of their variation across primer sets and % similarity levels.
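As an illustration of what a nonparametric estimate of total richness looks like, the sketch below implements the bias-corrected Chao1 estimator. This is a deliberately simpler estimator than the ACE1 and parametric mixture models used in this study, and the clone counts are hypothetical; the point is only to show how the number of unobserved OTUs is inferred from the OTUs seen once or twice.

```python
from collections import Counter


def chao1(otu_counts):
    """Bias-corrected Chao1 estimate of total richness.

    otu_counts: number of clones observed for each OTU.
    """
    counts = [c for c in otu_counts if c > 0]
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]  # singletons and doubletons
    return len(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical clone library: 8 observed OTUs, 4 of them singletons.
library = [1, 1, 1, 1, 2, 2, 3, 5]
print(chao1(library))  # 10.0: 8 observed plus an estimated 2 unobserved
```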
Second, we fit a joint model to all 35 data points simultaneously, using the following logic. We know from empirical experience (Bunge and Woodard, 2008, in preparation) that the total number of OTUs from a given sample (here, the data derived from a particular primer set, or the pooled data) increases exponentially as a function of the % similarity cutoff. Equivalently, the log of the total richness is a linear function of % similarity, i.e.,
log(total richness) ≈ constant + (slope coefficient)*(% similarity cutoff).
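A minimal sketch of such a joint fit, assuming ordinary least squares with one slope shared by all datasets and a separate intercept for each primer set and for the pooled data. The details of the regression actually used are not given here, so the function name and the synthetic richness values below are hypothetical.

```python
import math


def fit_common_slope(groups):
    """Fit log(richness) = intercept_g + slope * cutoff by least squares,
    with one shared slope and a separate intercept per group.

    groups: dict mapping a label to a list of (cutoff, richness) pairs.
    Returns (slope, {label: intercept}).
    """
    sxy = sxx = 0.0
    means = {}
    for label, points in groups.items():
        xs = [x for x, _ in points]
        ys = [math.log(r) for _, r in points]
        xbar, ybar = sum(xs) / len(xs), sum(ys) / len(ys)
        means[label] = (xbar, ybar)
        # Pool the within-group sums of squares and cross-products.
        sxy += sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
        sxx += sum((x - xbar) ** 2 for x in xs)
    slope = sxy / sxx
    intercepts = {g: yb - slope * xb for g, (xb, yb) in means.items()}
    return slope, intercepts


# Synthetic data generated exactly from the model (not the study's data).
cutoffs = [80, 90, 95, 96, 97, 98, 99]
groups = {
    "pooled": [(x, math.exp(-4.0 + 0.1 * x)) for x in cutoffs],
    "primer_set": [(x, math.exp(-5.0 + 0.1 * x)) for x in cutoffs],
}
slope, intercepts = fit_common_slope(groups)
# Difference in elevation, back-transformed to a proportion of pooled richness.
ratio = math.exp(intercepts["primer_set"] - intercepts["pooled"])
print(f"slope={slope:.3f}, recoverable fraction={ratio:.2f}")
```

Because the slope is shared, the difference between two intercepts is constant across similarity levels, so its exponential gives a single proportion for each primer set relative to the pooled data.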
Third and finally, we examined the differences between the four primer sets and the pooled data. These differences can be seen in the vertical displacements of the lines in Figures 2 and 3: they have the same slope but different elevations (intercepts). The regression analysis yielded estimates of the elevations, and hence of the differences between them; we then converted these back from the log-scale to the original scale, and calculated Bonferroni-corrected 95% confidence intervals for them. These confidence intervals represent plausible ranges for the total OTU richness in principle recoverable by each of the four primer sets, expressed as a proportion (percentage) of the total richness recoverable by pooling the data from all four. The results are shown in Figure 4. Note that the percentages are generally higher for the analyses based on nonparametric richness estimates, reflecting the vertical-scale compression of the lines mentioned above. However, the 95% confidence intervals overlap for each primer set, indicating reasonably good agreement of the analyses.
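The back-transformation and Bonferroni correction can be sketched as follows. This is an illustration under a normal approximation, which is an assumption (a regression analysis would typically use t quantiles); the function name, the log-scale difference, and its standard error are hypothetical.

```python
import math
from statistics import NormalDist


def ratio_ci(log_diff, se, n_comparisons=4, alpha=0.05):
    """Convert a log-scale intercept difference into a proportion with a
    Bonferroni-corrected confidence interval (normal approximation).

    Returns (lower, estimate, upper) on the original scale.
    """
    # Split alpha across the comparisons and across the two tails.
    z = NormalDist().inv_cdf(1 - alpha / (2 * n_comparisons))
    return (
        math.exp(log_diff - z * se),
        math.exp(log_diff),
        math.exp(log_diff + z * se),
    )


# Hypothetical primer set estimated to recover 30% of the pooled richness.
lower, estimate, upper = ratio_ci(math.log(0.30), se=0.2)
print(f"{estimate:.0%} (95% CI {lower:.0%} to {upper:.0%})")
```

With four primer sets compared against the pooled data, each individual interval is computed at the 1 - 0.05/4 level so that the four intervals hold jointly at roughly 95%.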
To overcome some of the statistical uncertainties inherent in analyzing all four primer sets vs. the pooled data, we also aggregated the four separate datasets and compared this aggregate to the pooled data. Conceptually this amounts to averaging the four individual lines in Figure 2 or 3, and comparing the resulting line to the line derived from the pooled data. The results are shown in Figure 4. Again the nonparametric version gives higher results, but the confidence intervals overlap considerably.
Finally we considered potential artifacts due to sample size. It is known that all statistical estimators of total population richness are biased for finite samples. However, the degree and direction of this bias are not known in general, and mathematical analysis of this problem has revealed considerable complexity, which is beyond the scope of our discussion here (Bunge and Barger, 2008; Mao and Lindsay, 2007). In order to assess whether, in the present situation, the differences between the estimated richness based on the individual primer samples, and the estimated richness based on the pooled sample, could be attributed to this bias, we carried out a simulation study as follows.
i. We set the model fitted to the pooled data at the 97% similarity level to be the "true" population distribution. (The abundance distribution here was a mixture of two exponentials, θ3 exp(-λ/θ1)/θ1 + (1-θ3)exp(-λ/θ2)/θ2, λ > 0, with θ1 = 0.2405, θ2 = 3.0954, and θ3 = 0.9512.)
ii. We simulated reduced samples from this population, in proportion to the sample sizes for each of the four primer sets in our real data.
iii. For each simulated sample we estimated the total population richness.
iv. We replicated the entire "experiment" 10 times, i.e., we obtained 10 estimates for each of the four reduced sample sizes.
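Steps i to iv can be sketched as below. This is only an illustration of the design: the mixture parameters are those given in step i, but the total number of species, the sampling fractions, and the use of the simple Chao1 estimator in place of the estimators used in the study are all assumptions made to keep the example short.

```python
import math
import random
from collections import Counter

THETA1, THETA2, THETA3 = 0.2405, 3.0954, 0.9512  # values from step i


def draw_abundance(rng):
    """Draw one species' expected clone count from the two-exponential mixture."""
    mean = THETA1 if rng.random() < THETA3 else THETA2
    return rng.expovariate(1.0 / mean)  # expovariate takes a rate, not a mean


def poisson(rng, lam):
    """Poisson draw by Knuth's method; adequate for the small means used here."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def chao1(counts):
    """Bias-corrected Chao1 richness estimate (stand-in estimator)."""
    freq = Counter(c for c in counts if c > 0)
    observed = sum(freq.values())
    return observed + freq[1] * (freq[1] - 1) / (2 * (freq[2] + 1))


def simulate(total_species, fraction, replicates=10, seed=0):
    """Estimate richness from samples reduced to `fraction` of the full effort."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replicates):
        # Thinning a Poisson sample by `fraction` scales each species' mean.
        counts = [
            poisson(rng, fraction * draw_abundance(rng))
            for _ in range(total_species)
        ]
        estimates.append(chao1(counts))
    return estimates


# Hypothetical "true" richness and sampling fractions.
for fraction in (0.25, 0.5, 1.0):
    estimates = simulate(total_species=300, fraction=fraction)
    print(fraction, round(sum(estimates) / len(estimates), 1))
```

Comparing the mean estimate at each reduced sample size with the known "true" richness indicates how much of a shortfall could be attributed to sample size alone.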
The authors thank Dr. V. Ilyin from Northeastern University for help in clustering the sequencing data, and the Academic Research Computing User Group at Northeastern University for permitting us to use the Opportunity computer cluster. This study was supported by NSF Grant MCB-0348341 to SSE and by the Deutsche Forschungsgemeinschaft grant STO414/2-1 to TS. We thank Linda Woodard for supervising the statistical computations. This research was conducted using the resources of the Cornell University Center for Advanced Computing, which receives funding from Cornell University, New York State, the National Science Foundation, and other leading public agencies, foundations, and corporations.
1. Olsen GJ, Lane DJ, Giovannoni SJ, Pace NR, Stahl DA: Microbial ecology and evolution: a ribosomal RNA approach. Annu Rev Microbiol. 1986, 40: 337-365. 10.1146/annurev.mi.40.100186.002005.
2. Diez B, Pedros-Alio C, Massana R: Study of genetic diversity of eukaryotic picoplankton in different oceanic regions by small-subunit rRNA gene cloning and sequencing. Appl Environ Microbiol. 2001, 67 (7): 2932-2941. 10.1128/AEM.67.7.2932-2941.2001.
3. Lopez-Garcia P, Rodriguez-Valera F, Pedros-Alio C, Moreira D: Unexpected diversity of small eukaryotes in deep-sea Antarctic plankton. Nature. 2001, 409 (6820): 603-607. 10.1038/35054537.
4. Moon-van der Staay SY, De Wachter R, Vaulot D: Oceanic 18S rDNA sequences from picoplankton reveal unsuspected eukaryotic diversity. Nature. 2001, 409 (6820): 607-610. 10.1038/35054541.
5. Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, Yooseph S, Wu D, Eisen JA, Hoffman JM, Remington K, et al.: The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific. PLoS Biol. 2007, 5 (3): e77. 10.1371/journal.pbio.0050077.
6. Venter JC, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, Wu D, Paulsen I, Nelson KE, Nelson W, et al.: Environmental genome shotgun sequencing of the Sargasso Sea. Science. 2004, 304 (5667): 66-74. 10.1126/science.1093857.
7. Acinas SG, Klepac-Ceraj V, Hunt DE, Pharino C, Ceraj I, Distel DL, Polz MF: Fine-scale phylogenetic architecture of a complex bacterial community. Nature. 2004, 430 (6999): 551-554. 10.1038/nature02649.
8. Caron DA, Countway PD, Brown MV: The growing contributions of molecular biology and immunology to protistan ecology: molecular signatures as ecological tools. J Eukaryot Microbiol. 2004, 51 (1): 38-48. 10.1111/j.1550-7408.2004.tb00159.x.
9. Countway PD, Gast RJ, Savai P, Caron DA: Protistan diversity estimates based on 18S rDNA from seawater incubations in the Western North Atlantic. J Eukaryot Microbiol. 2005, 52 (2): 95-106. 10.1111/j.1550-7408.2005.05202006.x.
10. Frey JC, Angert ER, Pell AN: Assessment of biases associated with profiling simple, model communities using terminal-restriction fragment length polymorphism-based analyses. J Microbiol Methods. 2006, 67 (1): 9-19. 10.1016/j.mimet.2006.02.011.
11. Kurata S, Kanagawa T, Magariyama Y, Takatsu K, Yamada K, Yokomaku T, Kamagata Y: Reevaluation and reduction of a PCR bias caused by reannealing of templates. Appl Environ Microbiol. 2004, 70 (12): 7545-7549. 10.1128/AEM.70.12.7545-7549.2004.
12. Polz MF, Cavanaugh CM: Bias in template-to-product ratios in multitemplate PCR. Appl Environ Microbiol. 1998, 64 (10): 3724-3730.
13. Sipos R, Szekely AJ, Palatinszky M, Revesz S, Marialigeti K, Nikolausz M: Effect of primer mismatch, annealing temperature and PCR cycle number on 16S rRNA gene-targetting bacterial community analysis. FEMS Microbiol Ecol. 2007, 60 (2): 341-350. 10.1111/j.1574-6941.2007.00283.x.
14. Suzuki MT, Giovannoni SJ: Bias caused by template annealing in the amplification of mixtures of 16S rRNA genes by PCR. Appl Environ Microbiol. 1996, 62 (2): 625-630.
15. Webster G, Newberry CJ, Fry JC, Weightman AJ: Assessment of bacterial community structure in the deep sub-seafloor biosphere by 16S rDNA-based techniques: a cautionary tale. J Microbiol Methods. 2003, 55 (1): 155-164. 10.1016/S0167-7012(03)00140-4.
16. Stoeck T, Hayward B, Taylor GT, Varela R, Epstein SS: A multiple PCR-primer approach to access the microeukaryotic diversity in the anoxic Cariaco Basin (Caribbean Sea). Protist. 2006, 157: 31-43. 10.1016/j.protis.2005.10.004.
17. Dawson SC, Pace NR: Novel kingdom-level eukaryotic diversity in anoxic environments. Proc Natl Acad Sci USA. 2002, 99 (12): 8324-8329. 10.1073/pnas.062169599.
18. Huber JA, Mark Welch DB, Morrison HG, Huse SM, Neal PR, Butterfield DA, Sogin ML: Microbial population structures in the deep marine biosphere. Science. 2007, 318 (5847): 97-100. 10.1126/science.1146689.
19. Roesch LF, Fulthorpe RR, Riva A, Casella G, Hadwin AK, Kent AD, Daroub SH, Camargo FA, Farmerie WG, Triplett EW: Pyrosequencing enumerates and contrasts soil microbial diversity. ISME J. 2007, 1 (4): 283-290.
20. Sogin ML, Morrison HG, Huber JA, Mark Welch D, Huse SM, Neal PR, Arrieta JM, Herndl GJ: Microbial diversity in the deep sea and the underexplored "rare biosphere". Proc Natl Acad Sci USA. 2006, 103 (32): 12115-12120. 10.1073/pnas.0605127103.
21. Stoeck T, Taylor G, Epstein SS: Novel eukaryotes from a permanently anoxic Cariaco Basin (Caribbean Sea). Appl Environ Microbiol. 2003, 69: 5656-5663. 10.1128/AEM.69.9.5656-5663.2003.
22. Hong S-H, Bunge J, Jeon S-O, Epstein S: Predicting microbial species richness. Proc Natl Acad Sci USA. 2006, 103: 117-122. 10.1073/pnas.0507245102.
23. Jeon SO, Bunge J, Stoeck T, Barger KJ, Hong SH, Epstein SS: Synthetic statistical approach reveals a high degree of richness of microbial eukaryotes in an anoxic water column. Appl Environ Microbiol. 2006, 72 (10): 6578-6583. 10.1128/AEM.00787-06.
24. Chao A: Species richness estimation. Encyclopedia of Statistical Sciences. Edited by: Balakrishnan C, Read B, Vidakovic B. 2005, New York: Wiley, 12: 7907-7916. 2nd edition.
25. Elwood HJ, Olsen GJ, Sogin ML: The small-subunit ribosomal RNA gene sequences from the hypotrichous ciliates Oxytricha nova and Stylonychia pustulata. Mol Biol Evol. 1985, 2 (5): 399-410.
26. Lane DJ: 16S/23S rRNA sequencing. Nucleic Acid Techniques in Bacterial Systematics. Edited by: Stackebrandt E, Goodfellow M. 1991, Chichester, U.K: John Wiley and Sons, 115-175.
27. Edgcomb VP, Kysela DT, Teske A, de Vera Gomez A, Sogin ML: Benthic eukaryotic diversity in the Guaymas Basin hydrothermal vent environment. Proc Natl Acad Sci USA. 2002, 99 (11): 7658-7662. 10.1073/pnas.062186399.
28. Medlin LEH, Stickel S, Sogin ML: The characterization of enzymatically amplified eukaryotic 16S-like rRNA-coding regions. Genetics. 1988, 71: 491-499.
29. Shopsin B, Gomez M, Montgomery SO, Smith DH, Waddington M, Dodge DE, Bost DA, Riehman M, Naidich S, Kreiswirth BN: Evaluation of protein A gene polymorphic region DNA sequencing for typing of Staphylococcus aureus strains. J Clin Microbiol. 1999, 37: 3556-3563.
30. Maidak BL, Cole JR, Lilburn TG, Parker CT, Saxman PR, Farris RJ, Garrity GM, Olsen GJ, Schmidt TM, Tiedje JM: The RDP-II (Ribosomal Database Project). Nucleic Acids Res. 2001, 29 (1): 173-174. 10.1093/nar/29.1.173.
31. Hugenholtz P, Goebel BM, Pace NR: Impact of culture-independent studies on the emerging phylogenetic view of bacterial diversity. J Bacteriol. 1998, 180 (18): 4765-4774.
32. Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, positions-specific gap penalties, and weight matrix choice. Nucleic Acids Res. 1994, 22: 4673-4680. 10.1093/nar/22.22.4673.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.