Research article | Open | Published:
Sequence analysis of percent G+C fraction libraries of human faecal bacterial DNA reveals a high number of Actinobacteria
BMC Microbiologyvolume 9, Article number: 68 (2009)
The human gastrointestinal (GI) tract microbiota is characterised by an abundance of uncultured bacteria most often assigned in phyla Firmicutes and Bacteroidetes. Diversity of this microbiota, even though approached with culture independent techniques in several studies, still requires more elucidation. The main purpose of this work was to study whether the genomic percent guanine and cytosine (%G+C) -based profiling and fractioning prior to 16S rRNA gene sequence analysis reveal higher microbiota diversity, especially with high G+C bacteria suggested to be underrepresented in previous studies.
A phylogenetic analysis of the composition of the human GI microbiota of 23 healthy adult subjects was performed from a pooled faecal bacterial DNA sample by combining genomic %G+C -based profiling and fractioning with 16S rRNA gene cloning and sequencing. A total of 3199 partial 16S rRNA genes were sequenced. For comparison, 459 clones were sequenced from a comparable unfractioned sample. The most important finding was that the proportional amount of sequences affiliating with the phylum Actinobacteria was 26.6% in the %G+C fractioned sample but only 3.5% in the unfractioned sample. The orders Coriobacteriales, Bifidobacteriales and Actinomycetales constituted the 65 actinobacterial phylotypes in the fractioned sample, accounting for 50%, 47% and 3% of sequences within the phylum, respectively.
This study shows that the %G+C profiling and fractioning prior to cloning and sequencing can reveal a significantly larger proportion of high G+C content bacteria within the clones recovered, compared with the unfractioned sample in the human GI tract. Especially the order Coriobacteriales within the phylum Actinobacteria was found to be more abundant than previously estimated with conventional sequencing studies.
The gastrointestinal (GI) microbiota is considered to play an important role in human health and disease via essential metabolic, trophic and protective functions in the host . Since the majority of the GI bacteria are uncultivable, molecular biology methods are needed to reveal the detailed composition, diversity and specific role of this complex microbial community . The bacterial groups most often detected in molecular studies of the healthy human GI tract are phyla Firmicutes (especially Clostridium clusters XIVa and IV), Bacteroidetes, Proteobacteria, Actinobacteria, Fusobacteria and Verrucomicrobia . The predominant microbiota in adults is considered rather stable and host-specific [4, 5], but gender, geographic origin, age [6, 7], and host genotype  may influence its composition. Furthermore, alterations within an individual's environmental factors, such as diet  and dietary supplements , intestinal health status  and antibiotics , may also have a substantial effect on the intestinal microbiota. Therefore, as a reference to altered conditions, knowledge of the characteristics of a healthy intestinal microbiota is essential.
The proportional amounts of bacterial phyla detected in studies on the GI tract microbiota depend on both the sample handling and DNA extraction methods applied  and the analysis . Recent metagenomic and pyrosequencing studies on the human intestinal microbiota highlight the potential amount of the yet undiscovered diversity of phylotypes and reshape the porportional abundances of the detected phyla, revealing e.g. a higher abundance of Actinobacteria than previously estimated [14–16]. However, the conventional 16S rRNA gene cloning and sequencing is still a valuable method, since it gives a relatively high taxonomic resolution due to longer read length  and can be targeted to a phylogenetically relevant gene (16S rRNA gene) in comparison with the metagenomic approach. Furthermore, the clone library obtained serves as a valuable reference for possible future use. To enhance the recovery of phylotypes in bacterial community samples, the genomic %G+C content -based profiling and fractioning of DNA can be used [17–20].
In a previous study comparing patients suffering from irritable bowel syndrome (IBS) with healthy volunteers, the faecal DNA of 23 healthy donors was pooled and %G+C profiled and three selected fractions, covering 34% of the fractioned DNA, were cloned and sequenced . With the aim to comprehensively elucidate the bacterial phylotype diversity of the GI microbiota of healthy subjects, the remaining seven %G+C fractions were cloned and sequenced in this study, to represent the scale of bacterial genomic %G+C content ranging from 25% to 75% . For methodological comparison, a clone library from unfractioned pooled faecal DNA samples of the same study subjects was constructed. The results provide more detailed insight into the human GI microbiota especially in the context of the diversity of high %G+C bacteria, i.e. Actinobacteria.
Percent guanine plus cytosine -profiling, cloning and sequencing
To analyse the diversity of the healthy human intestinal microbiota, a %G+C profiled and fractionated (Figure 1) pooled faecal bacterial DNA sample of 23 individuals was cloned, and the partial 16S rRNA genes were sequenced. The previously published 976 sequences from three %G+C fractions (%G+C 25–30, 40–45 and 55–60)  were combined with the 2223 new sequences cloned in this study (%G+C fractions 30–35, 35–40, 45–50, 50–55, 60–65, 65–70 and 70–75) for phylogenetic and statistical analyses of the complete %G+C profile ranging from 25% G+C to 75% G+C (Figure 1, Table 1). Altogether, 3199 sequences encompassing approximately 450 bp from the 5'-end of the 16S rRNA gene, covering two variable areas V1 and V2, were sequenced from all clones from the fractioned sample. For comparison, 459 clones were sequenced from an unfractioned pooled faecal bacterial DNA sample originating from the same individuals.
Determination of operative taxonomic units and library coverage
The quality-checked 3199 sequences from the combined fractioned sample libraries represented 455 operative taxonomic units (OTUs), and the 459 sequences from the unfractioned sample represented 131 OTUs with a 98% similarity criterion (Table 1). All novel OTUs with less than 95% sequence similarity to public sequence database entries were further sequenced to near full-length (Additional file 1). The coverages of the individual clone libraries of the fractioned sample ranged from 77% to 93%, while the coverage for the unfractioned sample was 86%  (Table 1). Compared with other fractions, the fractions %G+C 50–55, 55–60 and 60–65 had low OTU numbers and few singletons, resulting in high Good's coverage values. The combined sequences from the fractioned and unfractioned samples clustered into 481 OTUs (Figure 2).
Phylogenetic analysis and sequence affiliation
When the sequence data from the fractioned clone libraries were combined, the majority of the sequences were assigned to the phyla Firmicutes (68.5%), Actinobacteria (26.6%), Bacteroidetes (3.1%) and Proteobacteria (1.3%) (Figure 2, Table 2, Additional file 1). Clostridium clusters IV and XIV were the most abundant Firmicutes represented by 23.5% and 33.0% of the sequences, respectively. The 65 actinobacterial phylotypes consisted of the orders Bifidobacteriales, Coriobacteriales and Actinomycetales accounting for 12.4%, 13.4% and 0.8% of the sequences, respectively (Figure 3, Table 2).
The distribution of phyla within the individual clone libraries of the fractioned sample revealed that Firmicutes settled mostly in the lower %G+C content portion of the profile, whereas Actinobacteria were found in the fractions with a %G+C content ranging from 50% to 70% (Figure 2, Additional file 1). Prominent phylotypes had a seemingly broader distribution across %G+C fractions. In the fractions having %G+C content above 65%, a bias was observed, i.e. a decrease in high G+C Actinobacteria and an increase in low G+C Firmicutes. The three OTUs with the highest number of sequences fell into the Clostridium clusters XIVa and IV, representing the species Eubacterium rectale (cluster XIVa), Faecalibacterium prausnitzii (cluster IV) and Ruminococcus bromii (cluster IV) with over 98.7% sequence similarity.
Within the phylum Actinobacteria, the most abundant Coriobacteriales phylotypes (6 OTUs) according to the number of representative clones (228 clones) affiliated with Collinsella sp. (C. aerofaciens). The remainder represented Atopobium sp., Denitrobacterium sp., Eggerthella sp., Olsenella sp. and Slackia sp. The order Bifidobacteriales consisted of 398 sequences and 15 phylotypes out of which Bifidobacterium adolescentis was the most abundant. Rest of the bifidobacterial OTUs affiliated with B. catenulatum, B. pseudocatenulatum, B. bifidum, B. dentium and B. longum. The order Actinomycetales comprised of 11 OTUs affiliating with Actinomyces sp., Microbacterium sp., Propionibacterium sp., Rhodococcus sp. and Rothia sp. (Figure 3).
The unfractioned sample essentially resembled the %G+C fractions 40–45 and 45–50 (Figure 2). In comparison to the combined fractioned clone libraries' the amount of Firmicutes (93.2%), especially the percentage of the Clostridium cluster XIV (51.0%), increased while the number of Actinobacteria (3.5%) decreased. The proportion of Bacteroidetes (2.8%) and Proteobacteria (0.2%) were the least affected phyla when fractioned and unfractioned libraries were compared (Figure 2, Table 2, Additional file 1). All 16 actinobacterial sequences of the unfractioned library were included in OTUs of the fractioned libraries and Actinomycetale s phylotypes were absent in this library (Figure 3). The phyla Actinobacteria differed significantly (p = 0.000) between the fractioned and unfractioned libraries in the UniFrac Lineage-specific analysis, though the libraries overall were similar according to the UniFrac Significance test (p = 1.000). Clones from the phylum Firmicutes present in the fractioned library but absent in the unfractioned library affiliated with Enterococcaceae, Lactobacillaceae and Staphylococcacceae. Furthermore, only one Gammaproteobacteria was found in the unfractioned library whereas the fractioned samples contained also the members of Alphaproteobacteria, Betaproteobacteria and Deltaproteobacteria (Table 2).
Comparison of individual libraries
The Shared OTUs and Similarity (SONS) program  was used to compare the unfractioned sample with each of the %G+C fractions and with the combined sequence data from the fractions (Table 3). Using a 98% similarity criterion for the phylotypes, at least 80% of sequences from %G+C fractions 30–35 and 35–40 were shared with the unfractioned sample (Vobs values). However, for two of the high %G+C content fractions with %G+C content from 55 to 65, the Vobs values were considerably lower (32–33%). When comparing the combined sequence data from the fractioned sample with the unfractioned sample, a higher percentage of sequences and OTUs in the unfractioned were shared.
Shannon entropies of clone libraries of the %G+C profiled sample
The %G+C fractions 50–55 and 55–60 had comparatively low Shannon entropies (Additional file 2), indicating lower diversity, and were abundant with bifidobacteria (Figure 2, Additional file 1). The peripheral %G+C fractions and the %G+C fraction 45–50 with sequences affiliating mainly with Clostridium clusters IV and XIV had comparatively higher diversity according to Shannon entropies. The peripheral fraction from the low %G+C end (25–30% G+C content) contained a substantial proportion of Firmicutes that do not belong to the Clostridum clusters IV and XIV. It had the highest Shannon entropy (Additional file 2), indicating rich diversity, and did not reach a plateau in the rarefaction curves (data not shown), which means that more OTUs would have been likely to appear after further sequencing.
For a comprehensive evaluation of the human intestinal microbiota, 16S rRNA gene clone libraries were constructed from a %G+C fractioned pooled faecal DNA sample of 23 healthy subjects followed by a sequence analysis of 3199 clones. Previously, only selected fractions of such profiles have been sequenced and analysed. For methodological comparison, a 16S rRNA gene library of unfractioned DNA from 22 individuals representing the same subject group was also constructed. The %G+C fractioning prior to cloning and sequencing enhanced the recovery of sequences affiliating with high G+C Gram-positive bacteria, namely the phylum Actinobacteria, proportionally over sevenfold compared with cloning and sequencing of an unfractioned sample.
A high amount of actinobacterial sequences recovered
If the proportional amount of DNA in each fraction is taken into account in estimating the abundance of phyla, 28.5% of the sequences would affiliate with Actinobacteria. Since the %G+C profile fractions represent individual cloning and sequencing experiments, in which an equal amount of clones were sequenced despite the different proportional amounts of DNA within the fractions, quantitative conclusions should be drawn carefully. However, %G+C fractions 50–70 were dominated by Actinobacteria, comprising 41% of the total DNA in the original sample fractioned (Figures 1 and 2, Additional file 1). The %G+C fractions 30–50 yield a similar phylotype distribution as the unfractioned library (Figure 2). These fractions, accounting for 54% of the profiled DNA, are dominated by the Firmicutes (Clostridium clusters XIV and IV) (Figure 1 and 2).
The relatively high proportion of actinobacterial sequences (26.6%) and phylotypes (65) identified in the combined sequence data of the %G+C fractioned sample exceed all previous estimations. In a metagenomic study by Gill and colleagues , 20.5% of 132 16S rRNA sequences from random shotgun assemblies affiliated with 10 phylotypes of Actinobacteria whereas no Bacteroidetes was detected. In accordance with our results, also a pyrosequencing study by Andersson and colleagues , the Actinobacteria (14.6%), dominated by a few phylotypes, outnumbered Bacteroidetes (2.5%). By contrast, in most of the earlier published studies on human faecal samples applying 16S rRNA gene amplification, cloning and sequencing, the relative amount of Actinobacteria has been 0–6% of the detected intestinal microbiota [12, 25–33]. Thus, the proportion of sequences affiliating with Actinobacteria (3.5%) in the unfractioned sample analysed in this study is comparable with previous estimations applying conventional 16S rRNA cloning and sequencing without %G+C fractioning.
Order Coriobacteriales abundant within Actinobacteria
We observed that several clones in the high %G+C fractions (60–70% G+C content) were tricky to sequence due to extremely G+C rich regions. These clones turned out to be members of order Coriobacteriales, which have been rare or absent in earlier 16S rRNA gene -based clone libraries of the intestinal microbiota. Over half of the actinobacterial OTUs in our study belonged to the order Coriobacteriales. Harmsen et al.  earlier suggested that applications based on 16S rRNA gene cloning as well as other methods of molecular biology may overlook the presence of the family Coriobacteriaceae in the human GI tract and they designed a group-specific probe for Atopobium (Ato291), covering most of the Coriobacteriaceae, the Coriobacterium group. Using Ato291, the abundance of detected intestinal cells in fluorescence in situ hybridization (FISH) is up to 6.3%. [6, 7, 35, 36]. Recently, Khachatryan and colleagues  did not detect any Actinobacteria from the 16S rRNA gene clone libraries of healthy subjects but the abundance with FISH using Ato291 was 7%. The authors suggested that constant underestimation of the high G+C Gram-positive bacteria might lead to misunderstanding their role in the healthy and diseased gut.
There are some data suggesting that the members of Coriobacteriaceae may be indicators of a healthy GI microbiota. Subjects with a low risk of colon cancer have been observed to have a higher incidence of Collinsella aerofaciens than subjects with a high risk of colon cancer . Furthermore, when faecal 16S rRNA gene sequences from metagenomic libraries of Crohn's diseased and healthy subjects were compared, the Atopobium group was more prevalent and the groups designated "other Actinobacteria" were exclusively detected in healthy subjects' samples . A lower abundance of a C. aerofaciens-like phylotype within the Atopobium group has been associated with IBS subjects' samples . Diminished amount of Atopobium group bacteria is also associated with patients with Mediterranean fever . On the other hand, increased amount of Actinobacteria have recently been associated with the faecal microbiota of obese subjects . This indicates that more detailed data are required to judge the role of Actinobacteria in health and disease.
When the %G+C gradient is disassembled, the fractions with the highest G+C content are collected last, making them most susceptible to turbulence. This phenomenon together with possible remnants of DNA from previously collected fractions could have caused the bias of a decrease in high G+C Actinobacteria and an increase in low G+C Firmicutes observed in fractions %G+C 65–75. These fractions, however, comprise only 5.5% of the total DNA, making the observed bias less important. Regarding faecal DNA extraction, the method used here was rather rigorous, allowing efficient DNA isolation also from more enduring Gram-positive bacteria. This might lower the relative amount of DNA from more easily lysed Gram-negative bacteria and thus explain the comparatively low amount of Bacteroides in both of the samples. Moreover, the relative share of Bacteroidetes phyla may be affected by the delay and temperature of freezing. In a real-time PCR study, a decrease of 50% in the Bacteroides group was observed in faecal sample aliquots frozen in -70°C within 4 h compared to samples that were immediately snap-frozen in liquid nitrogen (Salonen et al., personal communication). In our study, the samples were transported within 4 h of the defecation and stored at -70°C.
Abundance of Actinobacteria in the faeces of Scandinavian (Finnish and Swedish) subjects has been discovered independent of the methodology; the techniques used include %G+C profiling and 16S rDNA gene cloning (this study), FISH coupled with flow cytometry  and pyrosequencing . These findings may suggest existence of demographic similarities among Scandinavians, which could be caused by environmental or genetic factors and that are not obscured by methodological bias of DNA extraction, primers and PCR conditions used.
The results further confirm that %G+C fractioning is an efficient method prior to PCR amplification, cloning and sequencing to obtain a more detailed understanding of the diversity of complex microbial communities, especially within the high genomic %G+C content region. This is proven by the proportionally greater amount of OTUs and sequences affiliating with the high G+C Gram-positive phylum Actinobacteria in the 16S rRNA gene clone libraries originating from a %G+C-profiled and -fractioned faecal microbial genomic DNA sample compared with a sample cloned and sequenced without prior %G+C profiling. The clone content obtained from the unfractioned library is in accordance with many previous clone library analyses and thus suggests that the potential underestimation of high G+C gram positive bacteria, have hidden the importance of these bacteria in a healthy gut. The phyla Actinobacteria were the second most abundant phyla detected in the %G+C fractioned sample consisting mainly of sequences affiliating with mainly Coriobacteriaceae.
The faecal samples were collected from 23 healthy donors (females n = 16, males n = 7), with an average age of 45 (range 26–64) years, who served as controls for IBS studies [21, 38–40]. Exclusion criteria for study subjects were pregnancy, lactation, organic GI disease, severe systematic disease, major or complicated abdominal surgery, severe endometriosis, dementia, regular GI symptoms, antimicrobial therapy during the last two months, lactose intolerance and celiac disease. All participants gave their written informed consent and were permitted to withdraw from the study at any time.
Faecal DNA samples
Faecal samples were immediately stored in anaerobic conditions after defecation, aliquoted after homogenization and stored within 4 h of delivery at -70°C. The bacterial genomic DNA from 1 g of faecal material was isolated according to the protocol of Apajalahti and colleagues . Briefly, undigested particles were removed from the faecal material by three rounds of low-speed centrifugation and bacterial cells were collected with high-speed centrifugation. The samples were then subjected to five freeze-thaw cycles, and the bacterial cells were lysed by enzymatic (lysozyme and proteinase K) and mechanical (vortexing with glass beads) means. Following cell lysis, the DNA was extracted and precipitated.
Percent guanine plus cytosine fractioning and purification of fractions
The faecal microbial DNA of 23 healthy individuals was pooled, and genomic DNA fractions were separated with 5% intervals on the basis of %G+C content using caesium chloride-bisbenzimidazole gradient analysis described in previous studies [21, 41]. The gradient was disassembled into %G+C fractions with 5 G+C% intervals using perfluorocarbon (fluorinert) as a piston. In the procedure, the highest %G+C fraction is collected last, exposing it to the most turbulence. The DNA quantification during the dismantlement was based on A280, as described by Apajalahtiand colleagues , to avoid background. The DNA fractions were desalted with PD-10 columns according to the manufacturer's instructions (Amersham Biosciences, Uppsala, Sweden). For the unfractioned DNA sample, faecal microbial DNA of the same healthy individuals was pooled (n = 22; there was an insufficient amount of faecal DNA left for one of the individuals).
Amplification of the 16S rRNA genes, cloning and sequencing
The 16S rRNA gene from each of the seven DNA fractions was amplified, cloned and sequenced, as in the study by Kassinen and colleagues . To maximize the recovery of different phylotypes, two universal primer pairs were used independently for all samples. The first primer pair corresponded to Escherichia coli 16S rRNA gene positions 8–27 and 1492–1512, with sequences 5'-AGAGTTTGATCCTGGCTCAG-3'  and 5'-ACGGCTACCTTGTTACGACTT-3' , respectively. The second primer pair corresponded to E. coli 16S rRNA gene positions 7–27 and 1522–1541, with sequences 5'-GAGAGTTTGATYCTGGCTCAG-3' and 5'-AAGGAGGTGATCCARCCGCA-3' , respectively. The 50-μl PCR reactions contained 1 × DyNAzyme™ Buffer (Finnzymes, Espoo, Finland), 0.2 mM of each dNTP, 50 pmol of primers, 1 U of DyNAzyme™ II DNA Polymerase (Finnzymes, Espoo, Finland), 0.125 U of Pfu DNA polymerase (Fermentas, Vilnius, Lithuania) and 10 μl of desalted fractioned DNA template (containing less than 2 ng/μl of DNA) or pooled extracted DNA from the faecal samples. The thermocycling conditions consisted of 3 min at 95°C, followed by a variable number of cycles of 30 s at 95°C, 30 s at 50°C, 2 min at 72°C and a final extension of 10 min at 72°C. The number of PCR cycles used for each fraction was optimized to the minimum amount of cycles which resulted in a visually detectable band of the PCR product on ethidium bromide stained agarose gel. A protocol of 27, 20, 25 and 30 cycles was applied to %G+C fraction 25–30, 30–60, 60–65 and 65–75, respectively. The 16S rRNA gene from the unfractioned pooled faecal DNA sample was amplified using 20 PCR cycles. The amplifications were performed using 15 reactions, and the products were pooled, concentrated using ethanol precipitation, and eluted with 50 μl of deionized MilliQ water (Millipore, Billerica, MA, USA).
The precipitated PCR products were purified with the QIAquick PCR Purification Kit (Qiagen, Hilden, Germany), or using the QIAquick Gel Extraction Kit (Qiagen, Hilden, Germany) after excising from 1.25% SeaPlaque agar (Cambrex, East Rutherford, NJ, USA), and eluted in 35 μl of elution buffer. The concentration of the purified amplicons was estimated with serially diluted samples on 0.8% agarose gels with ethidium bromide staining. To enhance the cloning efficiency, adenine overhangs were added to the amplicons as follows: The two purified inserts were mixed in a 1:1 molecular ratio (the reaction mixture thus contained 10–30 ng/μl DNA) and incubated in a volume of 20 μl with 1 × DyNAzyme™ Buffer (Finnzymes, Espoo, Finland), 0.2 mM dNTPs and 0.4 U of DyNAzyme™ II DNA Polymerase (Finnzymes, Espoo, Finland) for 40 min at 72°C. The cloning was performed with the QIAGEN® PCR Cloning plus Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. For the ligation reaction, 2 μl of the reaction mixture used for adding adenine overhangs to the amplicons was used as an insert. The ligation reaction was incubated overnight at 4°C. The plasmids were isolated and purified from the E. coli culture using MultiScreenHTS (Millipore, Billerica, MA, USA), and aliquots were stored in -80°C.
The cloned inserts were amplified from the pDrive plasmids using M13 forward 5'-GTAAAACGACGGCCAGT-3' and M13 reverse primers 5'-AACAGCTATGACCATG-3', visualized on a 1% agarose gel, stained with ethidium bromide and purified using a MultiScreen PCR384 Filter Plate (Millipore, Billerica, MA, USA). Sequencing of the 5'-end of 16S rDNA clones was performed with primer pD' 5'-GTATTACCGCGGCTGCTG-3' corresponding to the E. coli 16S rRNA gene position 536-518 . Near full-length sequencing was performed on one representative of each OTU showing less than 95% similarity to any EMBL nucleotide sequence database entry. For this purpose, primers pF' 5'-ACGAGCTGACGACAGCCATG-3'  and pE 5'-AAACTCAAAGGAATTGACGG-3' , corresponding to E. coli 16S rRNA gene positions 1073-1053 and 908–928, respectively, were used. Sequencing of the products was performed with the BigDye terminator cycle sequencing kit (Applied Biosystems, Foster City, CA, USA). For templates that failed to be sequenced due to high G+C content, 1% (v/v) of dimethyl sulfoxide was added to the reaction mixture. The sequencing products were cleaned with Montage SEQ96 plates (Millipore, Billerica, MA, USA) and run with an ABI 3700 Capillary DNA Sequencer (Applied Biosystems, Foster City, CA, USA).
Sequence analysis and alignment
Sequences were checked manually utilizing the Staden Package pregap4 version 1.5 and gap v4.10 assembly programs , and primer sequences were removed. Sequences that occurred in more than one clone library were considered non-chimeric. Revealing the potential chimeras was also performed by manually browsing the ClustalW 1.83 sequence alignment  with Bio Edit version 184.108.40.206  and for the near full-length sequences using Ribosomal Database Project II Chimera Check . Sequences from %G+C fractions 25–30, 40–45 and 55–60 with accession numbers AM275396-AM276371  were added prior to further analyses. Sequences of all fractions and the unfractioned sample were aligned separately with ClustalW 1.83  using the FAST DNA pair-wise alignment algorithm option (Gap penalty 3, Word size 4, Number of top diagonals 1 and Window size 1) and cut from E. coli position 430 (totally conserved GTAAA) with BioEdit version 220.127.116.11 . The lengths of the alignments of the fractioned sample and the unfractioned sample were 478 and 457 base pairs, respectively. The 16S rRNA variable regions V1 and V2 were included in the alignments. The variable regions V1 and V2 have been demonstrated to be sufficient to reflect the diversity of a human GI clone library . The alignments were visually inspected, but they were not edited manually to avoid subjectivity and to maintain reproducibility of the alignments. From the cut alignments, distance matrices were created with Phylip 3.66 Dnadist  using Jukes-Cantor correction.
Determination of OTUs and library coverage
The sequences were assigned into OTUs according to the distance matrices using DOTUR , applying the furthest neighbour rule option in which all sequences within an OTU fulfil the similarity criterion with all the other sequences within the OTU. The 98% cut-off for sequence similarity was used to delimit an OTU. The coverage of the clone libraries was calculated with the formula of Good  to evaluate the adequacy of amount of sequencing. The Fasta EMBL Environmental and EMBL Prokaryote database searches  and Ribosomal Database Project II (RDP II) Classifier Tool  were used to affiliate phylotypes.
For the phylogenetic analysis, all sequences from the %G+C fractioned sample and the unfractioned sample were aligned and designated into OTUs with a 98% cut-off as described above. A representative sequence of each OTU and unaligned reference sequences representing different clostridial groups (Additional file 3) were aligned with ClustalW 1.83 using the SLOW DNA alignment algorithm option (Gap penalty 3, Word size 1, Number of top diagonals 5 and Window size 5) and cut from the E. coli position 430 (totally conserved GTAAA) with BioEdit version 18.104.22.168. For a profile alignment, 16S rRNA reference sequences, aligned according to their secondary structure, were selected from the European ribosomal RNA database  (Additional file 4) so that they would represent the overall diversity of the faecal microbiota, including the most common clostridial 16S rRNA groups expected, and sequences closely related to the OTUs composed of over 20 sequences. The sequences in this study were profile-aligned against the European ribosomal RNA database secondary structure-aligned sequences using ClustalW 1.83 profile alignment mode and the SLOW DNA alignment algorithm option (Gap penalty 3, Word size 1, Number of top diagonals 5 and Window size 5). The reference sequences were then deleted from the alignment with BioEdit version 22.214.171.124 , and the alignment was cut at the E. coli position 430 (totally conserved GTAAA). A phylogenetic tree with a representative sequence from each OTU was generated with a neighbour-joining algorithm from a Jukes-Cantor-corrected distance matrix using Phylip 3.66 dnadist and neighbour . The tree was visualized with MEGA4 .
A phylogenetic tree was constructed for the OTU representatives of the phylum Actinobacteria. For Bifidobacteriales and Actinomycetales, sequences with nearest FASTA EMBL Prokaryote search (all >98% similarity), and for Coriobacteriales sequences with nearest FASTA EMBL prokaryote and environmental database searches (>85% and >91%, respectively), were selected and aligned together with OTU representative sequences. Sequences from the European ribosomal RNA database representing Actinobacteria and Clostridium leptum (AF262239) were used as a reference in the profile alignment (Additional file 4). The alignment, distance matrix, and visualizing was done as described above. A bootstrap analysis of hundred replicates was performed using seqboot and consense programs of Phylip 3.66 .
To describe whether the phylogenies of the combined sequence data from the fractioned libraries and the unfractioned library were significantly different, the UniFrac Significance analysis was applied for each pair of environments using abundance weights . The UniFrac Lineage-specific analysis was used to break the tree up into the lineages at a specified distance from the root, and to test whether any particular group differed between the sample libraries . The phylogenetic tree for the analyses was constructed from OTU representative sequences determined separately for the combined fractioned libraries and for the unfractioned library as described above, with the exception that in the profile alignment a root sequence (Methanobrevibacter smithii AF054208) was added and left to the alignment.
Comparison of individual libraries using SONS
The microbial community composition differences between libraries of individual %G+C profile fractions and the unfractioned sample were analysed using SONS , which calculates the fraction of sequences observed in shared OTUs in each library (Uobs and Vobs) and the observed fraction of shared OTUs in each library (Aotu_shared and Botu_shared). For the SONS analyses, an alignment with all of the sequences from the clone libraries of the fractioned sample and the unfractioned sample was created, and a distance matrix was calculated as described above in the Sequence analysis and alignment section.
Shannon entropies of clone libraries of the %G+C profiled sample
To compare the diversity of the clone libraries derived from the fractioned sample, OTUs were also determined using a Bayesian clustering method , followed by the estimation of Shannon entropies with a standard Bayesian multinomial-Dirichlet model. In the estimation, 100 000 Monte Carlo samples were used for each library under a uniform Dirichlet prior . The Shannon entropy value correlates with the amount and evenness of clusters or phylotypes in a community sample, but disregards the disparity between them . The Bayesian clustering method groups the sequences into clusters more distinct from each other than would, for example, the ClustalW alignment-based Jukes-Cantor-corrected distance matrices, demanding more disparity among the sequences present in a sample for them to form separate clusters.
Nucleotide sequence accession numbers
The 16S rRNA gene sequences reported in this study have been deposited in the EMBL Nucleotide Sequence Database under accession numbers AM404446–AM406668 and AM888398–AM888856.
Guarner F: Enteric flora in health and disease. Digestion. 2006, 73 (Suppl 1): 5-12.
Rajilic-Stojanovic M, Smidt H, de Vos WM: Diversity of the human gastrointestinal tract microbiota revisited. Environ Microbiol. 2007, 9 (9): 2125-2136.
Zoetendal EG, Rajilic-Stojanovic M, de Vos WM: High-throughput diversity and functionality analysis of the gastrointestinal tract microbiota. Gut. 2008, 57 (11): 1605-1615.
Zoetendal EG, Akkermans AD, De Vos WM: Temperature gradient gel electrophoresis analysis of 16S rRNA from human fecal samples reveals stable and host-specific communities of active bacteria. Appl Environ Microbiol. 1998, 64 (10): 3854-3859.
Vanhoutte T, Huys G, de Brandt E, Swings J: Temporal stability analysis of the microbiota in human feces by denaturing gradient gel electrophoresis using universal and group-specific 16S rRNA gene primers. FEMS Microbiol Ecol. 2004, 48 (2): 437-446.
Lay C, Rigottier-Gois L, Holmstrøm K, Rajilić M, Vaughan EE, de Vos WM, Collins MD, Thiel R, Namsolleck P, Blaut M, Doré J: Colonic microbiota signatures across five northern European countries. Appl Environ Microbiol. 2005, 71 (7): 4153-4155.
Mueller S, Saunier K, Hanisch C, Norin E, Alm L, Midtvedt T, Cresci A, Silvi S, Orpianesi C, Verdenelli MC, Clavel T, Koebnick C, Zunft HJ, Doré J, Blaut M: Differences in fecal microbiota in different European study populations in relation to age, gender, and country: a cross-sectional study. Appl Environ Microbiol. 2006, 72 (2): 1027-1033.
Khachatryan ZA, Ktsoyan ZA, Manukyan GP, Kelly D, Ghazaryan KA, Aminov RI: Predominant role of host genetics in controlling the composition of gut microbiota. PLoS ONE. 2008, 3 (8): e3064-
Ley RE, Hamady M, Lozupone C, Turnbaugh PJ, Ramey RR, Bircher JS, Schlegel ML, Tucker TA, Schrenzel MD, Knight R, Gordon JI: Evolution of mammals and their gut microbes. Science. 2008, 320 (5883): 1647-1651.
Kajander K, Myllyluoma E, Rajilić-Stojanović M, Kyrönpalo S, Rasmussen M, Järvenpää S, Zoetendal EG, de Vos WM, Vapaatalo H, Korpela R: Clinical trial: multispecies probiotic supplementation alleviates the symptoms of irritable bowel syndrome and stabilizes intestinal microbiota. Aliment Pharmacol Ther. 2008, 27 (1): 48-57.
Manichanh C, Rigottier-Gois L, Bonnaud E, Gloux K, Pelletier E, Frangeul L, Nalin R, Jarrin C, Chardon P, Marteau P, Roca J, Doré J: Reduced diversity of faecal microbiota in Crohn's disease revealed by a metagenomic approach. Gut. 2006, 55 (2): 205-211.
Dethlefsen L, Huse S, Sogin ML, Relman DA: The Pervasive Effects of an Antibiotic on the Human Gut Microbiota, as Revealed by Deep 16S rRNA Sequencing. PLoS Biol. 2008, 6 (11): e280-
Salonen A, Palva A, de Vos WM: Microbial functionality in the human intestinal tract. Front Biosci. 2009, 14: 3074-3084.
Gill SR, Pop M, Deboy RT, Eckburg PB, Turnbaugh PJ, Samuel BS, Gordon JI, Relman DA, Fraser-Liggett CM, Nelson KE: Metagenomic analysis of the human distal gut microbiome. Science. 2006, 312 (5778): 1355-1359.
Kurokawa K, Itoh T, Kuwahara T, Oshima K, Toh H, Toyoda A, Takami H, Morita H, Sharma VK, Srivastava TP, Taylor TD, Noguchi H, Mori H, Ogura Y, Ehrlich DS, Itoh K, Takagi T, Sakaki Y, Hayashi T, Hattori M: Comparative metagenomics revealed commonly enriched gene sets in human gut microbiomes. DNA Res. 2007, 14 (4): 169-181.
Andersson AF, Lindberg M, Jakobsson H, Bäckhed F, Nyrén P, Engstrand L: Comparative analysis of human gut microbiota by barcoded pyrosequencing. PLoS ONE. 2008, 3 (7): e2836-
Holben WE, Jansson JK, Chelm BK, Tiedje JM: DNA Probe Method for the Detection of Specific Microorganisms in the Soil Bacterial Community. Appl Environ Microbiol. 1988, 54 (3): 703-711.
Nüsslein K, Tiedje JM: Characterization of the dominant and rare members of a young Hawaiian soil bacterial community with small-subunit ribosomal DNA amplified from DNA fractionated on the basis of its guanine and cytosine composition. Appl Environ Microbiol. 1998, 64 (4): 1283-1289.
Holben WE, Feris KP, Kettunen A, Apajalahti JH: GC fractionation enhances microbial community diversity assessment and detection of minority populations of bacteria by denaturing gradient gel electrophoresis. Appl Environ Microbiol. 2004, 70 (4): 2263-2270.
Holben WE, Harris D: DNA-based monitoring of total bacterial community structure in environmental samples. Mol Ecol. 1995, 4 (5): 627-631.
Kassinen A, Krogius-Kurikka L, Mäkivuokko H, Rinttilä T, Paulin L, Corander J, Malinen E, Apajalahti J, Palva A: The fecal microbiota of irritable bowel syndrome patients differs significantly from that of healthy subjects. Gastroenterology. 2007, 133 (1): 24-33.
Galtier N, Lobry JR: Relationships between genomic G+C content, RNA secondary structures, and optimal growth temperature in prokaryotes. J Mol Evol. 1997, 44 (6): 632-636.
Good IJ: The population frequencies of species and the estimation of population parameters. Biometrika. 1953, 40: 237-264.
Schloss PD, Handelsman J: Introducing SONS, a tool for operational taxonomic unit-based comparisons of microbial community memberships and structures. Appl Environ Microbiol. 2006, 72 (10): 6773-6779.
Wilson KH, Blitchington RB: Human colonic biota studied by ribosomal DNA sequence analysis. Appl Environ Microbiol. 1996, 62 (7): 2273-2278.
Suau A, Bonnet R, Sutren M, Godon JJ, Gibson GR, Collins MD, Doré J: Direct analysis of genes encoding 16S rRNA from complex communities reveals many novel molecular species within the human gut. Appl Environ Microbiol. 1999, 65 (11): 4799-4807.
Bonnet R, Suau A, Doré J, Gibson GR, Collins MD: Differences in rDNA libraries of faecal bacteria derived from 10- and 25-cycle PCRs. Int J Syst Evol Microbiol. 2002, 52 (Pt 3): 757-763.
Eckburg PB, Bik EM, Bernstein CN, Purdom E, Dethlefsen L, Sargent M, Gill SR, Nelson KE, Relman DA: Diversity of the human intestinal microbial flora. Science. 2005, 308 (5728): 1635-1638.
Li M, Wang B, Zhang M, Rantalainen M, Wang S, Zhou H, Zhang Y, Shen J, Pang X, Zhang M, Wei H, Chen Y, Lu H, Zuo J, Su M, Qiu Y, Jia W, Xiao C, Smith LM, Yang S, Holmes E, Tang H, Zhao G, Nicholson JK, Li L, Zhao L: Symbiotic gut microbes modulate human metabolic phenotypes. Proc Natl Acad Sci USA. 2008, 105 (6): 2117-2122.
Hayashi H, Sakamoto M, Benno Y: Phylogenetic analysis of the human gut microbiota using 16S rDNA clone libraries and strictly anaerobic culture-based methods. Microbiol Immunol. 2002, 46 (8): 535-548.
Delgado S, Suárez A, Mayo B: Identification of Dominant Bacteria in Feces and Colonic Mucosa from Healthy Spanish Adults by Culturing and by 16S rDNA Sequence Analysis. Dig Dis Sci. 2006, 51 (4): 744-751.
Turnbaugh PJ, Hamady M, Yatsunenko T, Cantarel BL, Duncan A, Ley RE, Sogin ML, Jones WJ, Roe BA, Affourtit JP, Egholm M, Henrissat B, Heath AC, Knight R, Gordon JI: A core gut microbiome in obese and lean twins. Nature. 2009, 457 (7228): 480-484.
Ley RE, Turnbaugh PJ, Klein S, Gordon JI: Microbial ecology: human gut microbes associated with obesity. Nature. 2006, 444 (7122): 1022-1023.
Harmsen HJ, Wildeboer-Veloo AC, Grijpstra J, Knol J, Degener JE, Welling GW: Development of 16S rRNA-based probes for the Coriobacterium group and the Atopobium cluster and their application for enumeration of Coriobacteriaceae in human feces from volunteers of different age groups. Appl Environ Microbiol. 2000, 66 (10): 4523-4527.
Franks AH, Harmsen HJ, Raangs GC, Jansen GJ, Schut F, Welling GW: Variations of bacterial populations in human feces measured by fluorescent in situ hybridization with group-specific 16S rRNA-targeted oligonucleotide probes. Appl Environ Microbiol. 1998, 64 (9): 3336-3345.
Chassard C, Scott KP, Marquet P, Martin JC, Del'homme C, Dapoigny M, Flint HJ, Bernalier-Donadille A: Assessment of metabolic diversity within the intestinal microbiota from healthy humans using combined molecular and cultural approaches. FEMS Microbiol Ecol. 2008, 66 (3): 496-504.
Moore WE, Moore LH: Intestinal floras of populations that have a high risk of colon cancer. Appl Environ Microbiol. 1995, 61 (9): 3202-3207.
Malinen E, Rinttilä T, Kajander K, Mättö J, Kassinen A, Krogius L, Saarela M, Korpela R, Palva A: Analysis of the fecal microbiota of irritable bowel syndrome patients and healthy controls with real-time PCR. Am J Gastroenterol. 2005, 100 (2): 373-382.
Mättö J, Maunuksela L, Kajander K, Palva A, Korpela R, Kassinen A, Saarela M: Composition and temporal stability of gastrointestinal microbiota in irritable bowel syndrome–a longitudinal study in IBS and control subjects. FEMS Immunol Med Microbiol. 2005, 43 (2): 213-222.
Maukonen J, Satokari R, Mättö J, Söderlund H, Mattila-Sandholm T, Saarela M: Prevalence and temporal stability of selected clostridial groups in irritable bowel syndrome in relation to predominant faecal bacteria. J Med Microbiol. 2006, 55 (Pt 5): 625-633.
Apajalahti JH, Särkilahti LK, Mäki BR, Heikkinen JP, Nurminen PH, Holben WE: Effective recovery of bacterial DNA and percent-guanine-plus-cytosine-based analysis of community structure in the gastrointestinal tract of broiler chickens. Appl Environ Microbiol. 1998, 64 (10): 4084-4088.
Hicks RE, Amann RI, Stahl DA: Dual staining of natural bacterioplankton with 4',6-diamidino-2-phenylindole and fluorescent oligonucleotide probes targeting kingdom-level 16S rRNA sequences. Appl Environ Microbiol. 1992, 58 (7): 2158-2163.
Kane MD, Poulsen LK, Stahl DA: Monitoring the enrichment and isolation of sulfate-reducing bacteria by using oligonucleotide hybridization probes designed from environmentally derived 16S rRNA sequences. Appl Environ Microbiol. 1993, 59 (3): 682-686.
Wang RF, Kim SJ, Robertson LH, Cerniglia CE: Development of a membrane-array method for the detection of human intestinal bacteria in fecal samples. Mol Cell Probes. 2002, 16 (5): 341-350.
Edwards U, Rogall T, Blöcker H, Emde M, Böttger EC: Isolation and direct complete nucleotide determination of entire genes. Characterization of a gene coding for 16S ribosomal RNA. Nucleic Acids Res. 1989, 17 (19): 7843-7853.
Lane DJ, Pace B, Olsen GJ, Stahl DA, Sogin ML, Pace NR: Rapid determination of 16S ribosomal RNA sequences for phylogenetic analyses. Proc Natl Acad Sci USA. 1985, 82 (20): 6955-6959.
Staden R, Beal KF, Bonfield JK: The Staden package. Methods Mol Biol. 2000, 132: 115-130.
Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22 (22): 4673-4680.
Hall T: BioEdit. Biological sequence alignment editor for Windows. 1998, North Carolina State University, NC, USA, [http://www.mbio.ncsu.edu/BioEdit/bioedit.html]
Cole JR, Chai B, Marsh TL, Farris RJ, Wang Q, Kulam SA, Chandra S, McGarrell DM, Schmidt TM, Garrity GM, Tiedje JM: The Ribosomal Database Project (RDP-II): previewing a new autoaligner that allows regular updates and the new prokaryotic taxonomy. Nucleic Acids Res. 2003, 31 (1): 442-443.
Wang X, Heazlewood SP, Krause DO, Florin TH: Molecular characterization of the microbial species that colonize human ileal and colonic mucosa by using 16S rDNA sequence analysis. J Appl Microbiol. 2003, 95 (3): 508-520.
Felsenstein J: PHYLIP – Phylogeny Inference package (Version 3.2). Cladistics. 1989, 164-166. 17
Schloss PD, Handelsman J: Introducing DOTUR, a computer program for defining operational taxonomic units and estimating species richness. Appl Environ Microbiol. 2005, 71 (3): 1501-1506.
Pearson WR: Rapid and sensitive sequence comparison with FASTP and FASTA. Methods Enzymol. 1990, 183: 63-98.
Cole JR, Chai B, Farris RJ, Wang Q, Kulam SA, McGarrell DM, Garrity GM, Tiedje JM: The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis. Nucleic Acids Res. 2005, D294-6. 33 Database
Wuyts J, Perriere G, Peer Van De Y: The European ribosomal RNA database. Nucleic Acids Res. 2004, D101-3. 32 Database
Kumar S, Tamura K, Nei M: MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Brief Bioinform. 2004, 5 (2): 150-163.
Lozupone C, Hamady M, Knight R: UniFrac–an online tool for comparing microbial community diversity in a phylogenetic context. BMC Bioinformatics. 2006, 7: 371-
Corander J, Tang J: Bayesian analysis of population structure based on linked molecular information. Math Biosci. 2007, 205 (1): 19-31.
Chapman & Hall/CRC, Gelman A, Carlin JB, Stern HS, Rubin DB: Bayesian Data Analysis. 2004, Chapman & Hall/CRC, 2
Krebs C: Ecological Methodology. 1989, New York: Harper&Collins, 1
This study was supported by the Finnish Funding Agency for Technology and Innovation (Grant no. 40160/05), the Academy of Finland (Grant no. 214 157) and the Finnish Graduate School on Applied Bioscience. This work was performed in the Centre of Excellence on Microbial Food Safety Research, Academy of Finland. We are grateful to Sinikka Ahonen, Anu Suoranta and Matias Rantanen for technical assistance and to Professor Willem M. de Vos and Doctors Erja Malinen and Ilkka Palva for providing constructive criticism during the writing of this manuscript. Doctors Jaana Mättö and Maria Saarela are gratefully acknowledged for recruiting of study subjects and management of sample collection. Kyösti Kurikka, MSc, and Sonja Krogius, BA, are thanked for assisting with the drawing of figures.
LK-K and AK conducted the sequence analysis and prepared the manuscript, LP supervised the sequencing library construction procedure, JC determined the Shannon entropies, HM performed %G+C fractioning of the pooled DNA samples, JT acted as bioinformatics specialist and provided scripts needed in data analysis for LK-K and AK, AP designed and supervised the study. All authors have contributed in the manuscript writing process as well as approved the final manuscript.