With the advent of next generation sequencing (NGS), the microbiome of ruminants is being profiled at high resolution and throughput and a complex picture is emerging wherein host genetics and microbiome structure additively contribute to several phenotypes, including methane emissions [13]. In order to investigate the ruminal microbiome composition of two cohorts of Colombian buffalos found to produce high or low levels of methane, we conducted shotgun metagenomics. The bioinformatics pipeline used in this study is described in Fig. 1.
Alignment of individual sequences
Taxonomic classification of sequences was performed using the software Kraken2 [19], and the standard database complemented with all bacterial, fungal, viral and archaeal sequences in the GenBank (including incomplete genomes) plus all sequences deposited in the Genome taxonomy database, GTDB (gtdb.ecogenomic.org). The GTDB hosts 145,512 bacterial accessions and 2392 archaeal accessions [20]. We discarded Kraken2 hits with less than 10% of the k-mers matching the reference sequence. Then, we kept hits with a relative abundance of at least 3 reads (sequences) per sample. A list of hits obtained with Kraken2 pseudoalignments is presented in Supplementary Table S1. A total of 582 taxa were identified in our data. Five taxa corresponded to Archaea (Supplementary Fig. S1), along with 576 taxa in the domain Bacteria, one fungus of unknown taxonomy and two virus taxa of unknown taxonomy.
Principal coordinate analysis ordination of a Bray-Curtis dissimilarity matrix showed that the microbiome of both groups of buffalos is apparently different, although there is considerable variability among animals in each group (Fig. 2a). The PCoA plot depicted in Fig. 2a shows that, along the first component (PC1), which capture 53% of the variance, most blue points (high methane emissions) located on the left part of the plot, while most red points (low methane emissions) clustered on the right part of the plot. Permutational Analysis of Variance (PERMANOVA), however, did not detect statistically significant differences between groups. We also conducted Analysis of Similarity (ANOSIM) and obtained a significant p-value (0.02), which again suggests that the groups under comparison are not statistically different. At the phylum level, the microbiome was dominated by Bacteroidetes and Firmicutes, and in decreasing abundance Actinobacteria and Protobacteria. Euryarchaeota was among the seven most abundant phyla, but with abundance much lower than that the rest of phyla (Fig. 2b). At the genus level, Prevotella was somewhat more abundant in the group of animals with lower methane emissions (Fig. 2e). At higher taxonomic levels, family, order and phylum, the same subtle trend was observed for Bacteroidaceae, Bacteroidales and Bacteroidota, respectively, which showed a moderately higher abundance in the group with lower methane emissions. However, high variability is evident inside each group.
We subjected the 100 most abundant taxa to hierarchical clustering of their Bray-Curtis dissimilarity indices using the hclust algorithm, which led to the identification of two clusters (Fig. 3). Namely, a larger cluster comprising 14 samples was integrated by 9 animals (64%) from the low-methane-emissions group and 5 animals (36%) from the high-methane-emissions group. This group exhibited higher abundance of the majority of the 100 most abundant bacteria, including 23 species of Prevotella (green bars), seven species of Butyrivibrio (blue bars), five species of Ruminococcus (red bars), also Selenomonas ruminantium, and Fibrobacter intestinalis, among others. The other cluster (on the right), was more heterogeneous and contained mostly animals from the high-methane-emissions group.
We compared the abundance of the five taxa in the family Methanobacteriaceae detected in our libraries. In general, a subtle higher abundance in four out of five Methanobacteriaceae taxa in the group of animals with higher methane emissions was observed (Supplementary Fig. S1). Such differences did not reach statistical significance, but they are clearly appreciated in the boxplots, although considerable variability inside groups was also observed (Supplementary Fig. S1).
We subjected the abundance of all taxa detected to linear discriminant analysis, using the software LEfSe [21]. Interestingly, based on uncorrected p-values, LEfSe suggested that at least 29 species in the genus Prevotella were more abundant in the low-methane-emissions group (Fig. 4a). However, although LEfSe reported increased abundance of those Prevotella species, in most cases p-values lost significance after correction for multiple comparisons with the Benjamini-Hochberg method. To better characterize this observation, we plotted the relative abundance of twelve species of Prevotella (Fig. 4b) and a clear trend is observed, although variability inside each group is considerably large.
We also conducted metabolic profiling with HUMAnN2 [22], but as is usual with non-human samples, results were incomplete and not very informative.
Combined assembly and identification of bacterial genomes
NGS libraries are a very fragmentary representation of the metagenome [23]. Thus, recovery of bacterial genomes is incomplete and somewhat stochastic. This implies that in two or more samples harbouring the same bacterium, different parts of the genome might be recovered. Therefore, it makes sense to conduct sequence assembly combining all samples and after identification/annotation of assembled contigs, alignment of individual samples to assembled contigs will determine the relative contribution, if any, of each sample to each contig. Therefore we conducted combined assembly of all 24 samples. The assembly of reads generated 3.6 million contigs with an N50 of 481 bp (min. 373; max. 53,776; average 491). We then subjected such contigs to binning with metaBAT2 [24]. From those, MetaBAT2 was able to cluster 78 putative bacterial genomes (bins), which were subjected to phylogenetics analysis and annotation with MAGpy [25]. Hereinafter, bins are referred to as MAGs.
The phylogenetic tree generated by MAGpy comprised three major clusters. The larger cluster (Fig. 5a; in black) contained many putative genomes that were very similar among them. The other two branches of the phylogenetic tree were more heterogeneous. The annotation assigned to each bin can be found in Supplementary Table S2. LEfSe analyses suggested that the genera Fibrobacter, Oscillibacter, Prevotella and the species Ruminococcaceae bacterium, Prevotella ruminicola, Bacteroidetes bacterium and a phage from Bacteroidetes were more abundant in the low-methane-emissions group, but p-values lost significance after correction for multiple comparisons. Because contigs used in this study are the result of an assembly procedure that included all samples, the relative contribution of each sample to each MAG is presented in Fig. 5b. Essentially, most MAGs were represented in all samples, with rather few exceptions (black cells in the lower half of heat map). The relative abundance of MAGs reported above is visibly larger in the group with lower methane emissions, but considerably heterogeneity is observed inside groups.
Protein predictions and annotation
We predicted a total of 4,467,657 representative non-redundant protein sequences that were aligned against the protein databases UniRef100 [26], RumiRef100 [26] and Hungate1000 [27]. RumiRef and Hungate1000 are databases of bacterial sequences from ruminal samples. Alignments were conducted with Diamond and the top hits were recovered. The best hit from each of the three alignments was selected for annotation.
To quantify the contribution of each sample to each protein sequence, individual samples sequences were aligned to the protein reference sequences with the method blastx of Diamond. An astringent filtering procedure was implemented as described in the Methods section. A total of 1309 protein sequences passed such filtering process (Supplementary Table S3). Among proteins most frequently detected were FAD-dependent oxidoreductase, Acyl-CoA dehydrogenase, Urocanate hydratase (homologous to Prevotella), subunit beta of a DNA-directed RNA polymerase, Methylmalonyl-CoA mutase, Pyridine nucleotide-disulfide oxidoreductase (homologous to Eubacterium cellulosolvens) and a Glutamate formimidoyltransferase (homologous to Prevotella sp. HUN102), among many others (see Supplementary Table S3).
When we conducted hierarchical clustering on the 100 most abundant proteins, the exact same larger cluster of samples identified with Kraken2 taxonomy assignments (Fig. 3a) was recapitulated (Fig. 6a). Based on linear discriminant analysis, 60 enzymes were more abundant in the group with lower emissions of methane, and only 10 were more abundant in the group with higher emissions of methane. Interestingly, 25 enzymes with homology to Prevotella proteins are among the hits more abundant in the low-methane-emissions group (Fig. 6b). All those hits, however, were not statistically significant after correction for multiple comparison.
In summary, we conducted a thorough characterization of the rumen microbiome of two small cohorts of buffalos that were found to produce either high or low methane emissions. The main findings suggest increased abundance of Prevotella species in the group of low methane emissions. This was very clear in results from several analytical approaches presented here, however statistical significance was not reached in many cases, which probably is derived from intra-group variability. Lack of statistical significance in microbiome means comparisons is often the result of the large number of taxa detected, which makes correction of p-values very stringent. The number of animals in each of our experimental groups was relatively small (n = 12), which will also result in relatively high p-values. Nonetheless, and apparent higher abundance of Prevotella in the group of animals with lower methane emissions is evident.