High-throughput sequencing for the detection of the bacterial and fungal diversity in Mongolian naturally fermented cow’s milk in Russia

Traditional fermented dairy products are major components of the typical Mongolian diet since ancient times. However, almost all the previous studies on the microbial composition of traditional Mongolian fermented dairy products analyzed food samples from the Chinese Mongolian region and Mongolia but not the Russian Mongolian region. In this study, the bacterial and fungal community diversity of nineteen naturally fermented cow’s milk (NFCM) samples from local Mongolian families residing in Kalmykia and Chita of Russia was investigated with pyrosequencing. Firmicutes and Ascomycota were the predominant phyla respectively for bacteria and fungi. The abundance of the bacterial phylum Acidobacteria was considerably different between the samples from the two regions. At genus level, Lactobacillus and Pichia were the predominating bacterial and fungal genera, respectively, while six bacterial genera significantly differed between the Kalmykia (enrichment of Aeromonas, Bacillus, Clostridium, Streptococcus, Vogesella) and Chita (enrichment of Lactococcus) samples. The results of principal coordinate analysis (PCoA) based on the bacterial or fungal composition of the Kalmykia and Chita samples revealed a different microbiota structure between the samples collected in these two locations. The redundancy analysis (RDA) identified 60 bacterial and 21 fungal OTUs as the key variables responsible for such microbiota structural difference. Our results suggest that structural differences existed in the microbiota of NFCM between Kalmykia and Chita. The difference in geographic environment may be an important factor influencing the microbial diversity of NFCM made by the Mongolians in Russia.


Background
Many traditional fermented dairy products, such as airag and tarag [1], are regarded as nutraceutics and have been the major components in a typical diet for the Mongol ethnic residing in Mongolia, Kazakhstan, Kyrgyzstan, and some Central Asian regions of Russia since ancient times. These traditional fermented foods remain popular until now. In this study, naturally fermented cow's milk (NFCM) samples were collected from Mongol-ethnic families of Kalmyk in the Republic of Kalmykia and Buryats in Chita that retain the Mongolian traditional life style and dietary habit. NFCM in these families is a common home-made dairy product, which is a white to yellowish-white liquid fermented from milk with or without specific cultures. NFCM are traditionally made by adding old fermented milk containing the natural starter cultures, thus these products are free from commercial starter cultures and own unique and stable microbial communities. Apart from their role as the NFCM-associated microbial community as a whole, some specific strains isolated from NFCM may have added value as producers of antimicrobial compounds [2] or as probiotics [3]. Although the nutritional value of NFCM is well-known, their microbiota composition has not been fully described. Therefore, it is of interest to further characterize the microbial diversity of NFCM.
The microbial biodiversity of fermented dairy products has been assessed by culture-dependent methods to identify the indigenous bacterial composition. A wide variety of Firmicutes genera, such as Lactobacillus, Lactococcus, Streptococcus, Leuconoctoc, which are known to influence the flavor and maturation of fermented food [4,5], have been identified. Some of them have also been isolated from different naturally fermented dairy products [6], including cheese [7], tarag [8], kurut [9], and kefir [10]. Relatively low levels of Bifidobacteriaceae, Moraxellaceae (mostly Enhydrobacter), Corynebacterium, Halomonas, Pediococcus, Micrococcus and Staphylococcus have been reported as compared with the total bacterial population in fermented dairy products [11].
In order to assess the microbial diversity of fermented dairy products, various culture-independent molecular biology-based techniques have been widely applied, such as denaturing gradient gel electrophoresis (DGGE) [11,12], temporal temperature gel electrophoresis (TTGE) [12], quantitative real-time PCR (qPCR) [13] and Sanger sequencing of 16S rRNA gene clone library [9]. These culture-independent techniques are valuable tools to monitor the microbial structure and dynamics in naturally fermented dairy products. They are not only able to reveal the dominant members present in the samples, but also uncover the rarer bacterial species [14].
Pyrosequencing is an automated high-throughput sequencing technique introduced by Margulies et al [15]. This method is based on the synthesis of single-stranded deoxyribonucleic acid and the detection of the light generated by pyrophosphate released during nucleotide incorporation. This technique is able to analyze the microbial structure, gene content, and hence reveal the microbial-based metabolic potential in an ecosystem through the rapid and accurate sequencing of nucleotide sequences. Pyrosequencing has successfully been used to detect the diversity and dynamics of the bacterial populations in various fermented foods, such as cheese [16,17], kefir [6,18], fermented seafood [19] and fermented soybean [20].
In this study, the microbial populations in NFCM were investigated. Nineteen NFCM samples from The Russian Republics of Kalmykia and Chita were evaluated by ribosomal gene targeted 454-pyrosequencing. However, almost all the previous papers focused on the microbial composition of the Mongolian traditional fermented dairy products analyzed samples prepared by the nomads of the Mongolian region of China and Mongolia [1,21,22] but not those made in the Mongolian region of Russia. Results of this study do not only provide accurate and detailed information on the microbiota diversity of NFCM, but also fill the gap of the study of the traditional Mongolian fermented dairy products.

Results
Bacterial and fungal sequence abundance and diversity A total of 268,549 of bacterial V1-V3 16S rRNA raw sequence reads were generated from the nineteen NFCM samples, with an average of 26,845 sequences read for each sample. Meanwhile, 113,513 of 18S rRNA raw sequence reads were generated for the fungal community. Through PyNAST alignment and 100% sequence identity clustering, the unique representative sequence reads were delimited for further analysis, which corresponded to 15,181 bacterial and 4,490 fungal sequences. The number of unique and classifiable representative OTU sequences for bacteria and fungi were, respectively, 573 (average = 147 OTUs per sample, range = 28-211, SD = 49.36) and 138 (average = 31.7 OTUs per sample, range = 14-56, SD = 13.9) (with high threshold identity at 97% sequence similarity level). Based on homologous sequence alignment method and clustering with information extracted from the RDP and BLAST databases, the lowest level of taxonomy of the identified OTUs was determined. 1.9% of bacterial and 39% of fungal sequences could not be assigned to the genus level.
The Shannon diversity curves but not rarefaction curves for all samples reached the saturation phase (Additional file 1: Figure S1), suggesting that although additional new phylotypes would possibly be identified by increasing the sequencing depth, the majority of bacterial and fungal phylotypes for the samples had already been captured in the current analysis.
The Shannon index, Simpson diversity index, Chao1 and observed species of each sample were used to evaluate the species richness and diversity (Table 1). Mann-Whitney test was applied to assess the richness and diversity of bacteria and fungi between Kalmykia and Chita samples. Since significant differences (p < 0.05) were observed in all indicators of the bacterial community in the two sampling sites, it can be concluded that a large distinction of species richness and diversity of the bacterial community existed between the two groups of samples. On the contrary, there were no statistically significant difference for any of the fungal indicators between samples from the two sampling sites (p > 0.05), indicating that a high similarity of fungal diversity was shared between the two sample groups.
The most abundant genus, Lactobacillus, was comprised of 37 different types of OTUs ( Figure 2). Based on the phylogenetic tree, some of these Lactobacillus OTUs could be further assigned to different Lactobacillus species. For example, OTU666 and OTU275 clustered with L. helveticus and L. delbrueckii subsp. bulgaricus respectively. OTU758 was closest to L. kefiranofaciens and L. kefiranofaciens subsp. kefiranofaciens. OTU322 were nearest to L. pentosus and L. plantarum. OTU37 and OTU380 affiliated to L. buchneri, meaning that they were likely to be the subspecies of L. buchneri.

Multivariate analysis of the bacterial communities of NFCM from Kalmykia and Chita
PCoA and MANOVA were performed to analyze the bacterial communities and structure of the two sample groups ( Figure 3A and B). PCoA based on the weighted (accounted for 59.9% and 21.8% of the total variance) ( Figure 3A) and the unweighted (accounted for 17.0% and 11.5% of the total variance) ( Figure 3B) UniFrac distances revealed the existence of bacterial structural difference, as symbols representing the two groups were largely separated on the unweighted but not weighted PCoA score plot. This result suggests that the difference lied not on the abundant OTUs. Even though there was overlapping of data points from the Kalmykia and Chita samples on both score plots, the results of MANOVA based on both the weighted (p < 0.05) and unweighted (p < 0.05) UniFrac distance confirmed that significant differences existed.
To further identify the bacterial populations that accounted for the difference observed between the Kalmykia and Chita NFCM samples, RDA was carried out ( Figure 4A). In this case, NFCM samples in Kalmykia and Chita were the "nominal environmental variables", and the relative abundances of all OTUs (at 97% similarity) were the response variables. Monte Carlo Permutation Test (MCPT) showed that the constrained ordination model was statistically significant (p =0.006) and the first canonical axis was able to explain 8.6% of the variability of response variables ( Figure 4A). Moreover, 60 OTUs were identified as the key variables, which had good correlation with sample scores on the RDA canonical axis ( Figure 4A). Among them, 7 OTUs (mainly, Lactobacillus, Leuconostoc, Macrococcus and Lactococcus) were enriched in the samples from Chita, while the other 53 OTUs (mainly, Streptococcus, Enterococcus, Clostridium, Saccharibacillus and Anoxybacillus) were enriched in the Kalmykia samples.

Fungal composition of the NFCM
The major fungal phyla found in the NFCM were Ascomycota, Basidiomycota and Chytridiomycota, constituting 99.77%, 0.008% and 0.002% of the fungal population ( Figure 1C), respectively. No fungal sequence read was detected in the samples, F11 and D12, which may due to an intrinsic low abundance of fungi present in these samples. At the genus level, 20 fungal genera were identified, among which six genera were highly variable between samples, namely, Pichia, Galactomyces, Vanderwaltozyma, Wickerhamomyces, Saccharomyces, and Cordyceps (representing 52.66%, 20.45%, 17.38%, 0.37%, 0.93%, and 0.24% of the fungal population) ( Figure 1D). Among them, Pichia, Galactomyces and Vanderwaltozyma were the dominant fungal genera (at least 1.0% of the total fungal gene pool) across all samples. Pichia contributed to more than 50% in most samples, except for the samples, ELS10, F2, E5, F9, F6, E1, F3, and E12, while Galactomyces was prevalent in the samples, F3, F6, and F9, contributing to more than 50% of total fungal sequences. Moreover, in the sample E12, Vanderwaltozyma contributed to 99% of total fungal sequences.

Comparison of the fungal composition between the Kalmykia and Chita samples
The weighted and unweighted UniFrac PCoA was performed to evaluate if any structural difference existed in the fungal community between the two sample groups ( Figure 3C and D). Although there was slight overlapping between the samples from the two groups on the score plots, data points were largely separated by both weighted (accounted for 48.9% and 23.4% of the total variance by the first two PCs) ( Figure 3C) and unweighted (accounted for 29.3% and 15.6% of the total variance by the first two PCs) ( Figure 3D) UniFrac analysis. However, a further MANOVA test showed that only the unweighted (p < 0.05) but not weighted (p > 0.05) UniFrac distance showed significant difference between sample groups. These results suggest that the structural difference in the fungal community was attributed to the less abundant fungal lineages. RDA was applied to further identify the specific fungal groups that accounted for the difference observed between the Kalmykia and Chita NFCM samples ( Figure 4B). MCPT showed that the constrained ordination model by Kalmykia or Chita was significant (P = 0.042). We identified 21 OTUs that fitted well by the sample scores on the canonical axis as key variables; each had at least 1.8% of the variability in its values explained by this axis. On the RDA ordination plot, 6 OTUs were enriched in Chita NFCM samples, and they belonged to the genera Saccharomyces, Pichia, Vanderwaltozyma and unidentified Trichocomaceae. Meanwhile, 15 OTUs were enriched in Kalmykia NFCM samples, which mainly belonged to the genera Vanderwaltozyma and Galactomyces.

Discussion
Pyrosequencing analysis is able to provide a thorough description of microbiota community in environmental samples, which also helps to reveal their potential function. In the current study, the bacterial and fungal communities of NFCM from Kalmykia and Chita were studied. Firmicutes, Proteobacteria, Bacteroidetes and Actinobacteria were the dominant bacteria at phylum level, while the abundance of Acidobacteria was considerably different between the two groups. Firmicutes was the most abundant phylum, which was in line with results from some previous studies performed in naturally fermented dairy products, such as kefir [10], fermented seafood [19], kochujang [20], pearl millet slurries [23], and artisanal cheeses [17]. The fungal community was dominated with Ascomycota, and this phylum was prevalent in Korean alcoholic beverages [24] and Chinese liquors [25]. As reflected by the Shannon index and rarefaction analysis, the bacterial diversity and richness were higher than that of the fungi in the same environment, which is consistent with a previous report on tarag [1]. The report found that the isolated number and density of lactic acid bacteria were considerably higher than that of yeasts in the studied traditional Mongolian fermented dairy products.
Lactobacillus and Lactococcus were the dominant genera for samples from both Kalmykia and Chita. These two genera are commonly associated with naturally fermented dairy products [1,11,26]. Lactobacillus was the predominant genus in our samples. The starter cultures of the homemade NFCM in this study were passage from the old NFCM. It was possible that the high proportion of Lactobacillus was retained and even enriched during NFCM fermentation. It has been reported that the acid tolerance of lactobacilli is higher than Streptococcus and Lactococcus [27]. Some species of Lactococcus have low acid tolerance, however, they could be isolated from raw milk and were found growing during the early stage of fermentation [11,21]. These may explain the lower abundance of Lactococcus than Lactobacillus as seen in our results.
Some reports claim that 16S rRNA is a slowly diverging molecule and it fails to reveal significant differences between recently diverged species; thus protein-encoding genes (such as recA gene for L. plantarum and L. casei [28][29][30]) may be used as the phylogenetic markers to ensure an accurate assignment of OTUs to the species or even subspecies level. Our results demonstrate a high resolution of pyrosequencing of 16S rRNA gene in assigning Lactobacillus to the precision of species and subspecies levels. Based on phylogenetic grouping of Lactobacillus OTU representative sequences, L. helveticus and L. delbrueckii subsp. bulgaricus were predominant in the NFCM. A similar Lactobacillus composition was reported in the traditional dairy products, airag and tarag [1,8].
Furthermore, the major genera (each comprised over 0.1% of total bacterial sequences), Streptococcus, Macrococcus, Leuconostoc, Enterococcus, Acetobacter, Citrobacter, Acinetobacter, Bacillus, Kluyvera, Enterobacter, were present in our NFCM samples. Among them, Streptococcus, Macrococcus, Leuconostoc, and Enterococcus have been reported as the subdominant genera in raw milk [5] and cheese [4], which aligned with the results of our study. Acinetobacter and Enterobacter were detected in Mexican alcoholic beverages. It was speculated that these genera were originated from the bacterial contamination in raw milk; and they subsequently decreased during the fermentation process [31].
Pichia, Galactomyces and Vanderwaltozyma were the dominant fungal genera in NFCM. Pichia is commonly associated with fermentative metabolic processes of naturally fermented dairy products and cheese [1,32,33], as this genus takes part in hydrolyzing milk protein and fat, assimilating lactic acid, and producing ethanol [34]. Galactomyces has been reported as the dominant yeasts in dairy products, which may involve in lactic acid utilization, proteolysis, lipolysis and flavour development [35]. However, a previously published study reported a different makeup of fungal community in two types of traditional Mongolian fermented products namely airag and tarag. Airag was dominated with the species Kluyveromyces marxianus, while the prevalent species in tarag were Saccharomyces cerevisiae, Issatchenkia orientalis, and Kazachstania unispora [1]. Our data revealed clear differences in the microbiota of NFCM collected in the two sampling locations. However, at this point, the exact reasons for the observed variation in the microbiota have not been identified. Similar to our study, previous reports have found that the exposure of cheese to different external environments during the manufacturing process impacted the microbiota composition of the final products [17,36]. The microbial communities of traditional fermented products, such as fermented milk [37] and kochujiang [20], varied with the sampling geographic or manufacturing region. Moreover, animals living at different altitudes, e.g. yak in the Tibetan region, may have different resistance to cold environment, leading to an unusual microbiota makeup of yak milk and its fermented products [38][39][40]. Kalmykia and Chita are located 6,165 kilometers apart in southern Russia. The NFCM collected from both sites were produced with the same manufacturing process. However, the two places have dissimilar altitude and climate characteristics. Kalmykia is located at the East European Plain, with averages of about 170 meters in elevation [41]. It has a continental climate, with very hot and dry summers (average July temperature is +24°C) and cold winters (average January temperature is −5°C) with little snow (average annual precipitation of up to 400 millimeters). On the contrary, Chita is located at the Central Siberian Plateau and the average elevation is about 500-700 meters [41]. It is generally colder in Chita than Kalmykia with the average summer and winter temperatures ranged from +19°C and −16°C, respectively. Additionally, Chita is comparatively humid than Kalmykia with an average annual precipitation of up to 500 millimeters. Thus, we speculate that the factor of geographic environment including altitudes and climate play a more significant role over the manufacturing process in resulting in the different microbiota compositions of the analyzed products. Our unpublished results revealed a higher microbial richness in the Mongolian style fermented products made by similar methods collected in Russia as compared with those from Mongolia and Inner Mongolia of China. This further confirms that the variation in the microbiota structure of these fermented products may be regional-based.
Some other crucial factors that may affect the fermented milk microbiota composition are the hygienic quality of the milk and the manufacturing process, as well as the back slopping technique. These factors may together contribute to the selection of specific microbial groups and affect the ecological succession of the microbial communities during the fermentation process.

Conclusions
To sum up, the current study analyzed the bacterial and fungal community diversity of NFCM in Kalmykia and Chita. Our results showed that clear structural differences existed in the microbiota of NFCM from Kalmykia and Chita. The current study has provided interesting insights into the relationship between the microbial composition of NFCM and their geographic regions. The results have also shown that pyrosequencing technique is useful for detecting a wide diversity of microorganisms in NFCM. The results obtained here will be valuable for screening for beneficial strains from traditional fermented dairy products, which are made by the Mongolians residing in Russia. Our study has also provided novel data for Mongolian traditional fermented dairy products.

Sample collection
A total of nineteen NFCM samples were collected from Russia, including nine samples (E12, F10, F2, F3, F4, F6, F9, F11, Ra17) from the Republic of Kalmykia and ten samples (D8, D9, D12, E1, E2, E3, E4, E5, ELS10, ELS20) from Chita. The distance between Kalmykia and Chita is about 6,165 kilometers (Additional file 2: Figure S2). All the NFCM was made in a similar way by natural fermentation of cow's milk in a special traditional container. Firstly, the raw cow's milk was boiled and cooled, followed by the step of skimming of lipids. An enrichment starter was then inoculated in the milk by backslopping. The mixture was allowed to ferment overnight at ambient temperature. Samples were collected aseptically and were kept in ice boxes during transportation. The sample sequence information was listed in Table 1.

DNA extraction and PCR amplification
Genomic DNA was extracted from each sample using Qiagen DNA Stool Mini Kit (Qiagen, Hilden, Germany) following the manufacturer's instruction. The quality of extracted DNA was checked by 0.8% agarose gel electrophoresis and spectrophotometry (optical density at 260nm/280nm ratio). All extracted DNA samples were stored at -20°C for further analysis.

Pyrosequencing and bioinformatics processing
The extraction of high-quality sequences was firstly performed with the QIIME package (Quantitative Insights Into Microbial Ecology) (v1.2.1). Raw sequences were selected based on sequence length, quality, primer and tag. According to the following criteria [44,45], low-quality sequences were removed: (i) raw reads were shorter than 110 nucleotides, (ii) sequences had imperfect matches to the barcode, (iii) sequences displayed a fuzzy match to at least one end of primers, (iv) sequences with only a short variable region of less than 100 nucleotides, (v) raw reads that had a quality score of <20 in more than 7% of the bases in the variable region. PyNAST [46] and UCLUST [47] were then applied to align the extracted high-quality sequences under 100% clustering of sequence identity in order to obtain representative sequences. The unique sequence set was classified into operational taxonomic units (OTUs) under the threshold of 97% identity using UCLUST after the selection of the representative sequences. ChimeraSlayer was applied to remove the potential chimeric sequences in the representative set of OTUs [48]. The taxonomy of each OTU representative sequence was performed using the Ribosomal Database Project (RDP) II database [49] that classified at a minimum bootstrap threshold of 80%. OTUs occurred only once or twice were discarded. A de novo taxonomic tree was constructed employing a representative chimera-checked OTU set in FastTree for downstream analysis [50], including the beta diversity calculation. The Shannon-Wiener, Simpson's diversity, the Chao1 and rarefaction estimators were calculated in order to evaluate the alpha diversity. UniFrac distance [51] was based on the phylogenetic tree. Both weighted and unweighted calculations were performed for the principal coordinate analysis (PCoA).

Statistical analyses
Statistical analyses were performed mainly under Matlab® environment (The MathWorks, Natick, MA, USA) and using the software Canoco for Windows 4.5 (Microcomputer Power, NY, USA). Rarefaction analysis, Shannon diversity index and Simpson's diversity index were used to estimate the richness and diversity of OTUs. Principal coordinate analysis (PCoA) was used to assess the microbiota structure of different samples. Redundancy analysis (RDA) was applied to identify microbial groups that significantly contributed to the structural difference. Differences in the relative abundances of taxonomic groups at phylum and genus levels between samples were evaluated with Mann-Whitney test. P-values of less than 0.05 were considered significantly different between sample pairs. The phylogenetic grouping of Lactobacillus representative sequences was performed according to Felis and Dellaglio [52] and was constructed using MAGE 5.0 [53].