Metagenomic profiling of IBD or non-IBD participants
As summarized in Fig. 1, the overall metagenomic analysis includes filtering, profiling, longitudinal dissection, biomarker screening, modeling, and microbial dynamics test. The fecal metagenome dataset used in this study was downloaded from the Inflammatory Bowel Disease Multi-'Omics Database (IBDMD) of iHMP, which were longitudinally generated from 130 participants (103 IBD and 27 non-IBD subjects).
As described in Methods, the data quality was tested and metagenomic samples from valid subjects satisfying selection conditions were considered for further analysis (Additional file 1: Table S1). Microbial taxonomy was assessed at the species level using MetaPhlAn2, and the quality of the compositional data were controlled according to the three specific conditions mentioned in Methods (Additional file 2: Table S2) [35]. The number of filtered samples was 1526 samples from 106 participants (80 IBD and 26 non-IBD), and the metadata of participants such as sex, age, and collection days were comparable between IBD patients and non-IBD subjects (Additional file 3: Table S3). The global distribution did not show distinct tendency to sex, IBD-activity, subject, and data generation sites (Additional file 4: Figure S1).
Consistent with the previous reports, two major phyla in human gut, Firmicutes and Bacteroidetes showed a complementary distribution in the plot of principal coordinate analysis (PCoA) (Fig. 2a) [36]. The microbiomes of IBD and non-IBD subjects were generally distinguishable. Samples of non-IBD subjects were mainly localized in a left-lower quadrant and ones of IBD patients were more widely distributed along PC1 axis (probability value of IBD vs. non-IBD, PIBD-Non (PC1) < 2.2e-16, Fig. 2b). The representative subtypes of IBD, ulcerative colitis (UC) and Crohn’s diseases (CD), were not significantly segregated by PC1 and PC2 axes (p-values of UC vs. CD, PUC-CD (PC1) = 0.1726, PUC-CD (PC2) = 0.0988), implying that the two idiopathic inflammatory disorders share similar microbial community (Fig. 2b). Overall microbiome seemed to be distinct by subjects and largely stable over time (Fig. 2c). As grouped by K-means clustering, most of non-IBD samples belonged to cluster C3, suggesting that microbiome from non-IBD subjects should be relatively convergent relative to those from UC or CD (Odd Ratio (OR)nonIBD-C3 = 4.42, ORUC-C3 = 2.30, ORCD-C2 = 2.15).
Multiple alpha diversity indices like Shannon diversity, Pielou’s evenness, and richness (the number of observed species per sample) were lower in samples with IBD than in those without IBD as expected. There were no significant differences in alpha diversity indices between CD and UC, but severe inflammation lowered Shannon diversity and richness (Fig. 2d, Additional file 5: Figure S2a, b).
In addition, the fraction of human reads sequenced together with gut metagenomic data was high in IBD rather than in non-IBD and highest in active stage of IBD among three stages of IBD, which means that a leakage of host genome into gut lumen might mirror the severity of disorders in gut (Fig. 2e). Accordingly, the human sequence read fraction was positively correlated with diseases severity scores such as simple clinical colitis activity index (SCCAI) for UC and Harvey-Bradshaw index (HBI) for CD (Additional file 5: Figure S2c, d).
Detection of F. nucleatum and its longitudinal dissection
F. nucleatum is rarely found in gut microbiome. Among 1526 fecal samples from 106 participants, F. nucleatum occurred 41 times in 19 subjects (15 IBD and 4 non-IBD). The ratio of IBD to non-IBD subjects was not significantly different in F. nucleatum-detected subjects (Fisher’s one-sided test, p-value 0.4757), but the ratio of IBD to non-IBD samples had marginal preference to chronic inflammation due to the recurrent observation of F. nucleatum in IBD patients (OR = 1.79, Fisher’s one-sided Pdetect = 0.1062) (Fig. 3a). However, F. nucleatum was relatively abundant within samples of IBD patients experiencing F. nucleatum (Wilcoxon test Pdetect = 0.02891) (Fig. 3b).
Even though the low detection frequency of F. nucleatum is not appropriate for early diagnosis of disease state, it would be a constructive approach of overcoming this constraint to examine the longitudinal metagenomes before and after detection of a certain species along with co-occurring species. Firstly, we tested whether the detection frequency and abundance were consistent in 44 duplicated samples that were sequenced in both Human microbiome project (HMP) and HMP pilot study individually. Microbial abundance and the detection frequency are positively correlated and the recovery rate is usually high for abundant species. As expected, highly abundant species were found in duplicates but less abundant ones with abundance below 0.01%, were not. About one-fourth of total species appeared only in one sample of a given duplicated pair. F. nucleatum was a relatively rare microbe observed only 4 times in three duplicates and its recovery rate was only 33.3% (Additional file 6: Figure S3). To overcome the limitation of snapshot-based approach, the samples collected from each subject over 1 year were arranged in chronological order relative to the detection point of F. nucleatum (Fig. 3c). The subjects were categorized into F. nucleatum-experienced or –innocent (non-experienced) groups, and the samples from F. nucleatum-experienced subjects were sub-divided into prior or posterior group as well as proximal or distal group to the detection point of F. nucleatum. The samples of F. nucleatum-experienced subjects were highly dispersed in PCoA plot (Fig. 3d, g). Experiencing F. nucleatum led to lowering Shannon diversity and Pielou’s evenness. Particularly, the samples either proximal or posterior to F. nucleatum detection exhibited decreased alpha diversity and increased human read fraction (Fig. 3e, h, i, Additional file 5: Figure S2e, f). Longitudinal tracking of F. nucleatum-experienced subjects revealed that the microbial diversity was decreased in non-IBD subjects as well as in IBD patients (Fig. 3f). These results imply that F. nucleatum might appear under gut microbiome perturbation toward a low microbial diversity.
Identification of biomarkers in IBD/non-IBD and their correlation with F. nucleatum
In order to clarify whether F. nucleatum was truly associated with inflammatory environment, we tried to screen biomarker species for IBD and non-IBD conditions. Using a non-parametric Linear discriminant analysis Effect Size (LEfSe) algorithm, 12 IBD- and 14 non-IBD-specific biomarkers were selected at the species level (Fig. 4a, Additional file 7: Table S4). As expected, these markers were differentially enriched in either IBD or non-IBD samples (Fig. 4b). The ratio of IBD markers to non-IBD markers was significantly increased in F. nucleatum-experienced subjects, suggesting that more IBD-specific biomarkers were associated with detection of F. nucleatum whether or not IBD was developed (Fig. 4c). The prevalence of IBD markers over non-IBD markers was also distinct in samples posterior to F. nucleatum-detection (Fig. 4d).
As shown in Fig. 4e, the number of IBD-specific biomarkers is an indicator for F. nucleatum occurrence at later time. The number of IBD-specific biomarkers in F. nucleatum-experienced subjects is significantly higher than that in F. nucleatum-innocent subjects under non-IBD condition (P < 2.2e-16). The number of IBD-specific biomarkers was increased and that of non-IBD-specific biomarkers was decreased at the detection point of F. nucleatum and afterwards under non-IBD condition, leading to an alteration of microbiome. Similarly, the number of non-IBD-specific biomarkers in F. nucleatum-experienced subjects is significantly lower than that in F. nucleatum-innocent subjects (P = 3e-15). The number of IBD-specific biomarkers was not much changed before and after the detection of F. nucleatum under non-IBD condition (Fig. 4e). These results suggested that experience of F. nucleatum should be tightly linked with IBD development.
The association of biomarker species with F. nucleatum was also assessed by calculating Spearman’s correlation coefficients. All 14 non-IBD biomarkers were negatively correlated with F. nucleatum, having very significant enrichment p-values, and IBD biomarkers showed mostly positive correlation with some exceptions (Fig. 4f). Collectively, the absolute correlation coefficient of a certain microbe with F. nucleatum had strong relationship with its enrichment p-values in either IBD or non-IBD conditions (ρ = 0.33, P = 3.8e-15; Fig. 4f).
When the longitudinal abundance of the biomarker species was examined, two representative marker species of non-IBD condition, Alistipes shahii and Alistipes putridinis, showed the decreasing patterns of abundance along the X-axis standing for the proximal weeks to the detection point of F. nucleatum. In contrast, the abundance of IBD markers like Clostridium symbiosum and Clostridium bolteae had opposite pattern, low at prior and high at posterior to F. nucleatum-detection along the temporal axis (Fig. 4g). The abundance of these four biomarkers was significantly changed only in IBD condition, which means that the perturbation in key microbes’ abundance should be accompanied by chronic inflammation. Besides, two additional IBD markers, Flavonifractor plautii and a unclassified species in Oscillibater genus, and three non-IBD markers, Alistipes finegoldii, Roseburia hominis, Roseburia inulinivorans, exhibited similar patterns of abundance changes over time (Additional file 8: Figure S4).
Microbial destabilization after F. nucleatum detection
Homeostasis of human gut microbiota is a sort of indicators of human health and understanding of their behavior is important for diagnosis and prevention of disease states. The microbial imbalance, called dysbiosis, is believed to cause or be associated with several metabolic and inflammatory diseases [37, 38]. To see whether F. nucleatum experience is associated with long-term stability of microbiome, we examined intra- and inter-individual alterations of microbiome in chronological order relative to the detection point of F. nucleatum.
Intra-individual dissimilarity of microbiome was measured by pairwise Bray-Curtis distance after random sampling in a given participant (Fig. 5a). Consistent with the previous findings, IBD subjects regardless of F. nucleatum-experience, showed higher microbial dissimilarity than non-IBD subjects at any given time intervals, supporting that IBD is related with microbial destabilization [34]. By calculating the microbial distance, the microbiomes of IBD patients who have experienced F. nucleatum were verified to be more unstable than those of F. nucleatum-nonexperienced group (Fig. 5b). The temporal microbial stability was compared between before and after detection of F. nucleatum (Fig. 5c). F. nucleatum-experienced subjects showed significant dissimilarity between earlier time and later time points in F. nucleatum-experience samples (P|x| < 20w = 3.5e-05), whereas F. nucleatum-innocent control did not (P|x| < 20w = 0.1905) (Fig. 5d). Individual alterations in microbiome were traced over the time, resulting that four IBD subjects (C3009, H4015, M2034, and P6009) among 16 F. nucleatum-experienced IBD subjects showed dramatic microbial shift but four F. nucleatum-experienced non-IBD subjects did not (Additional file 9: Figure S5).
Two different participants under the same condition were randomly selected to estimate inter-individual microbial distance (Fig. 5e). The dissimilarity between IBD patients was higher than non-IBD subjects (dIBD = 0.5696, dnon-IBD = 0.5000, P = 1.9e-07; Fig. 5f), and that of F. nucleatum-experienced subjects was also higher than non-experienced ones (dexp = 0.5927, dnon-exp = 0.5401, P = 7.5e-15; Fig. 5g). The microbial distance on the temporal distribution was higher in samples posterior than prior to F. nucleatum-detection (dposterior = 0.5816, dprior = 0.5372 P = 2.3e-07; Fig. 5h). When one F. nucleatum-detected sample was compared with samples of different F. nucleatum-experienced subjects, the inter-individual microbial distance was gradually elevated until 20 weeks after F. nucleatum detection (Fig. 5i, j).
Collectively, these results suggested that highly variable microbiome might be pre-established in F. nucleatum-colonizing environment, and potentiate dysbiosis upon chronic inflammation. On the other hand, a convergent microbiome before F. nucleatum detection become unstable and divergent along with F. nucleatum occurrence, possibly leading to the formation of pathogenic microbiome.
Identification of classifier microbes for F. nucleatum detection
To identify representative microbes for F. nucleatum detection, all 317 samples from F. nucleatum-experienced subjects (16 IBD and 4 non-IBD participants) were partitioned and 258 microbes were initially screened following the procedure described in Methods. Among them, 41 significant species were predicted as “classifiers” for F. nucleatum by multiple logistic regression analysis (False discovery rate (FDR) < 0.001) (Fig. 6a). These classifier microbes were divided into two groups, 15 and 26 species enriched in samples prior and posterior to F. nucleatum-detection, respectively (Fig. 6b, Additional file 10: Table S5). The posterior-enriched classifiers, including 3 IBD marker species, were favorably found in IBD samples, and the prior-enriched classifier with 4 non-IBD marker species were preferentially observed in non-IBD samples (Fig. 6b, c).
A recent fecal metagenome analysis suggested 29 core signature bacteria enriched in CRC metagenomes including three F. nucleatum strains [39]. Among them, 18 CRC signature species were also observed in our dataset, and most of them (14 out of 17 signatures except F. nucleatum) were positively correlated with F. nucleatum (Additional file 11: Table S6). The five CRC signature species including three Clostridium species (C. symbiosum, C. bolteae, C. clostridioforme), F. nucleatum, and Peptostreptococcus stomatis were overlapped with potent F. nucleatum-posterior classifiers (Area under the curve (AUC)C. sym. = 0.6574, AUCC. bolt. = 0.6427, AUCC. clostri. = 0.6102, AUCF. nuc. = 0.6043, AUCP. sto. = 0.5406, PCRC = 0.0164; Fig. 6b). Especially, C. symbiosum proposed as a potent fecal biomarker for CRC was the top F. nucleatum-posterior classifier in our study [40].
Considering discriminative property of microbial markers detected more than 5 times in F. nucleatum-experienced subjects, all 11 CRC biomarkers could successfully distinguish samples prior to F. nucleatum-detection from ones posterior to F. nucleatum-detection (PCRC = 0.0009). Likewise, a majority of IBD and non-IBD markers (9 out of 11 and 9 out of 14, respectively) showed a discriminative power (PIBD = 0.0229, Pnon-IBD = 0.0732) (Fig. 6d). Most biomarkers identified in this study exhibited significant discriminative power for F. nucleatum detection and were differentially enriched in samples either prior or posterior to F. nucleatum-detection, supporting that F. nucleatum-oriented approach has an advantage to the effective identification of biomarkers (Fig. 6e).
Estimation of F. nucleatum experience and dysbiosis level in F. nucleatum-innocent subjects
A prediction model was constructed to estimate the probability of experiencing F. nucleatum with top 13 potent classifiers satisfying average AUC > 0.6 and FDR < 1e-07 (Fig. 7a). The constructed generalized linear modeling (GLM) was tested with 100 randomly partitioned training datasets and the 10th GLM was chosen as the best model for examining the level of dysbiosis by considering average ranks in AUC, Akaike information criterion (AIC), accuracy, sensitivity, precision, and specificity (Fig. 7b, Additional file 12: Figure S6a-f). The 10 species used for building the 10th GLM were Dorea longicatena, Coprococcus comes, Lachnospiraceae bacterium 3_1_46FAA, Clostridium symbiosum, Roseburia hominis, Roseburia inulinivorans, Alistipes shahii, Bacteroides stercoris, Clostridium bolteae, and Veillonella parvula in descending order of mean AUC (Additional file 10: Table S5). When applying this model to F. nucleatum-experienced subjects for validation, the probability of experiencing F. nucleatum, so called “posterior probability”, was gradually increased and reached a decision threshold of 0.5 just before detection point of F. nucleatum, which means that this model can successfully predict the exact point of F. nucleatum detection (Fig. 7c). Strikingly, this model was still effective even when 86 F. nucleatum-innocent subjects were separated by predicted posterior probability and inflammation status. Samples with predicted posterior probability above 0.5 showed decreased alpha-diversity, increased number of biomarkers for IBD and CRC, and decreased number of non-IBD biomarkers indicating clear manifestations of dysbiosis (Fig. 7d, Additional file 12: Figure S6g). The posterior probability was correlated negatively with Shannon diversity and positively with the ratio of IBD to non-IBD markers (Spearman correlation, ρshannon = − 0.29, ρratio = 0.53; Fig. 7e). There was a negative correlation between microbial diversity and the posterior probability when examined in the most 12 “dynamic” subjects with high variance in posterior probability (Fig. 7f, Additional file 13: Figure S7). Especially, several IBD patients including E5009, H4015, H4032, H4044, P6009, P6010, and P6025, displayed dramatic microbial shift as the posterior probability increased. Additionally, the negative correlation could be further generalized to more subjects in 70th percentile from the highest variance in posterior probability (Additional file 14: Figure S8). The samples with low posterior probability were located in the lower left side of the plot but the samples with high probability were scattered, indicating that our prediction model explained microbial variance properly (Fig. 7g).
The validity of our prediction model was further strengthened through its application with the independent metagenomic data from HMP phase I database generated by analyzing fecal samples of healthy population [41]. The best GLM model above was applied to 82 filtered samples out of 251 samples, where no samples contained F. nucleatum as expected. Consistently with the previous results, healthy microbiome showed a broad range of F. nucleatum-posterior probability and the posterior probability was negatively correlated with Shannon diversity (Additional file 15: Figure S9a, b). The samples predicted as F. nucleatum-posterior or -prior group were examined and F. nucleatum-posterior group was typically characterized by decreases in three indices of microbial alpha diversity (richness, evenness, and Shannon diversity), increase in the prevalence of IBD and CRC biomarkers, and significant decrease of non-IBD biomarker (Additional file 15: Figure S9c). These results strongly supported that the 10 classifier species screened by their longitudinal dynamics to F. nucleatum could predict gut dysbiosis even in healthy individuals.
Application of potential biomarkers to the evaluation of fecal microbiome
To classify the microbial distribution, we considered 5 following criteria; 1) Spearman co-abundance correlation with F. nucleatum, 2) enrichment in IBD condition, 3) enrichment in F. nucleatum-experienced subjects, 4) enrichment in samples posterior to F. nucleatum detection, 5) discriminative significance for F. nucleatum detection. The biomarker species for IBD and non-IBD conditions were distinguishable in principal component analysis (PCA) plot, and the CRC signature species were closely related with IBD biomarkers (Fig. 8a, b).
The effectiveness of our IBD/non-IBD biomarkers as well as CRC markers in the longitudinal analysis was validated by K-means clustering of all microbes. Among 9 clusters, cluster 1 harbored most non-IBD biomarkers (8/14) and cluster 6 had five CRC and six IBD biomarkers, where C. symbiosum and C. bolteae belong to both sides. Moreover, cluster 6 held many known opportunists such as Clostridium difficile, Enterococcus faecalis, Enterococcus faecium, Escherichia coli, Haemophilus haemolyticus, Saccharomyces cerevisiae, and F. nucleatum (Additional file 16: Table S7). Cluster 4 had both CRC and non-IBD markers. Cluster 8 and 9 contained IBD markers (Fig. 8c). Notably, the cluster 1 and 6 were separated far apart in PCA plot, and the cluster 8 and 9 were localized near cluster 6, which indicated that the biomarker species with a similar character formed intimate clusters (Fig. 8d).
The number of detected microbes along temporal proximity to F. nucleatum was decreased in clusters 1, 4, and 7 where non-IBD biomarkers were involved (Fig. 8e). The number of detected microbes increased in IBD condition of clusters 6 and 8, which had both CRC and IBD biomarkers. Interestingly, although the cluster 2 and 3 showed significant decrease in detected microbe number regardless of inflammatory conditions, they did not contain any biomarkers. In accordance with Fig. 4e, the number of dysbiosis-associated biomarkers changed in IBD condition. Clusters 1, 4, and 7 were negatively correlated with the posterior probability, but the clusters 6 and 8 were positively related (Fig. 8f). Furthermore, the clusters 1 and 6 exhibited a complementary distribution each other in terms of microbial abundance and detection frequency, which was confirmed in independent healthy dataset (Fig. 8g, Additional file 15: Figure S9d).
Taken together, our work illuminated previously unrecognized knowledge on the early gut dysbiosis in the context of chronological dynamics of microbiome by focusing on the opportunistic colonization of F. nucleatum. It is noteworthy that even a rare microbial species under a certain condition could be used as an indicator for predicting a perturbation in the future event, as shown with F. nucleatum-focused longitudinal modeling. Although further experiments were needed to verify physiology of the classifier microbes, we expected that analysis on chronological alteration of microbiome would be greatly helpful for biomarker screening and diagnosis of microbiota-associated diseases.