
PE/PPE mutations in the transmission of Mycobacterium tuberculosis in China revealed by whole genome sequencing



This study aims to examine the impact of PE/PPE gene mutations on the transmission of Mycobacterium tuberculosis (M. tuberculosis) in China.


We obtained whole genome sequencing (WGS) data for 3202 M. tuberculosis isolates sampled in China from 2007 to 2018 and investigated the clustering of strains from different lineages. To evaluate the potential role of PE/PPE gene mutations in the dissemination of the pathogen, we performed homoplasy analysis to detect homoplastic single nucleotide polymorphisms (SNPs) within these gene regions. Logistic regression was then used to test the association between these SNPs and transmission.


Nationwide M. tuberculosis WGS data showed that most of the M. tuberculosis burden in China is caused by lineage 2 strains, followed by lineage 4. Lineage 2 formed more transmission clusters (446 clusters, of which 77 were cross-regional) than lineage 4 (52 clusters, of which 9 were cross-regional). Among lineage 2 isolates, regression analysis showed that 4 mutations, PE4 (position 190,394; c.46G > A), PE_PGRS10 (839,194; c.744A > G), PE16 (1,607,005; c.620T > G) and PE_PGRS44 (2,921,883; c.333C > A), were significantly associated with the transmission of M. tuberculosis. Mutations in PE_PGRS10 (839,334; c.884A > G), PE_PGRS11 (847,613; c.1455G > C), PE_PGRS47 (3,054,724; c.811A > G) and PPE66 (4,189,930; c.303G > C) were significantly associated with cross-regional clusters. In addition, 13 mutation positions were positively correlated with cluster size. For lineage 4 strains, no mutations were found to enhance transmission, but 2 mutation sites, PE_PGRS4 (338,100; c.974A > G) and PPE13 (976,897; c.1307A > C), were identified as risk factors for cross-regional clusters.


Our results indicate that some PE/PPE gene mutations are associated with an increased risk of M. tuberculosis transmission, which might provide a basis for controlling the spread of tuberculosis.



Tuberculosis (TB) remains one of the most prevalent and deadly communicable diseases and a major global health challenge [1]. The COVID-19 pandemic disrupted TB services, leaving many TB patients undiagnosed and untreated and leading to increases in TB deaths and transmission [2, 3]. The evolution and spread of M. tuberculosis threaten to undermine the success of tuberculosis treatment and control programs [4]. Combating TB effectively therefore requires a comprehensive understanding of its transmission mechanisms.

One hallmark of the Mycobacterium tuberculosis (M. tuberculosis) genome is the multigenic PE/PPE protein family, which accounts for about 10% of the coding region of the genome [5]. The reference strain H37Rv has 99 PE genes and 69 PPE genes, characterized by conserved N-terminal proline-glutamate (PE) and proline-proline-glutamate (PPE) motifs [6]. Based on their highly polymorphic C-terminal sequences, the PE family can be divided into PE_PGRS (polymorphic GC-rich sequences) and PE (no distinctive features) genes, and the PPE family into PPE_MPTR (major polymorphic tandem repeats), PPE_SVP (with a GxxSVPxxW motif), PPE_PPW (with a PxxPxxW motif) and PPE genes with no distinctive features [5, 7,8,9]. These distinctive sequences might underlie the specific physiological roles of PE/PPE proteins during M. tuberculosis infection.
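In the motif notation above, "x" stands for any amino acid. The following is a minimal sketch that assumes the two motifs are read literally as written (GxxSVPxxW and PxxPxxW). Under that reading they translate directly into regular expressions. The function name `ppe_motifs` and the example peptides are hypothetical. In practice, assigning a PPE protein to a subfamily involves more than a single motif search.

```python
import re
from typing import Dict, List

# In the motif notation, "x" stands for any amino acid, i.e. "." in a regex.
SVP_MOTIF = re.compile(r"G..SVP..W")  # PPE_SVP subfamily motif GxxSVPxxW
PPW_MOTIF = re.compile(r"P..P..W")    # PPE_PPW subfamily motif PxxPxxW


def ppe_motifs(protein_seq: str) -> Dict[str, List[int]]:
    """Return 0-based start positions of each motif in a protein sequence.

    finditer reports non-overlapping matches, scanning left to right.
    """
    seq = protein_seq.upper()
    return {
        "SVP": [m.start() for m in SVP_MOTIF.finditer(seq)],
        "PPW": [m.start() for m in PPW_MOTIF.finditer(seq)],
    }


if __name__ == "__main__":
    # Hypothetical peptides, not real PPE sequences.
    print(ppe_motifs("AAGLLSVPQAWKK"))  # SVP motif starting at index 2
    print(ppe_motifs("MPAAPGGW"))       # PPW motif starting at index 1
```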

Many PE/PPE proteins have been shown to play important roles in the antigenicity, immune modulation and virulence of M. tuberculosis [10,11,12]. For example, PPE68 and PE35 were identified as required for M. tuberculosis virulence [13, 14]. Host cell necrosis is associated with the spread and virulence of M. tuberculosis because it leads to the release and dissemination of the bacilli [15]; this function has been reported for PE_PGRS33 [16] and PPE27 [17]. PPE39 [18], PE_PGRS5 [19] and PE_PGRS17 [20] were shown to play roles in host cell interaction and immune regulation. In addition, the strong cellular immune responses elicited by some PE/PPE proteins suggest that they may be good diagnostic and vaccine candidates [21].

Mutations in some genes have been found to enhance the transmission of M. tuberculosis, such as mutations in the ESX-5 type VII secreted protein EsxW [22] and in the lldD2 promoter [23]. The role of PE/PPE genes in transmission has been less studied, although mutations in PPE54, for instance, were associated with enhanced spread of the disease in Malawi [24], highlighting their potential significance in epidemiological dynamics. Here, we used whole genome sequence data from 3202 Chinese isolates to detect homoplastic single nucleotide polymorphisms (SNPs) in the PE/PPE gene regions and assessed the impact of these homoplastic SNPs on the spread of M. tuberculosis. The results may provide new insights into how PE/PPE gene mutations affect M. tuberculosis transmission in China.


Sample collection

The M. tuberculosis strains analyzed in this study comprised two sets of samples (Supplementary Table 1). (1) Newly sequenced whole genome dataset: we included 1,550 culture-confirmed TB samples with drug susceptibility test (DST) results reported to the Shandong Tuberculosis Surveillance System during 2013–2017; the genome sequence data were deposited in the National Center for Biotechnology Information (NCBI) BioProject database (accession number PRJNA1002108). (2) Countrywide collection of publicly available clinical isolates of M. tuberculosis: we downloaded WGS data for 1,755 isolates from the European Nucleotide Archive. The genomes were sampled from 2007 to 2018, and their geographic distribution covered 30 of the 34 provincial-level regions of China. Individual patient identifiers were removed before data analysis and reporting. Ethical approval and a waiver of informed consent were obtained from the Ethics Committee of Shandong Provincial Hospital affiliated to Shandong First Medical University.

Whole-genome sequencing

The genomes of the 1468 Shandong isolates were sequenced on the Illumina HiSeq 4000 platform. Read quality was assessed with FastQC v.0.11.9 (version 0.11.7) [25], and 1447 samples passed quality control. Low-quality raw reads from paired-end sequencing were discarded. Reads were then aligned to the H37Rv (NC_000962) reference genome using BWA-MEM (version 0.7.17-r1188) [26]. Duplicate reads and clipped alignments were removed with SAMtools markdup (version 1.15) and Samclip (version 0.4.0) [27, 28], and only samples with a genome coverage of at least 98% and a sequencing depth of at least 20× were included. The filtered VCF files were annotated with SnpEff (version 4.3t) to obtain the final single nucleotide polymorphisms (SNPs) for each sample [29].
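The inclusion rule above (coverage of at least 98% and depth of at least 20) is simple to state, but it can be misapplied at the boundaries. The following is a minimal sketch that assumes per-sample coverage and depth have already been computed after alignment, for example with `samtools coverage` or a similar tool. The `SampleQC` record, its field names and the sample IDs are hypothetical. The text does not say whether depth means mean or median depth.

```python
from dataclasses import dataclass
from typing import Iterable, List

MIN_COVERAGE = 0.98  # at least 98% of the H37Rv genome covered
MIN_DEPTH = 20.0     # sequencing depth of at least 20x


@dataclass
class SampleQC:
    """Hypothetical per-sample summary computed after alignment and markdup."""
    sample_id: str
    coverage_fraction: float  # fraction (0-1) of reference bases covered by reads
    depth: float              # sequencing depth (mean or median is not specified)


def passes_qc(sample: SampleQC) -> bool:
    """Apply both inclusion thresholds; both thresholds are inclusive."""
    return sample.coverage_fraction >= MIN_COVERAGE and sample.depth >= MIN_DEPTH


def included_samples(samples: Iterable[SampleQC]) -> List[str]:
    """Return the IDs of samples that meet both criteria, in input order."""
    return [s.sample_id for s in samples if passes_qc(s)]


if __name__ == "__main__":
    batch = [
        SampleQC("S1", 0.995, 85.0),   # passes
        SampleQC("S2", 0.970, 120.0),  # fails coverage
        SampleQC("S3", 0.990, 12.5),   # fails depth
        SampleQC("S4", 0.980, 20.0),   # exactly at both thresholds: passes
    ]
    print(included_samples(batch))  # ['S1', 'S4']
```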

Homoplastic SNP identification

We used snippy-core (version 4.6.0) to obtain SNPs from all 3202 isolates. Lineages were assigned from the WGS data using TBProfiler (version 4.3.0) [30, 31]. Maximum-likelihood (ML) phylogenetic trees were constructed with IQ-TREE (v1.6.12) under the JC + I + G4 model with 1000 ultrafast bootstrap replicates and dated with TreeTime (v0.9.0) [32]. Homoplastic mutations can arise through convergent evolution driven by selective pressures. Homoplasy analysis was performed with HomoplasyFinder following established protocols. SNPs were considered homoplastic if they arose independently in different clades and did not form a monophyletic group, as assessed from the phylogenetic tree and the nucleotide alignment using the consistency index [33, 34]. Homoplastic SNPs located in the PE/PPE gene regions with a minor allele frequency (MAF) > 0.005 were included in further analyses.
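As we understand it, HomoplasyFinder flags a site as homoplastic when its consistency index (CI) on the tree is below 1. The CI is the minimum conceivable number of changes divided by the number required on the tree. The sketch below computes the CI for a single site using Fitch parsimony on a small rooted binary tree. It illustrates the principle only and is not HomoplasyFinder's code. The tree, isolate names and alleles are hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# A rooted binary tree: a leaf is an isolate name, an internal node is a pair.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_changes(tree: Tree, alleles: Dict[str, str]) -> int:
    """Minimum number of state changes for one site (Fitch small parsimony)."""

    def walk(node: Tree) -> Tuple[Set[str], int]:
        if isinstance(node, str):
            return {alleles[node]}, 0
        left_set, left_cost = walk(node[0])
        right_set, right_cost = walk(node[1])
        shared = left_set & right_set
        if shared:
            return shared, left_cost + right_cost
        # No shared state: a change is required on one of the two branches.
        return left_set | right_set, left_cost + right_cost + 1

    return walk(tree)[1]


def consistency_index(tree: Tree, alleles: Dict[str, str]) -> float:
    """CI = minimum conceivable changes / parsimony changes on the given tree."""
    n_states = len(set(alleles.values()))
    if n_states < 2:
        raise ValueError("site is invariant; CI is undefined")
    return (n_states - 1) / fitch_changes(tree, alleles)


if __name__ == "__main__":
    tree = ((("A", "B"), "C"), (("D", "E"), "F"))
    monophyletic = {"A": "T", "B": "T", "C": "C", "D": "C", "E": "C", "F": "C"}
    convergent = {"A": "T", "B": "C", "C": "C", "D": "T", "E": "C", "F": "C"}
    print(consistency_index(tree, monophyletic))  # 1.0: a single origin
    print(consistency_index(tree, convergent))    # 0.5: two independent origins
```

The recursive traversal is fine for a toy tree. On trees with thousands of isolates it could exceed Python's default recursion limit, so an iterative post-order traversal (or HomoplasyFinder itself) would be needed.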

Transmissibility analysis

Transmission clusters were defined as groups of strains separated by a genetic distance of 10 SNPs or fewer, indicating recent transmission [22]. Clusters were categorized as cross-regional if they included strains from two or more of China’s seven geographic regions (Northwest, Northern, Northeast, Central, Eastern, Southern, and Southwest China), and as regional otherwise [35]. Transmission clusters were further subdivided by the number of isolates they contained into small (2 isolates), medium-sized (3–6 isolates), and large (more than 6 isolates) clusters [36, 37].
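The text does not state which linkage rule turns pairwise distances into clusters. A common choice, assumed here, is single linkage: two strains belong to the same cluster if a chain of pairwise distances of 10 SNPs or fewer connects them. The sketch below also applies the cross-regional rule and the size categories described above. The function names, region labels per isolate, and the toy alignment are hypothetical.

```python
from itertools import combinations
from typing import Dict, List

SNP_THRESHOLD = 10  # pairwise distance of 10 SNPs or fewer links two strains


def snp_distance(a: str, b: str) -> int:
    """Count positions where two aligned SNP strings differ, skipping N and gaps."""
    return sum(
        1 for x, y in zip(a, b) if x != y and x not in "N-" and y not in "N-"
    )


def transmission_clusters(seqs: Dict[str, str],
                          threshold: int = SNP_THRESHOLD) -> List[List[str]]:
    """Single-linkage clusters via union-find; singletons are not clusters."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if snp_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return [sorted(g) for g in groups.values() if len(g) >= 2]


def is_cross_regional(cluster: List[str], region_of: Dict[str, str]) -> bool:
    """True if the cluster spans two or more of the seven geographic regions."""
    return len({region_of[s] for s in cluster}) >= 2


def size_category(n: int) -> str:
    """Small (2), medium (3-6) or large (more than 6 isolates)."""
    if n < 2:
        raise ValueError("a cluster needs at least 2 isolates")
    if n == 2:
        return "small"
    return "medium" if n <= 6 else "large"


if __name__ == "__main__":
    seqs = {
        "s1": "A" * 30,
        "s2": "C" * 5 + "A" * 25,   # 5 SNPs from s1
        "s3": "C" * 12 + "A" * 18,  # 7 from s2, 12 from s1
        "s4": "A" * 12 + "G" * 18,  # 18 or more from every other strain
    }
    regions = {"s1": "Eastern", "s2": "Eastern", "s3": "Northern", "s4": "Southwest"}
    for c in transmission_clusters(seqs):
        print(c, size_category(len(c)), is_cross_regional(c, regions))
    # ['s1', 's2', 's3'] medium True
```

Under single linkage, s1 and s3 join the same cluster through s2 even though they are 12 SNPs apart. Other linkage rules would not necessarily group them. The all-pairs loop scales quadratically, which is still manageable for roughly 3200 isolates (about 5.1 million pairs).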

Statistical analysis

To streamline the analysis, homoplastic SNPs in the PE/PPE gene regions with a MAF < 0.005 (approximately 15 isolates) were excluded. For the comparisons between clustered and non-clustered strains, and between cross-regional and regional clusters, we performed univariate logistic regression and included sites with P values < 0.2 in the subsequent multivariate logistic regression analysis [38]. To analyze the effect of PE/PPE gene mutations on cluster size, we performed Spearman’s rank correlation analysis. Statistical analyses were performed using IBM SPSS Statistics (version 26.0). All reported statistical tests were two-sided, and P values < 0.05 were considered statistically significant.
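For a single binary predictor such as the presence of one SNP, two estimates coincide. The univariate logistic regression odds ratio and its Wald standard error equal the log odds ratio of the 2x2 table and Woolf's standard error. The MAF filter and the P < 0.2 screening step can therefore be illustrated without a regression package. The sketch assumes the 2x2 layout shown in the docstring. The table counts and site names are hypothetical. The multivariate step, which the authors ran in SPSS, is not reproduced.

```python
import math
from typing import Dict, List, Tuple

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def minor_allele_frequency(n_mutant: int, n_total: int) -> float:
    """Frequency of the less common allele at a biallelic site."""
    p = n_mutant / n_total
    return min(p, 1.0 - p)


def univariate_or(a: int, b: int, c: int, d: int) -> Tuple[float, Tuple[float, float], float]:
    """Odds ratio, 95% Wald CI and two-sided Wald P for a 2x2 table.

                   mutant   wild type
        outcome+     a          b
        outcome-     c          d
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: the maximum-likelihood OR is not finite")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    p = math.erfc(abs(log_or / se) / math.sqrt(2))
    ci = (math.exp(log_or - Z_975 * se), math.exp(log_or + Z_975 * se))
    return math.exp(log_or), ci, p


def screen_sites(tables: Dict[str, Tuple[int, int, int, int]],
                 maf_min: float = 0.005, p_enter: float = 0.2) -> List[str]:
    """Drop rare sites, then keep those with univariate P < p_enter."""
    kept = []
    for site, (a, b, c, d) in tables.items():
        if minor_allele_frequency(a + c, a + b + c + d) < maf_min:
            continue  # excluded before any test, as described above
        _, _, p = univariate_or(a, b, c, d)
        if p < p_enter:
            kept.append(site)
    return kept


if __name__ == "__main__":
    tables = {
        "site1": (30, 70, 10, 90),  # clustered / non-clustered by mutant / wild type
        "site2": (1, 99, 1, 99),    # no association
        "site3": (1, 499, 1, 499),  # MAF 0.002, filtered out
    }
    print(screen_sites(tables))  # ['site1']
```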


Sample structure

Of the 3202 domestic strains, 2745 (85.7%) belonged to lineage 2 (94.4% of them to sublineage 2.2.1), 443 (13.8%) belonged to lineage 4, and only 14 belonged to other lineages (lineages 1 and 3). We constructed maximum-likelihood phylogenetic trees for the lineage 2 and lineage 4 isolates separately (Fig. 1a and b). Clustering analysis grouped 1462 lineage 2 strains into 446 transmission clusters of 2 to 107 isolates each. In lineage 4, 52 clusters contained 132 isolates, ranging in size from 2 to 9 isolates (Table 1). Lineage 2 had a considerably larger proportion of strains in transmission clusters than lineage 4 (53.3% vs. 29.8%, P < 0.001).

Fig. 1

(a) Phylogenetic tree of 2745 Chinese M. tuberculosis strains in lineage 2. (b) Phylogenetic tree of 443 Chinese M. tuberculosis strains in lineage 4

Table 1 Lineage and demographic factors associated with transmission clusters (≤ 10 SNP) of M. tuberculosis strains in China
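The text reports P < 0.001 for the difference in clustering proportions between lineages but does not name the test. The sketch below assumes a Pearson chi-square test (without continuity correction) on the 2x2 table of clustered versus non-clustered strains by lineage. Under that assumption, the reported percentages and P value can be reproduced from the counts given in the text (1462 of 2745 lineage 2 strains and 132 of 443 lineage 4 strains in clusters). The helper name is hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square without continuity correction for [[a, b], [c, d]].

    Returns (statistic, P value) using the chi-square distribution with 1 df,
    whose survival function is erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))


if __name__ == "__main__":
    # Counts from the text.
    l2_clustered, l2_total = 1462, 2745
    l4_clustered, l4_total = 132, 443
    print(round(100 * l2_clustered / l2_total, 1),
          round(100 * l4_clustered / l4_total, 1))  # 53.3 29.8
    stat, p = chi_square_2x2(l2_clustered, l2_total - l2_clustered,
                             l4_clustered, l4_total - l4_clustered)
    print(round(stat, 1), p < 0.001)
```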

The effect of PE/PPE gene mutations on transmission of L2 strains

Homoplasy analysis identified 1,141 homoplastic SNPs in lineage 2 strains (Supplementary Table 2). After excluding SNPs with a MAF below 0.005, 140 homoplastic SNPs in the PE/PPE gene regions were selected for further analysis. In the comparison of clustered and non-clustered strains, 45 mutation sites were statistically significant (P < 0.05) in the univariate regression analysis. The 59 loci with P values below 0.2 in the univariate analysis were then included in a multivariate logistic regression analysis, which identified 9 sites as influencing factors (P < 0.05). Of these, PE4 (position 190,394; c.46G > A; OR, 2.183; 95% CI, 1.025–4.651), PE_PGRS10 (839,194; c.744A > G; OR, 1.668; 95% CI, 1.220–2.280), PE16 (1,607,005; c.620T > G; OR, 3.741; 95% CI, 2.039–6.864) and PE_PGRS44 (2,921,883; c.333C > A; OR, 12.664; 95% CI, 1.696–94.357) were risk factors for strain clustering (Table 2).

Table 2 Analysis of the PE/PPE gene mutations in clustering and non-clustering of lineage 2
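The odds ratios and 95% CIs reported for these sites come from logistic regression, so each pair can be sanity-checked. This assumes the usual Wald interval exp(β ± 1.96·SE), which is typical of logistic regression output but is not stated in the text. Under that interval, the log-scale midpoint of the CI should reproduce the OR, and β/SE gives an approximate P value. The function name `wald_from_or_ci` is hypothetical.

```python
import math
from typing import Tuple

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal


def wald_from_or_ci(odds_ratio: float, ci_low: float,
                    ci_high: float) -> Tuple[float, float, float, float]:
    """Back-calculate (beta, SE, two-sided Wald P, CI-midpoint OR).

    Assumes a Wald interval on the log scale: CI = exp(beta +/- 1.96 * SE).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_975)
    p = math.erfc(abs(beta / se) / math.sqrt(2))
    midpoint_or = math.exp((math.log(ci_low) + math.log(ci_high)) / 2)
    return beta, se, p, midpoint_or


if __name__ == "__main__":
    # OR and 95% CI values reported for lineage 2 clustering in the text.
    reported = {
        "PE4 c.46G > A": (2.183, 1.025, 4.651),
        "PE16 c.620T > G": (3.741, 2.039, 6.864),
        "PE_PGRS44 c.333C > A": (12.664, 1.696, 94.357),
    }
    for name, (or_, lo, hi) in reported.items():
        beta, se, p, mid = wald_from_or_ci(or_, lo, hi)
        print(f"{name}: beta={beta:.3f} SE={se:.3f} P~{p:.3g} CI-midpoint OR={mid:.3f}")
```

For PE4 this gives P ≈ 0.043, consistent with the reported P < 0.05. The very wide interval for PE_PGRS44 corresponds to an SE of about 1, which indicates an imprecise estimate.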

A total of 382 lineage 2 strains formed 77 cross-regional clusters, which spanned 2 to 6 geographic regions. Among the 7 geographic regions, Northern China (31.9%) and Central China (27.3%) were involved in the highest proportions of these cross-regional clusters, followed by Southwest China (23.1%) and Northwest China (19.5%). In the univariate analysis, 57 SNPs differed significantly between cross-regional and regional clusters (P < 0.05). Subsequent multivariate logistic regression identified 10 mutations as influencing factors (P < 0.05), 4 of which were risk factors for cross-regional clustering: PE_PGRS10 (839,334; c.884A > G; OR, 2.706; 95% CI, 1.081–6.774), PE_PGRS11 (847,613; c.1455G > C; OR, 4.342; 95% CI, 1.636–11.525), PE_PGRS47 (3,054,724; c.811A > G; OR, 2.099; 95% CI, 1.211–3.637) and PPE66 (4,189,930; c.303G > C; OR, 6.511; 95% CI, 1.679–25.242) (Supplementary Table 3).

Correlation analysis between mutation sites and cluster size showed that 19 mutation positions were significantly associated with cluster size (P < 0.05), 13 of which were positively correlated with cluster size (rs > 0): PE_PGRS1 (132,417), PE_PGRS6 (623,472), PE_PGRS9 (836,658), PE16 (1,607,005), PPE26 (2,027,484), PPE34 (2,165,286), PPE35 (2,167,926), PPE44 (3,079,877), PPE54 (3,736,628), PPE56 (3,762,013), PE_PGRS58 (4,032,218), PE_PGRS58 (4,032,760) and PPE69 (4,375,628) (Fig. 2).

Fig. 2

Correlation analysis of PE/PPE gene mutation positions and cluster size
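The text does not state the unit of analysis for the Spearman correlation. The following is a minimal sketch that assumes each clustered isolate contributes one pair: whether it carries the mutation (0/1) and the size of its cluster. Under that assumption, rs is the Pearson correlation of the average ranks, with tied values sharing a mean rank. The data are hypothetical. Significance testing, which the authors performed in SPSS, is omitted.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant variable has no rank correlation")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical clustered isolates: 1 = carries the mutation, paired with
    # the size of the cluster each isolate belongs to.
    carries_mutation = [0, 0, 0, 1, 1, 1]
    cluster_size = [2, 2, 3, 5, 9, 107]
    print(round(spearman_rho(carries_mutation, cluster_size), 3))
```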

The effect of PE/PPE gene mutations on transmission of L4 strains

A total of 205 homoplastic SNPs were detected in lineage 4 strains (Supplementary Table 4). After excluding homoplastic SNPs with a MAF below 0.005, 74 SNPs in the PE/PPE gene regions were selected for further analysis. In the univariate analysis, 6 PE/PPE mutation positions differed significantly between clustered and non-clustered strains (P < 0.05). The 22 loci with P values below 0.2 in the univariate analysis were then included in a multivariate logistic regression analysis. However, none of the mutations in lineage 4 strains appeared to facilitate transmission (Table 3).

Table 3 Analysis of the PE/PPE gene mutations in clustering and non-clustering of lineage 4

Furthermore, 25 lineage 4 strains grouped into 9 cross-regional clusters, each spanning two geographic regions. After univariate analysis, 9 mutation sites were included in a multivariate regression analysis, which showed that 4 positions were significantly associated with cross-regional clusters (P < 0.05). Of these, PE_PGRS4 (338,100; c.974A > G; OR, 6.090; 95% CI, 1.702–21.793) and PPE13 (976,897; c.1307A > C; OR, 3.505; 95% CI, 1.103–11.132) were risk factors for cross-regional transmission (Supplementary Table 5). Because of the low prevalence of lineage 4 strains in China and the relatively small sample size, we did not further analyze the cluster sizes of lineage 4 strains.


We analyzed 3202 domestic isolates (1447 Shandong isolates and 1755 publicly available isolates) in this study. Genomic analysis of M. tuberculosis across the country revealed that lineage 2 is the predominant lineage and contributes most of the tuberculosis burden in China, followed by lineage 4. Moreover, a considerably larger proportion of lineage 2 strains than lineage 4 strains were in transmission clusters. In subsequent analyses, we identified 4 PE/PPE gene mutations associated with the spread of lineage 2 strains, but no mutations were found to enhance the transmission of lineage 4 strains.

Homoplastic mutations are mutations that arise independently in different clades of an organism and may result from convergent evolution under selective pressures [39]. Previous reports have shown that homoplastic SNPs occur in all functional categories of genes, with the PE/PPE gene family having the highest ratio of homoplastic SNPs to total SNPs within a functional category [34]; however, the relationship between homoplastic SNPs in the PE/PPE gene regions and strain transmission has not been described. In our study, homoplasy analysis identified 1,141 homoplastic SNPs in lineage 2 strains and 78 in lineage 4 strains.

The PE/PPE genes are especially abundant in pathogenic mycobacteria, suggesting that they play a major role in mycobacterial survival and pathogenesis, although the precise functions of these proteins are largely unknown [6, 40]. We observed a missense mutation (c.46G > A, p.Ala16Thr) at position 190,394 of PE4 (Rv0160c) and another missense mutation (c.620T > G, p.Leu207Arg) at position 1,607,005 of PE16 (Rv1430), both of which were associated with an increased risk of M. tuberculosis transmission within lineage 2. Although the exact function of PE4 remains uncertain, a previous study showed that it enhances mycobacterial survival within macrophages [41]. PE16, a member of the serine hydrolase superfamily with esterase activity, is notable for its ability to hydrolyze short- to medium-chain fatty acid esters [42]. Both PE4 and PE16 may therefore contribute to the progression of mycobacterial infection, and further research is needed to determine the mechanisms by which these mutations might facilitate transmission. A synonymous mutation at position 839,194 of PE_PGRS10 (Rv0747) (c.744A > G, p.Thr248Thr) was also associated with the transmission of lineage 2 isolates. Mazandu and Mulder [43] predicted that PE_PGRS10 is involved in lipid metabolism, an important feature of mycobacterial pathogenicity, although this prediction has yet to be verified experimentally. The association of this synonymous mutation with the spread of M. tuberculosis suggests that synonymous mutations in PE/PPE genes are not necessarily neutral, consistent with the previous finding that most synonymous mutations in representative yeast genes are strongly non-neutral [44].
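The protein-level consequences reported above follow from the coding-sequence (c.) coordinates. Position c.N falls in codon ⌈N/3⌉ at position ((N − 1) mod 3) + 1. Thus c.46 is the first base of codon 16 (p.Ala16Thr), c.620 the second base of codon 207 (p.Leu207Arg), and c.744 the third base of codon 248 (p.Thr248Thr). The following is a minimal sketch of that bookkeeping. The reference codons in the example are hypothetical, chosen only to match the reported amino acids, because the H37Rv codons are not given in the text. Because c. coordinates refer to the coding strand, for reverse-strand genes such as PE4 (Rv0160c) the genomic alleles are the complements of the c. alleles.

```python
from typing import Tuple

# Standard genetic code in TCAG order; bacterial translation table 11 uses
# the same amino acid assignments (it differs only in permitted start codons).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codon_of(c_pos: int) -> Tuple[int, int]:
    """Map a 1-based coding position (the N in c.N) to (codon number, position 1-3)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def classify_substitution(ref_codon: str, pos_in_codon: int,
                          alt_base: str) -> Tuple[str, str, str]:
    """Return (ref amino acid, alt amino acid, effect) for a single-base change."""
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:pos_in_codon - 1] + alt_base.upper() + ref_codon[pos_in_codon:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "nonsense"
    else:
        effect = "missense"
    return ref_aa, alt_aa, effect


if __name__ == "__main__":
    for c_pos in (46, 620, 744):
        print(c_pos, codon_of(c_pos))  # (16, 1), (207, 2), (248, 3)
    # Hypothetical reference codons chosen only to match the reported amino acids.
    print(classify_substitution("GCC", 1, "A"))  # ('A', 'T', 'missense'): Ala to Thr
    print(classify_substitution("ACA", 3, "G"))  # ('T', 'T', 'synonymous'): Thr to Thr
```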

PE_PGRS proteins are unique to mycobacteria and are secreted or cell surface-associated [45], suggesting that they could mediate interactions between macrophages and the bacteria [46,47,48]. PE_PGRS proteins are translocated across the plasma membrane through an ESX-5-dependent mechanism [49, 50]; once in the outer membrane, they may “float” on the outer leaflet of the mycomembrane and may be released to exert their activity. The cell wall-anchored/secretory protein PE_PGRS11 (Rv0754) interacts significantly with TLR2, driving the maturation and activation of human dendritic cells (DCs) and thereby enhancing their capacity to stimulate CD4+ T cells [20]. Analysis of the effects of deleting or over-expressing PE_PGRS47 (Rv2741) implicated this protein in the inhibition of autophagy in infected host phagocytes. As a functionally significant, non-redundant bacterial component, PE_PGRS47 contributes to the modulation of both innate and adaptive immune responses, which identifies it as a potential target for enhancing antigen presentation and fostering protective immunity during vaccination or infection [51]. In our study, a missense mutation (c.1455G > C, p.Glu485Asp) at position 847,613 of PE_PGRS11 and another missense mutation (c.811A > G, p.Ser271Gly) at position 3,054,724 of PE_PGRS47 were associated with the cross-regional spread of lineage 2 isolates.

A previous study of the influence of genomic variants on M. tuberculosis transmission in Malawi found significant convergent evolution of a PPE54 mutation that was associated with transmission [24]. Notably, 68.7% of the strains in that study belonged to lineage 4 and only 3.8% to lineage 2. In our study, however, we did not find that PPE54 mutations promoted the spread of lineage 4 strains in China. One possible explanation is the diversity of lineage 4, which has a complex population structure with 21 sublineages and 15 internal groups [35, 52]. Another possible reason is the limited sample size of lineage 4, which might have been inadequate to identify less common lineage 4 strains or associated mutations; a larger dataset would likely yield more precise estimates.

To our knowledge, this is the first study to explore the association of PE/PPE gene mutations with M. tuberculosis transmission in China. Nevertheless, it has some limitations. The distribution of M. tuberculosis isolates in the current dataset may not accurately reflect the prevalence of TB in some areas, although the large sample size and broad geographical distribution of our data allowed us to analyze how M. tuberculosis spread across the whole country. In addition, because of the low prevalence of lineage 4 in China and the relatively small sample size, we did not subdivide the transmission clusters of lineage 4 strains by size. Lineage-specific analyses with higher sensitivity may require larger sample sizes.


In summary, our work identifies lineages 2 and 4 as the two main M. tuberculosis lineages transmitted in China, together with PE/PPE gene mutations associated with an increased risk of transmission in lineages 2 and 4, respectively, providing insights for TB prevention and control. We believe that the PE/PPE family will remain a highly active area of research, with many features yet to be discovered.

Data availability

The newly sequenced whole genome dataset of 1,447 M. tuberculosis strains was deposited in the NCBI BioProject database (accession number PRJNA1002108), and the other 1755 isolates were downloaded from the European Nucleotide Archive (accession numbers are provided in Supplementary Table 1). Any additional data are available from the corresponding authors upon reasonable request.


  1. World Health Organization. Global tuberculosis report 2021. Geneva: World Health Organization; 2021.

  2. World Health Organization. Global tuberculosis report 2022. Geneva: World Health Organization; 2022.

  3. Chakaya J, Petersen E, Nantanda R, et al. The WHO Global Tuberculosis 2021 Report - not so good news and turning the tide back to end TB. Int J Infect Dis. 2022;124(Suppl 1):S26–9.

  4. Farhat MR, Shapiro BJ, Kieser KJ, et al. Genomic analysis identifies targets of convergent positive selection in drug-resistant Mycobacterium tuberculosis. Nat Genet. 2013;45(10):1183–9.

  5. Cole ST, Brosch R, Parkhill J, et al. Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature. 1998;393(6685):537–44.

  6. Fishbein S, van Wyk N, Warren RM, Sampson SL. Phylogeny to function: PE/PPE protein evolution and impact on Mycobacterium tuberculosis pathogenicity. Mol Microbiol. 2015;96(5):901–16.

  7. Adindla S, Guruprasad L. Sequence analysis corresponding to the PPE and PE proteins in Mycobacterium tuberculosis and other genomes. J Biosci. 2003;28(2):169–79.

  8. Tundup S, Akhter Y, Thiagarajan D, Hasnain SE. Clusters of PE and PPE genes of Mycobacterium tuberculosis are organized in operons: evidence that PE Rv2431c is co-transcribed with PPE Rv2430c and their gene products interact with each other. FEBS Lett. 2006;580(5):1285–93.

  9. Brennan MJ, Delogu G. The PE multigene family: a ‘molecular mantra’ for mycobacteria. Trends Microbiol. 2002;10(5):246–9.

  10. Mohareer K, Tundup S, Hasnain SE. Transcriptional regulation of Mycobacterium tuberculosis PE/PPE genes: a molecular switch to virulence? J Mol Microbiol Biotechnol. 2011;21(3–4):97–109.

  11. Kohli S, Singh Y, Sharma K, Mittal A, Ehtesham NZ, Hasnain SE. Comparative genomic and proteomic analyses of PE/PPE multigene family of Mycobacterium tuberculosis H37Rv and H37Ra reveal novel and interesting differences with implications in virulence. Nucleic Acids Res. 2012;40(15):7113–22.

  12. Mishra KC, de Chastellier C, Narayana Y, et al. Functional role of the PE domain and immunogenicity of the Mycobacterium tuberculosis triacylglycerol hydrolase LipY. Infect Immun. 2008;76(1):127–40.

  13. Sassetti CM, Rubin EJ. Genetic requirements for mycobacterial survival during infection. Proc Natl Acad Sci U S A. 2003;100(22):12989–94.

  14. Jiang Y, Wei J, Liu H, et al. Polymorphisms in the PE35 and PPE68 antigens in Mycobacterium tuberculosis strains may affect strain virulence and reflect ongoing immune evasion. Mol Med Rep. 2016;13(1):947–54.

  15. Behar SM, Divangahi M, Remold HG. Evasion of innate immunity by Mycobacterium tuberculosis: is death an exit strategy? Nat Rev Microbiol. 2010;8(9):668–74.

  16. Dheenadhayalan V, Delogu G, Brennan MJ. Expression of the PE_PGRS 33 protein in Mycobacterium smegmatis triggers necrosis in macrophages and enhanced mycobacterial survival. Microbes Infect. 2006;8(1):262–72.

  17. Yang G, Luo T, Sun C, et al. PPE27 in Mycobacterium smegmatis enhances mycobacterial survival and manipulates cytokine secretion in mouse macrophages. J Interferon Cytokine Res. 2017;37(9):421–31.

  18. Choi HH, Kwon KW, Han SJ, et al. PPE39 of the Mycobacterium tuberculosis strain Beijing/K induces Th1-cell polarization through dendritic cell maturation. J Cell Sci. 2019;132(17):jcs228700.

  19. Grover S, Sharma T, Singh Y, et al. The PGRS Domain of Mycobacterium tuberculosis PE_PGRS protein Rv0297 is involved in endoplasmic reticulum stress-mediated apoptosis through toll-like receptor 4. mBio. 2018;9(3):e01017–18.

  20. Bansal K, Elluru SR, Narayana Y, et al. PE_PGRS antigens of Mycobacterium tuberculosis induce maturation and activation of human dendritic cells. J Immunol. 2010;184(7):3495–504.

  21. Vordermeier HM, Hewinson RG, Wilkinson RJ, et al. Conserved immune recognition hierarchy of mycobacterial PE/PPE proteins during infection in natural hosts. PLoS ONE. 2012;7(8):e40890.

  22. Holt KE, McAdam P, Thai PVK, et al. Frequent transmission of the Mycobacterium tuberculosis Beijing lineage and positive selection for the EsxW Beijing variant in Vietnam. Nat Genet. 2018;50(6):849–56.

  23. Brynildsrud OB, Pepperell CS, Suffys P, et al. Global expansion of Mycobacterium tuberculosis lineage 4 shaped by colonial migration and local adaptation. Sci Adv. 2018;4(10):eaat5869.

  24. Sobkowiak B, Banda L, Mzembe T, Crampin AC, Glynn JR, Clark TG. Bayesian reconstruction of Mycobacterium tuberculosis transmission networks in a high incidence area over two decades in Malawi reveals associated risk factors and genomic variants. Microb Genom. 2020;6(4):e000361.

  25. Andrews S. FastQC: a quality control tool for high throughput sequence data. Cambridge (UK): Babraham Bioinformatics; 2010.

  26. Jung Y, Han D. BWA-MEME: BWA-MEM emulated with a machine learning approach. Bioinformatics. 2022;38(9):2404–13.

  27. Li H, Handsaker B, Wysoker A, et al. The sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25(16):2078–9.

  28. Danecek P, Bonfield JK, Liddle J, et al. Twelve years of SAMtools and BCFtools. Gigascience. 2021;10(2).

  29. Cingolani P, Platts A, Wang le L, et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin). 2012;6(2):80–92.

  30. Phelan JE, O’Sullivan DM, Machado D, et al. Integrating informatics tools and portable sequencing technology for rapid detection of resistance to anti-tuberculous drugs. Genome Med. 2019;11(1):41.

  31. Coll F, McNerney R, Guerra-Assunção JA, et al. A robust SNP barcode for typing Mycobacterium tuberculosis complex strains. Nat Commun. 2014;5:4812.

  32. Nguyen LT, Schmidt HA, von Haeseler A, Minh BQ. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol Biol Evol. 2015;32(1):268–74.

  33. Ruesen C, Chaidir L, van Laarhoven A, et al. Large-scale genomic analysis shows association between homoplastic genetic variation in Mycobacterium tuberculosis genes and meningeal or pulmonary tuberculosis. BMC Genomics. 2018;19(1):122.

  34. Tantivitayakul P, Ruangchai W, Juthayothin T, et al. Homoplastic single nucleotide polymorphisms contributed to phenotypic diversity in Mycobacterium tuberculosis. Sci Rep. 2020;10(1):8024.

  35. Li YF, Yang Y, Kong XL, et al. Transmission dynamics and phylogeography of Mycobacterium tuberculosis in China based on whole-genome phylogenetic analysis. Int J Infect Dis. 2024;140:124–31.

  36. Chiner-Oms Á, Sánchez-Busó L, Corander J, et al. Genomic determinants of speciation and spread of the Mycobacterium tuberculosis complex. Sci Adv. 2019;5(6):eaaw3307.

  37. Farhat MR, Freschi L, Calderon R, Ioerger T, Snyder M, Meehan CJ, et al. GWAS for quantitative resistance phenotypes in Mycobacterium tuberculosis reveals resistance genes and regulatory regions. Nat Commun. 2019;10:2128.

  38. Schrag A, Siddiqui UF, Anastasiou Z, Weintraub D, Schott JM. Clinical variables and biomarkers in prediction of cognitive impairment in patients with newly diagnosed Parkinson’s disease: a cohort study. Lancet Neurol. 2017;16(1):66–75.

  39. Grandjean L, Gilman RH, Iwamoto T, et al. Convergent evolution and topologically disruptive polymorphisms among multidrug-resistant tuberculosis in Peru. PLoS ONE. 2017;12(12):e0189838.

  40. Daleke MH, Cascioferro A, de Punder K, et al. Conserved Pro-glu (PE) and Pro-pro-glu (PPE) protein domains target LipY lipases of pathogenic mycobacteria to the cell surface via the ESX-5 pathway. J Biol Chem. 2011;286(21):19024–34.

  41. Singh SK, Tripathi DK, Singh PK, Sharma S, Srivastava KK. Protective and survival efficacies of Rv0160c protein in murine model of Mycobacterium tuberculosis. Appl Microbiol Biotechnol. 2013;97(13):5825–37.

  42. Sultana R, Vemula MH, Banerjee S, Guruprasad L. The PE16 (Rv1430) of Mycobacterium tuberculosis is an esterase belonging to serine hydrolase superfamily of proteins. PLoS ONE. 2013;8(2):e55320.

  43. Mazandu GK, Mulder NJ. Function prediction and analysis of mycobacterium tuberculosis hypothetical proteins. Int J Mol Sci. 2012;13(6):7283–302.

  44. Shen X, Song S, Li C, Zhang J. Synonymous mutations in representative yeast genes are mostly strongly non-neutral. Nature. 2022;606(7915):725–31.

  45. Tian C, Jian-Ping X. Roles of PE_PGRS family in Mycobacterium tuberculosis pathogenesis and novel measures against tuberculosis. Microb Pathog. 2010;49(6):311–4.

  46. Banu S, Honoré N, Saint-Joanis B, Philpott D, Prévost MC, Cole ST. Are the PE-PGRS proteins of Mycobacterium tuberculosis variable surface antigens? Mol Microbiol. 2002;44(1):9–19.

  47. Espitia C, Laclette JP, Mondragón-Palomino M, et al. The PE-PGRS glycine-rich proteins of Mycobacterium tuberculosis: a new family of fibronectin-binding proteins? Microbiol (Reading). 1999;145(Pt 12):3487–95.

  48. Brennan MJ, Delogu G, Chen Y, et al. Evidence that mycobacterial PE_PGRS proteins are cell surface constituents that influence interactions with other cells. Infect Immun. 2001;69(12):7326–33.

  49. Bottai D, Di Luca M, Majlessi L, et al. Disruption of the ESX-5 system of Mycobacterium tuberculosis causes loss of PPE protein secretion, reduction of cell wall integrity and strong attenuation. Mol Microbiol. 2012;83(6):1195–209.

  50. Burggraaf MJ, Speer A, Meijers AS, et al. Type VII secretion substrates of pathogenic mycobacteria are processed by a surface protease. mBio. 2019;10(5):e01951-19.

  51. Saini NK, Baena A, Ng TW, et al. Suppression of autophagy and antigen presentation by Mycobacterium tuberculosis PE_PGRS47. Nat Microbiol. 2016;1(9):16133.

  52. Freschi L, Vargas R Jr, Husain A, et al. Population structure, biogeography and transmissibility of Mycobacterium tuberculosis. Nat Commun. 2021;12(1):6099.



Funding

This work was supported by the Department of Science & Technology of Shandong Province (CN) (Nos. 2007GG30002033 and 2017GSF218052), the Natural Science Foundation of Shandong Province (CN) (Nos. ZR2020KH013 and ZR2021MH006), and the Jinan Science and Technology Bureau (CN) (No. 201704100). The funders had no role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Author information

Contributions

FWW collected the data and drafted the manuscript. XLK analyzed the data. YJY, HJJ, TNN, LYM, WTT, LYY, HQL and ZYZ commented on and revised the manuscript. HCL and YL conceptualized and designed the study and acquired funding for it. All authors approved publication of the manuscript.

Corresponding author

Correspondence to Yao Liu.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. A copy of this licence is available from Creative Commons. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Fang, Ww., Kong, Xl., Yang, Jy. et al. PE/PPE mutations in the transmission of Mycobacterium tuberculosis in China revealed by whole genome sequencing. BMC Microbiol 24, 206 (2024).
