Although the gyrA, gyrB, hsp65, recA, rpoB, and sodA genes are appropriate for identification purposes [3, 4], our results indicate that these genes are poorly suited to the specific detection of mycobacteria. Indeed, their high similarity to non-mycobacterial genes makes the design of specific targets difficult. These new results are in accordance with our previous observations that molecular targets designed on the gyrB, rpoB or hsp65 genes had low specificity. For example, the unrelated Helicobacter pylori shows positive amplification with several Mycobacterium-specific primer pairs. A search for more specific targets in mycobacterial genomes therefore seems necessary in order to improve current detection tools based on proteins and/or DNA. The new atpE real-time PCR method that we propose is just as specific as, but more sensitive than, the previously proposed rrs real-time PCR method, which cannot detect some mycobacterial species.
The proposed strategy compares mycobacterial and non-mycobacterial genomic proteins with the reference genomic DNA of M. tuberculosis H37Rv, sorts proteins according to similarity criteria, and lists candidate proteins (Figure 1). We chose to perform protein-level comparisons in order to identify proteins exclusively conserved in Mycobacterium spp., because non-coding regions, such as intergenic regions and insertion sequences, are known to be less conserved than coding regions in M. tuberculosis genomes. In agreement with the literature, our results showed that almost half of the M. tuberculosis H37Rv predicted proteins are potentially present in the genomes of CNM group members. More precisely, mycobacteria belong to the Actinobacteria, which may explain the 48 to 73% of genes shared among high G + C content microorganisms [31–34]. In addition, horizontal gene transfers from bacteria that are widespread in soil or water, especially Rhodococcus sp., Nocardia sp. and Streptomyces sp., have been proposed to have occurred in the Mycobacterium genus, which may also explain the proteins shared with non-mycobacterial species [24, 27, 35]. These observations show that CNM group members must be taken into account when developing highly specific mycobacterial targets, considering that these bacteria are commonly found in aquatic and terrestrial environments [36, 37].
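The selection logic described above (keep proteins that are conserved in every mycobacterial genome and lack a close match in every non-mycobacterial genome) can be illustrated with a minimal sketch. This is not the MycoHit implementation: the function name, the data layout, the two similarity thresholds and the toy scores are all assumptions made purely for illustration.

```python
from typing import Dict, List

# Hypothetical thresholds (percent similarity); the values actually used
# in the study are not stated in this section.
MIN_MYCO_SIMILARITY = 80.0
MAX_NON_MYCO_SIMILARITY = 50.0


def select_candidates(
    myco_hits: Dict[str, List[float]],
    non_myco_hits: Dict[str, List[float]],
    min_myco: float = MIN_MYCO_SIMILARITY,
    max_non_myco: float = MAX_NON_MYCO_SIMILARITY,
) -> List[str]:
    """Return reference proteins conserved in all mycobacterial genomes
    and without a close match in any non-mycobacterial genome.

    Each dictionary maps a reference protein name to its best similarity
    score (percent) in each compared genome.
    """
    candidates = []
    for protein, myco_scores in myco_hits.items():
        other_scores = non_myco_hits.get(protein, [])
        # Conserved: the weakest mycobacterial hit still passes the threshold.
        conserved = bool(myco_scores) and min(myco_scores) >= min_myco
        # Exclusive: no non-mycobacterial hit exceeds the upper bound.
        exclusive = all(score <= max_non_myco for score in other_scores)
        if conserved and exclusive:
            candidates.append(protein)
    return sorted(candidates)


# Toy scores, invented for illustration only.
toy_myco = {"P1": [95.0, 90.0], "P2": [96.0, 92.0], "P3": [99.0, 40.0]}
toy_non_myco = {"P1": [30.0, 10.0], "P2": [85.0, 20.0], "P3": [5.0, 5.0]}
print(select_candidates(toy_myco, toy_non_myco))
```

In this toy case only P1 passes both filters: P2 has a close non-mycobacterial match, and P3 is not conserved in every mycobacterial genome.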
Our study showed that 11 proteins exclusively conserved in the 16 mycobacterial genomes studied could be selected using our genome comparison strategy (i.e. proteins encoded by the atpE, atpB, cmaA1, lppM, PE5, PPE48, esxG, esxH and esxR genes, as well as an oxidoreductase and a small secreted protein). Only the atpE gene could be used to design primers and a probe for mycobacteria detection. For the other genes, sequence polymorphism among NTM species did not allow the design of molecular targets for Mycobacterium spp. detection. However, these genes could be of immunological or pathogenic importance. Indeed, PE and PPE family proteins represent 0.9 to 4.2% of the genome coding capacity of several mycobacteria [22, 25, 26, 35], and are suspected to play a major antigenic role in the immune response. PE and PPE family proteins are often associated with mycobacterial esx gene clusters, which encode an ATP-dependent specific secretion system and are required to export specific members of the 6-kDa early secreted antigenic target (ESAT-6) protein family. Together, the ATP-dependent specific secretion system and the ESAT-6 protein family play a major role in the virulence and life cycle of mycobacteria [24, 26]. Nevertheless, PE and PPE family proteins, and proteins encoded by esx gene clusters, are very small and polymorphic among the genomes of the 11 NTM species compared (Table 1). The mycobacterial cell wall is also important in pathology, and could provide interesting PCR targets. For instance, several studies have emphasized that cyclopropanation of mycolic acids is common among pathogenic mycobacteria but rare among saprophytic species. Although of sufficient length, the CMAS protein encoded by the cmaA1 gene and the lipoprotein encoded by the lppM gene in M. tuberculosis H37Rv were also polymorphic among the genomes of the 11 NTM species compared (Table 1), and thus could not be used to design a primer pair and a probe (Additional file 2).
Nevertheless, polymorphism of mycobacterial mycolic acids is useful for mycobacteria identification [40, 41].
The atpE gene, which encodes ATP synthase subunit C in the M. tuberculosis H37Rv genome (locus Rv1305), is exclusively conserved in the genomes of the 17 mycobacterial species studied (Additional file 2), and its length and relative conservation among mycobacteria make it a suitable molecular target for detecting the Mycobacterium genus. Notably, the protein encoded by the atpE gene is also the target of the recently described antimycobacterial compound diarylquinoline R207910. This compound shows a specific bactericidal effect on mycobacteria and no effect on other genera. In addition, our in vitro results demonstrated the specificity of the atpE gene. These results also showed that our strategy of target design based on the MycoHit software (Figure 1) gave very useful results for designing highly specific primers, and that it might be applied to other groups of microorganisms.
In vitro validation of the real-time PCR targeting the atpE gene showed very high specificity and sensitivity, as well as reproducible quantification of different mycobacterial species. The new real-time method was tested on a realistic number of mycobacterial species, including several slow and rapid growing NTM, although not all described mycobacterial species were tested. In addition, application of this real-time PCR method to environmental samples showed that Mycobacterium was detected in tap water samples. Discrepancies between culture-based and molecular techniques have previously been described for other pathogens, and the lower prevalence obtained by the PCR method was probably due to our concentration and extraction procedures. These protocol steps must be improved to detect low levels of NTM, even though the spin column used seemed more appropriate for DNA extraction from environmental samples than classical phenol-chloroform extraction. Moreover, the culture method did not detect higher levels of mycobacterial cells than the molecular one. Both methods have advantages and drawbacks, which may explain the differences observed. For instance, molecular methods can detect dead bacteria, or viable but non-culturable bacteria. However, the real-time PCR targeting the atpE gene allows more accurate quantification of Mycobacterium spp. than the culture-based method, which is subject to many drawbacks such as decontamination artifacts (about 2 log10 reduction for M. chelonae), slow mycobacterial growth, clumping of mycobacterial cells, high hydrophobicity of mycobacteria, and contamination of culture media by other fast growing environmental microorganisms.
Comparison of the method targeting atpE with the previously described method targeting the 16S rRNA gene showed a high correlation. Moreover, the method targeting the atpE gene presents two major advantages over the method targeting the rrs gene. First, the new method detects all the tested mycobacterial strains, while the method targeting the rrs gene cannot detect isolates of M. celatum, M. heckeshornense, and M. leprae. Second, the atpE gene is present in a single copy in Mycobacterium genomes, while the 16S rRNA gene is present in either 1 or 2 copies per genome. When comparing samples, data are simpler to interpret with a stable gene copy number, which probably gives a more accurate estimate of the mycobacterial concentration.
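The copy-number argument can be made concrete with a small sketch. It assumes, for illustration only, that a qPCR result is expressed as gene copies and converted to genome equivalents by dividing by the copy number per genome; the function name and the example value of 1000 copies are hypothetical and are not taken from the study.

```python
from typing import Tuple


def genome_equivalents_range(
    gene_copies: float,
    min_copies_per_genome: int,
    max_copies_per_genome: int,
) -> Tuple[float, float]:
    """Return the (lowest, highest) number of genome equivalents
    compatible with a measured number of gene copies."""
    if min_copies_per_genome < 1 or max_copies_per_genome < min_copies_per_genome:
        raise ValueError("copy numbers must satisfy 1 <= min <= max")
    lowest = gene_copies / max_copies_per_genome
    highest = gene_copies / min_copies_per_genome
    return (lowest, highest)


# Hypothetical measurement of 1000 gene copies in a sample.
# Single-copy target (as stated for atpE): one possible value.
print(genome_equivalents_range(1000, 1, 1))
# Target present in 1 or 2 copies (as stated for the 16S rRNA gene):
# the estimate is uncertain by up to a factor of two.
print(genome_equivalents_range(1000, 1, 2))
```

With a single-copy target the two bounds coincide, whereas a target present in 1 or 2 copies leaves a two-fold uncertainty when the species composition of the sample is unknown.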
One of the limitations of this study is that only 31 mycobacterial species were tested in vitro as positive controls, whereas more than 150 mycobacterial species have been described so far. To date, we have confirmed the sensitivity of the atpE real-time PCR method using a large representative collection of mycobacterial species (31 species, i.e. around 20% of described species), including members of MTC (n = 2), M. leprae (n = 1), slow growing NTM (n = 13), and rapid growing NTM (n = 15). Given the broad diversity of mycobacterial species tested in this study, we expect the method to be applicable to all species within the Mycobacterium genus. In addition, this is the first time that a sensitive and specific molecular target has been identified based on an in silico comparison of 16 mycobacterial (13 species) and 12 non-mycobacterial genomes (4 closely related species).