Definition of novel cell envelope associated proteins in Triton X-114 extracts of Mycobacterium tuberculosis H37Rv
© Målen et al. 2010
Received: 18 September 2009
Accepted: 29 April 2010
Published: 29 April 2010
Membrane- and membrane-associated proteins are important for the pathogenicity of bacteria. We have analysed the content of these proteins in virulent Mycobacterium tuberculosis H37Rv using Triton X-114 detergent-phase separation for extraction of lipophilic proteins, followed by their identification with high resolution mass spectrometry.
In total, 1417 different proteins were identified. In silico analysis of the identified proteins revealed that 248 proteins had at least one predicted trans-membrane region. Also, 64 of the identified proteins were predicted lipoproteins, and 54 proteins were predicted as outer membrane proteins. Three-hundred-and-ninety-five of the observed proteins, including 91 integral membrane proteins, were described for the first time. Abundance levels of the identified proteins were compared using the exponentially modified protein abundance index (emPAI), which relates the number of experimentally observed peptide ions to the number of theoretically observable peptides for a given protein. The outcome showed that several of the membrane- and membrane-associated proteins are present with high relative abundance. Further, a close examination of the lipoprotein LpqG (Rv3623), which is detected in the membrane fractions of M. tuberculosis but not in M. bovis, revealed that the homologous gene in M. bovis lacks the signal peptide and lipobox motif, suggesting impaired export to the membrane.
Altogether, we have identified a substantial proportion of membrane- and membrane-associated proteins of M. tuberculosis H37Rv, compared the relative abundance of the identified proteins and also revealed subtle differences between the different members of the M. tuberculosis complex.
Tuberculosis is an airborne infection caused by Mycobacterium tuberculosis. It is estimated that one-third of the world's population is latently infected with M. tuberculosis, and that each year about three million people die of this disease. The emergence of drug-resistant strains is further escalating the threat to public health (WHO, 2003). In spite of global research efforts, the mechanisms underlying pathogenesis, virulence and persistence of M. tuberculosis infection remain poorly understood.
M. tuberculosis is a facultative intracellular pathogen that resides within the host macrophages [2–4]. When M. tuberculosis invades host cells, the interface between the host and the pathogen includes membrane and surface proteins likely to be involved in intracellular multiplication and the bacterial response to host microbicidal processes. Recently, the cell wall of M. tuberculosis was reported to possess a true outer membrane, which adds complexity to bacterial-host interactions and is relevant for susceptibility to anti-mycobacterial therapies [5–7]. Revealing the composition of the membrane proteome will have an impact on the design and interpretation of experiments aimed at elucidating the translocation pathways for nutrients, lipids, proteins, and anti-mycobacterial drugs across the cell envelope. According to bioinformatic predictions, 597 genes (~15%) of the M. tuberculosis H37Rv genome [8, 9] could encode proteins having between 1 and 18 transmembrane α-helical domains (TMH), which interact with the hydrophobic core of the lipid bilayer. The confirmation of the expression of these genes at the protein level may lead to new therapeutic targets, new vaccine candidates and better serodiagnostic methods.
Membrane proteins resolve poorly in two-dimensional polyacrylamide gel electrophoresis (2D-PAGE) and proteomic profiling of mycobacterial membrane proteins remains a major challenge. Their limited solubility in aqueous buffer systems and their relatively low abundance in a background of highly abundant cytoplasmic proteins have yet to be overcome. Several studies have reported extraction of membrane- and membrane-associated proteins using centrifugation to obtain purified cell wall and cell membrane fractions for analysis by sodium-dodecyl-sulphate polyacrylamide gel electrophoresis (SDS-PAGE) in combination with liquid chromatography tandem mass spectrometry (LC-MS/MS) [10–13]. Common to these studies is pre-isolation of the membrane and cell wall of the bacteria, and application of different washing techniques prior to protein extraction by detergents. In this study, we separated hydrophobic membrane- and membrane-associated proteins directly from sonicated M. tuberculosis H37Rv using phase separation with Triton X-114. The efficacy of this method was shown with Mycobacterium bovis BCG in a previous work.
Comparison of expressed levels of the identified proteins was performed using the emPAI [15, 16]. This approach relates the number of experimentally observed peptide ions in a given protein to the number of theoretically observable peptides. Our results show that several of the membrane- and membrane-associated proteins are present in high relative abundance. Using bioinformatic analysis, we also found that the gene sequence encoding Rv3623, which is annotated as a potential lipoprotein in both M. tuberculosis and M. bovis, is shorter in M. bovis and has lost the N-terminal signal peptide and lipobox that mediate translocation of the prelipoprotein and the subsequent lipidation that retains it in the membrane.
Functional classification of the identified M. tuberculosis H37Rv proteins.
Functional group a
Functional group no.
Total protein number b
Number of observed proteins c
Virulence, detoxification, adaptation
Cell wall and cell processes
Insertion sequences and phages
Intermediary metabolism and respiration
Conserved hypotheticals with an orthologue in M. bovis
Lipoproteins represent a subgroup of exported proteins characterized by the presence of a lipobox. The lipobox motif is located in the distal C-terminal part of the N-terminal signal peptide. This motif is a recognition signal for lipid modification on the conserved and essential cysteine residue. Precursor lipoproteins are mainly translocated in a Sec-dependent manner across the plasma membrane and are subsequently modified. The proteins identified in this study were analysed by the LipoP algorithm http://www.cbs.dtu.dk/services/LipoP/, and 63 were predicted as potential lipoproteins (Additional file 2, Table S1) based on the presence of a cleavable signal peptide and a lipobox motif. Eight lipoproteins are described for the first time. In sum, the findings comprise over 56% of all predicted lipoproteins in the genome.
Outer membrane proteins (OMPs) are a class of proteins residing in the outer membrane of bacterial cells. Identification of OMPs is important as they are exposed on the bacterial surface and so are accessible drug targets. Recently, Song and colleagues analysed the genome of M. tuberculosis and predicted 144 proteins as potential OMPs based on the amphiphilicity of the β-strand regions, absence of hydrophobic α-helices and the presence of a signal peptide. In our study, we observed 54 (37.5%) of these proteins, and 9 of them have not been described in previous proteomic works (Additional file 2, Table S1).
The 'grand mean of hydropathicity' (GRAVY) score is the average hydropathy score for a protein. According to Kyte and Doolittle, integral membrane proteins have a higher GRAVY score than soluble proteins. A score above -0.4 suggests increased probability for membrane association; the higher the score, the greater the probability. GRAVY scores were calculated for all the identified proteins using the PROTPARAM tool http://us.expasy.org/tools/protparam.html. Three hundred and sixty-nine proteins without a TMH region had positive GRAVY scores (Additional file 3, Table S2). A substantial proportion of the detected proteins lacked a predicted retention region and had a negative GRAVY score, suggesting that they were soluble proteins. However, it is possible that at least some of them are functionally membrane-associated through formation of protein complexes with membrane-anchored proteins. Previous studies have shown that several hydrophilic proteins are retained in the lipophilic membrane fraction due to interaction with hydrophobic proteins [21–23].
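The GRAVY calculation described above can be illustrated with a minimal sketch. It assumes the standard Kyte-Doolittle hydropathy scale and simply averages the per-residue values; it is not the PROTPARAM implementation, and the two example sequences are hypothetical toy strings, not proteins from this study.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Return the mean Kyte-Doolittle hydropathy per residue of a sequence."""
    residues = sequence.upper()
    if not residues:
        raise ValueError("sequence must not be empty")
    # A KeyError here signals a non-standard residue code (e.g. X or B).
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


# Hypothetical toy sequences used only to show the sign of the score.
hydrophobic_stretch = "LLVIAGLLAVV"
charged_stretch = "KDEERKDSEK"

if __name__ == "__main__":
    for name, seq in [("hydrophobic", hydrophobic_stretch),
                      ("charged", charged_stretch)]:
        print(name, round(gravy(seq), 3))
```

A stretch rich in leucine, isoleucine and valine gives a positive score, whereas a stretch of charged residues gives a negative one, which is the contrast the GRAVY-based classification relies on.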
List of the 14 most frequently observed membrane proteins.
No. of TMH a
No. of observed peptides b
(Mol %) c
Possible proline rich antigen
Conserved hypothetical protein
Possible ATP synthase b chain
Possible glutamine-transport transmembrane protein
Possible transmembrane protein
Possible glutamine-transport transmembrane protein
Possible conserved membrane protein
Possible conserved membrane or secreted protein
Possible transmembrane cytochrome C oxidase
Possible rieske iron-sulfur protein
Possible serine protease
Possible conserved transmembrane protein
Possible transmembrane carbonic anhydrase
List of the 10 most frequently observed lipoproteins.
No. of observed peptides a
(Mol %) b
Possible periplasmic superoxide dismutase
19 kDa lipoprotein antigen precursor
Periplasmic phosphate-binding lipoprotein
Possible conserved lipoprotein
Possible conserved lipoprotein
Periplasmic phosphate-binding lipoprotein
Probable conserved lipoprotein
Probable conserved lipoprotein
Possible conserved lipoprotein
Due to the anticipated role of membrane- and membrane-associated proteins of M. tuberculosis in virulence, it is important to characterize these proteins. Therefore, the aim of the present study was to perform a proteomic analysis of these proteins from the virulent reference strain M. tuberculosis H37Rv in extracts obtained with the non-ionic detergent Triton X-114. The proteins from the lipid phase of the detergent, which was enriched for membrane proteins as validated by immuno-blotting (Figure 1, panel B), were precipitated, separated, and identified by high accuracy mass spectrometry. In total, 1417 proteins were identified and analysis of the primary amino acid sequences by bioinformatic tools revealed that 31% of the proteins were membrane- or membrane-associated. The list included more than 50% of all predicted integral membrane proteins in the genome.
These results show a significant improvement compared to the two studies of mycobacterial plasma membrane proteins by Gu et al. and Xiong et al. In these studies, membrane proteins were enriched by differential centrifugation and alkaline treatment of crude membranes with sodium carbonate and urea, and separated by SDS-PAGE followed by protein identification with LC-MS/MS. The study by Gu et al. revealed 739 M. tuberculosis H37Rv proteins including 85 membrane proteins (11.5%), while Xiong et al. identified 349 proteins, of which 100 were predicted membrane proteins (28.7%). The low percentage of integral plasma membrane proteins among the proteins identified in these studies was probably due to the membrane enrichment methods. We reduced the soluble protein contamination by phase separation of whole bacterial sonicates, and also applied state-of-the-art mass spectrometry analysis for identification of peptides.
More than 50% of all predicted lipoproteins in the genome were found. These are proteins translocated across the cell membrane and retained in the cell envelope by post-translational lipid modification. They are functionally diverse, and are suggested to be involved in host-pathogen interactions [27, 28]. They are also of interest with respect to development of serodiagnostic tests for tuberculosis due to their strong immunogenicity [29, 30].
We also found 37% of all predicted OMPs, which are an essential group of proteins involved in import of nutrients, secretion processes and host-pathogen interactions in gram-negative bacteria. OMPs are also likely to be of great importance in mycobacteria because it is now firmly established that they have a true outer membrane [5–7].
Even though a considerable number of observed proteins were predicted as integral membrane- or membrane-associated proteins, a substantial proportion of the detected proteins lacked a predicted retention region. For those proteins we calculated the GRAVY score, which expresses the total hydrophobicity of a protein, as an indicator of membrane association. However, this is just a measure of increased probability for membrane association based on the fact that most integral membrane proteins have a positive GRAVY value. If a protein has a positive value, even though it lacks a retention signal, it is probably associated with the membrane. On the other hand, some of the hydrophilic proteins with a negative GRAVY value might still be retained in the membrane through formation of protein complexes with membrane-anchored proteins [21–23]. Several proteins in this group are encoded in operons of well known integral enzyme complexes.
Using state-of-the-art proteomic instrumentation and techniques, subtle details could be revealed at the individual protein level, such as experimental identification of signal peptide cleavage sites of predicted secreted proteins, confirmation of the start codon, identification of peptides from regions predicted to be non-coding, thus indicating a more up-stream start codon [33, 34], or even detection of novel genes. Therefore, the data obtained in this study were examined both in detail and in the context of what has been reported in the literature. To examine the amounts of individual proteins in the membrane fraction we applied the emPAI algorithm. The emPAI calculation gives an approximate estimate of the abundance of a given protein, expressed as protein concentration (in mol %) [15, 16]. An advantage of this method is that it gives a more realistic picture of the protein profile than mRNA levels, which can be difficult to relate to the actual protein amount. The membrane proteins (14 proteins) and the lipoproteins (10 proteins) with the highest relative abundance values are listed in Tables 2 and 3, respectively.
Interestingly, two of the proteins (Rv0072 and Rv2563) among those with the highest relative abundance values were annotated as "possible glutamine-transport transmembrane ABC transporter protein", with sequence motifs that belong to the ABC transport system. Glutamine is a major cell wall component of pathogenic mycobacteria only. Its production is mainly catalyzed extracellularly by glutamine synthetase GlnA1 (Rv2220). Tullius et al. (2003) showed that an M. tuberculosis glnA1 mutant requires a relatively high level of exogenous L-glutamine for growth in vitro, that the mutant was attenuated for intracellular growth in differentiated THP-1 cells, and that it was avirulent in infected guinea pigs. Identification of two related proteins among the most abundant membrane proteins in M. tuberculosis underlines the importance of production and transport of glutamine for the pathogen and its virulence.
The Rv0072 protein has only been reported in studies conducted on M. tuberculosis [25, 26] and not on M. bovis BCG [11, 17]. It was identified by 11 different peptides giving sequence coverage of 44%, and the high emPAI value observed for this membrane protein suggests that it is abundantly present in the membrane of the virulent M. tuberculosis H37Rv strain. The open reading frames and the sequences 100 bp up-stream of the start codon from M. tuberculosis H37Rv and M. bovis BCG 1173P2 and AF2122/97 were aligned, but the DNA sequences were identical and could not explain why Rv0072 has not been observed in M. bovis (data not shown).
Among the 10 most abundant lipoproteins, 7 were not assigned any biological function, reflecting a fundamental lack of knowledge about these proteins. A careful examination revealed that the possible conserved lipoprotein LpqG (Rv3623) lies on the border of region of difference 9 (RD9). RD9 is deleted from all M. bovis lineages and consequently this protein has only been identified in proteomic studies performed on M. tuberculosis H37Rv [25, 40], and has not been reported in previous proteomic works on M. bovis BCG [14, 24, 41]. This RD region is also missing in other mycobacterial strains such as Mycobacterium microti or Mycobacterium pinnipedii. The region was first described by Gordon et al. (1999) as RD8 and later put in an evolutionary context by Brosch et al. (2002); it corresponds to the region RD09 described by Behr et al. A close examination of the gene encoding Rv3623 revealed that, in the genomic sequences of M. bovis AF2122/97 and M. bovis BCG Pasteur 1173P2, it is 207 bp shorter, with a deletion in the N-terminal region that includes the signal peptide and the predicted lipobox. The gene is annotated to encode a lipoprotein in the M. bovis strains even though the lipobox is missing, and it is therefore questionable whether it should be considered a lipoprotein in M. bovis. The identification of this protein with 7 peptides covering 34% of its sequence in M. tuberculosis H37Rv suggests that it is a major lipoprotein.
Of the two lipoproteins listed in Table 3 and annotated as "periplasmic phosphate-binding lipoprotein", Rv0932c is a known antigen that also induces antibody responses in tuberculosis patients. The 19 kDa lipoprotein antigen precursor (Rv3763) has been extensively studied due to its immunogenic properties [46–49]. Enrichment and analysis of lipoproteins with respect to humoral and cell-mediated immunity in infected individuals might ultimately lead to the identification of additional antigens that can serve as biomarkers for M. tuberculosis infection.
In summary, we have enriched and extracted membrane- and membrane-associated proteins from M. tuberculosis H37Rv using Triton X-114, and identified the largest number of this subset of proteins reported so far. Further analysis of the data obtained in this study with bioinformatic tools suggests that several of these proteins are major membrane proteins. We have described one major lipoprotein of M. tuberculosis which has become a pseudogene by the RD9 deletion in M. bovis.
The mycobacterial reference strain M. tuberculosis H37Rv (ATCC 27294) used in this study was kindly provided by Dr Harleen Grewal, The Gade Institute, University of Bergen, Bergen, Norway. The bacilli were cultured on Middlebrook 7H10 agar plates with OADC enrichment (BD Difco) at 37°C and 5% CO2 for 3-4 weeks. Bacterial colonies were harvested using an extraction buffer consisting of phosphate-buffered saline (PBS), pH 7.4, with freshly added Roche Protease Inhibitor Cocktail (Complete, EDTA-free, Roche GmbH, Germany). Six hundred μl of this extraction buffer was added to each agar plate and the mycobacterial colonies were gently scraped off the agar surface using a cell scraper. Aliquots of the resulting pasty bacterial mass were transferred into 2 ml cryo-tubes with O-rings (Sarstedt, Norway) containing 250 μl of acid-washed glass beads (≤ 106 μm; Sigma-Aldrich, Norway) and an additional 600 μl of extraction buffer, and stored at -80°C until protein extraction was performed. For protein extraction, the mycobacteria were disrupted mechanically by bead-beating in a Ribolyser (Hybaid, UK) at max speed (6.5) for 45 seconds.
Triton X-114 phase separation was used to isolate lipophilic proteins following the method of Bordier. In brief, 3-4 week old bacilli were lysed by bead beating and centrifuged, initially at 2300 g to remove unbroken cells and cell-wall debris. Triton X-114 was added to the supernatant (final detergent concentration 2%, v/v) and the suspension was stirred at 4°C for 20 minutes to obtain the protein extract in a single phase. Residual insoluble matter was removed by centrifugation at 15700 g for 10 min. The solution was then incubated at 37°C for 10 minutes, after which it separated into two phases, an upper (aqueous) and a lower (detergent) phase. The detergent phase was collected and proteins were precipitated with acetone.
Extracted proteins (50 μg) were mixed with 25 μl SDS loading buffer and boiled for 5 minutes before separation on a 10 cm long, 1 mm thick 12% SDS polyacrylamide gel (Invitrogen, Carlsbad, CA, U.S.A.). The protein migration was allowed to proceed until the bromophenol dye had migrated to the bottom of the gel. The protein bands were visualized with Coomassie Brilliant Blue R-250 staining (Invitrogen). Protein lanes were excised and divided into fractions according to the bands of the protein standard, ranging from ~3 kDa to ~188 kDa. The gel pieces were washed twice with 50% acetonitrile (ACN) in 25 mM ammonium bicarbonate (NH4HCO3) for 15 minutes at room temperature (RT), and subsequently dehydrated by incubating them with 50 μl 100% ACN for 20 minutes at RT. The proteins were reduced using 10 mM dithiothreitol and alkylated with 55 mM iodoacetamide, both in 100 mM NH4HCO3. The gel pieces were dehydrated with 100% ACN as described above, and rehydrated in 25 mM NH4HCO3 followed by in-gel protein digestion with trypsin (Promega, Madison, U.S.A.) for 16-20 h at 37°C. The digested peptides were eluted by incubating the gel pieces with 50 μl 1% formic acid (FA) for 20 minutes at RT. The supernatant containing the peptides was collected after centrifugation at 15700 g for 10 minutes. Then, the gel pieces were incubated with 50 μl 0.1% FA in 50% ACN for 20 minutes at RT, followed by centrifugation at 15700 g. The supernatant was collected and combined with the previous one. Finally, the gel pieces were dehydrated with 50 μl 100% ACN for 20 minutes at RT, and the supernatant was collected after centrifugation as described above and added to the pool.
Experiments were performed on a Dionex Ultimate 3000 nano-LC system (Sunnyvale CA, USA) connected to a linear quadrupole ion trap-Orbitrap (LTQ-Orbitrap) mass spectrometer (Thermo Electron, Bremen, Germany) equipped with a nanoelectrospray ion source. The mass spectrometer was operated in the data-dependent mode to automatically switch between Orbitrap-MS and LTQ-MS/MS acquisition. Survey full scan MS spectra (from m/z 400 to 2,000) were acquired in the Orbitrap with resolution R = 60,000 at m/z 400 (after accumulation to a target of 1,000,000 charges in the LTQ). The method allowed sequential isolation of up to five of the most intense ions for fragmentation on the linear ion trap using collision induced dissociation at a target value of 100,000 charges.
For accurate mass measurements the lock mass option was enabled in MS mode and the polydimethylcyclosiloxane (PCM) ions generated in the electrospray process from ambient air (protonated (Si(CH3)2O)6; m/z 445.120025) were used for internal recalibration during the analysis. Target ions already selected for MS/MS were dynamically excluded for 30 seconds. General mass spectrometry conditions were: electrospray voltage, 1.9 kV; ion selection threshold for MS/MS, 500 counts; activation Q-value, 0.25; and activation time for MS/MS, 30 ms.
The obtained data were searched against the publicly available Tuberculist database version R10 http://genolist.pasteur.fr/TubercuList/ using MASCOT software version 2.1 (Matrix Science, UK). The database was modified in-house to include reversed sequences of the original ORFs in order to determine false-positive thresholds of the Mascot identification engine. Tuberculist was preferred over secondary annotations performed by independent institutes because previous data from our group demonstrated that the Tuberculist annotation appears to be more reliable. The criteria for the Mascot search were as follows: cysteine carbamidomethylation was set as fixed modification, and methionine oxidation and N-acetylation (protein) as variable modifications. Up to 3 missed cleavages were allowed. Peptide (precursor) ion mass tolerance was 15 ppm, and the fragment ion tolerance was 0.5 Da. Mascot scoring showed that p < 0.01 was equivalent to a score of 24. The criterion for a positive identification of proteins identified with at least 2 peptides was a minimal score of 24 for each peptide, which represents a 1:10,000 false positive rate at protein level. The maximal score for a peptide from a reversed entry of the annotated M. tuberculosis H37Rv database was found to be 31 (data not shown). This was considered as a threshold for false-positive identifications, and all proteins identified in this study with only one peptide were based on a score higher than 37 (25:10,000). No false positive identifications were observed from the reversed database using these criteria. For visualization and validation of spectra, MSQuant version +1.4.2 was used. MSQuant is an open source tool available at http://msquant.sourceforge.net and is widely used for LC-MS/MS data analysis.
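Two of the search settings above, the reversed-sequence decoy entries and the 15 ppm precursor tolerance, can be sketched in a few lines. This is an illustration under stated assumptions, not the in-house procedure or Mascot's internals: the `REV_` prefix, the function names and the example hit list are hypothetical.

```python
def add_reversed_decoys(proteins, prefix="REV_"):
    """Return a copy of the database with one reversed decoy per entry.

    `proteins` maps an ORF identifier to its amino acid sequence.
    The prefix used to mark decoys is an assumed naming convention.
    """
    combined = dict(proteins)
    for name, sequence in proteins.items():
        combined[prefix + name] = sequence[::-1]
    return combined


def within_ppm(observed_mz, theoretical_mz, tolerance_ppm=15.0):
    """Check whether a precursor m/z lies inside a ppm tolerance window."""
    error_ppm = (observed_mz - theoretical_mz) / theoretical_mz * 1e6
    return abs(error_ppm) <= tolerance_ppm


def decoy_score_ceiling(hits, prefix="REV_"):
    """Return the highest peptide score assigned to a decoy entry.

    `hits` is an iterable of (protein_id, peptide_score) pairs.
    Returns None when no decoy was matched.
    """
    decoy_scores = [score for name, score in hits if name.startswith(prefix)]
    return max(decoy_scores) if decoy_scores else None


if __name__ == "__main__":
    database = add_reversed_decoys({"ORF_A": "MKTAYIAK"})  # toy sequence
    print(sorted(database))
    # Hypothetical search hits: (protein id, peptide score).
    hits = [("ORF_A", 40.0), ("REV_ORF_B", 31.0), ("REV_ORF_C", 12.0)]
    print(decoy_score_ceiling(hits))
```

The highest decoy score plays the role of the empirical ceiling for false-positive matches, so single-peptide identifications are only accepted when they score above it.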
Proteins from both the lipid and the aqueous phase were separated by SDS-PAGE, electroblotted to nitrocellulose membranes (Amersham Biosciences) and blocked with 5% non-fat milk in PBS containing 0.5% Tween 20 (PBST) for 1 hour at RT. The membranes were then washed three times with PBST for 10 min each. After the last wash, the membranes were incubated overnight at 4°C with rabbit antisera raised against 1) a cell wall fraction and 2) a crude whole cell lysate of M. bovis BCG. Sera were diluted 1:500 in PBS with 1% non-fat milk and 0.1% Tween 20. The blots were washed thoroughly with PBST as described above, and probed with horseradish peroxidase (HRP)-conjugated anti-rabbit IgG (1:2000 dilution) (Amersham Biosciences) for 1 hour at RT. Antigen-antibody complexes were visualized by a chemiluminescent reaction (Pierce, Rockford, IL, U.S.A.) using Chemidoc XRS (Bio-Rad, Hercules, CA, USA).
Gene and protein sequences were obtained from Tuberculist http://genolist.pasteur.fr/TubercuList/ and BoviList http://genolist.pasteur.fr/BoviList/. Sequence alignments were done using the Blast 2 algorithm http://blast.ncbi.nlm.nih.gov/Blast.cgi. For prediction of lipoproteins, the LipoP algorithm was used http://www.cbs.dtu.dk/services/LipoP/. For detection of potential secreted proteins, SignalP version 3.0 was used http://www.cbs.dtu.dk/services/SignalP/.
The abundance of each protein was estimated by calculating the protein abundance index (PAI) and the emPAI. The estimation is based on the number of identified peptides per protein normalized by the theoretical number of observable peptides for the same protein. This is considered to be a good method for quantitative estimation because it takes into account that larger proteins are expected to generate more observable peptides in the mass spectrometry analysis than smaller ones [15, 16]. The final peptide list obtained from the MS analysis was submitted to a publicly available tool http://empai.iab.keio.ac.jp/, and emPAI values were calculated using the following parameters: M. tuberculosis H37Rv Tuberculist version R10 database; trypsin enzyme; carbamidomethyl (C) modification; peptide MW range from 300 to 6000 Da; no retention time filtering; peptide score higher than 24 as filtered by Mascot.
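The arithmetic behind these indices can be written out as a short sketch. It assumes the commonly used definitions PAI = N_observed / N_observable, emPAI = 10^PAI - 1, and mol % = emPAI / Σ(emPAI) × 100; it does not reproduce the online tool, which also derives the number of observable peptides from an in silico digest. The protein names and peptide counts are hypothetical.

```python
def pai(n_observed, n_observable):
    """Protein abundance index: observed peptides per observable peptide."""
    if n_observable <= 0:
        raise ValueError("n_observable must be positive")
    return n_observed / n_observable


def empai(n_observed, n_observable):
    """Exponentially modified PAI, 10**PAI - 1."""
    return 10 ** pai(n_observed, n_observable) - 1


def mol_percent(empai_by_protein):
    """Convert emPAI values into molar percentages of the whole sample."""
    total = sum(empai_by_protein.values())
    if total <= 0:
        raise ValueError("the sum of emPAI values must be positive")
    return {name: 100 * value / total
            for name, value in empai_by_protein.items()}


if __name__ == "__main__":
    # Hypothetical (observed, observable) peptide counts per protein.
    counts = {"protein_A": (11, 20), "protein_B": (7, 25), "protein_C": (2, 40)}
    values = {name: empai(obs, max_obs) for name, (obs, max_obs) in counts.items()}
    for name, share in mol_percent(values).items():
        print(name, round(share, 1))
```

Dividing by the number of observable peptides is what prevents large proteins from appearing abundant simply because they yield more tryptic peptides.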
This work was supported by grants from the Regional Health Authorities of Western Norway (Projects 911077, 911117 and 911239) and by the National Programme for Research in Functional Genomics in Norway (FUGE) funded by the Norwegian Research Council (Project 175141/S10). We thank Dr. Benjamin Thomas and the Proteomic Facility at the Dunn School of Pathology, Oxford University, for providing time at the LTQ-Orbitrap used on this work. We thank the Proteomic unit, PROBE, University of Bergen for analytical services. We are indebted to Professor Lars Haarr for critical comments to the manuscript.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.