- Research article
- Open Access
The gene expression data of Mycobacterium tuberculosis based on Affymetrix gene chips provide insight into regulatory and hypothetical genes
© Fu and Fu-Liu; licensee BioMed Central Ltd. 2007
Received: 03 January 2007
Accepted: 14 May 2007
Published: 14 May 2007
Tuberculosis remains a leading infectious disease and a global public health threat. Its control and management have been complicated by multi-drug resistance and latent infection, which has prompted the search for new and more effective drugs. With the completion of the genome sequence of the etiologic bacterium, Mycobacterium tuberculosis, it is now feasible to search for new drug targets by sifting through a large number of gene products and to conduct genome-scale experiments based on microarray technology. However, the full potential of genome-wide microarray analysis in configuring interrelationships among all genes in M. tuberculosis has yet to be realized. To date, it is only possible to assign a function to 52% of proteins predicted in the genome.
We conducted a functional-genomics study using the high-resolution Affymetrix oligonucleotide GeneChip. Approximately one-half of the genes in the genome of M. tuberculosis, including more than 100 predicted conserved hypotheticals, were found to be always expressed during the log phase of in vitro growth. The gene expression profiles were analyzed and visualized through cluster analysis to capture the full details of genomic behavior. Broad patterns derived from genome-wide expression experiments in this study have provided insight into the interrelationships among genes in the basic cellular processes of M. tuberculosis.
Our results have confirmed several known gene clusters in energy production, information pathways, and lipid metabolism, and also hinted at potential roles of hypothetical and regulatory proteins.
Knowledge about the genome sequence of Mycobacterium tuberculosis has contributed to recent advancement in understanding the biology of this organism and its clinical relevance. Concurrent with this development, a high-throughput genome-wide gene expression analysis device in the form of microarrays has rapidly emerged as a seemingly indispensable tool for studying genomics in the modern era. These developments have brought about the revolutionary conception of new prophylactic and therapeutic interventions in the genomic perspective. The significance of these developments should be clear, as tuberculosis is still causing millions of deaths in the world.
DNA microarrays have been applied to analyze M. tuberculosis. The first type of application focuses on genotyping, for example, species identification [2, 3] and detection of drug-resistant mutants [4, 5]. The second type of application seeks to explore altered gene expression and understand biological pathways in terms of up-regulated and down-regulated genes in certain conditions of interest, such as drug challenge, hypoxia, starvation, high temperature, and in vivo conditions. However, existing applications do not exploit the full potential of genome-scale microarray analysis in configuring interrelationships among all genes in M. tuberculosis. We pioneered the approach that applied the Affymetrix M. tuberculosis GeneChip to gene expression analysis. Previously, this GeneChip was used for applications related to genotyping.
Our study aims to explore the whole-genome behavior of M. tuberculosis during log-phase growth by conducting a bioinformatics analysis on genome-wide gene expression data generated from microarray hybridization. Our results enrich the current understanding of genome functions based on sequence analysis and functional studies of individual genes in such aspects as deduction of possible roles of conserved hypothetical and regulatory proteins.
Active genes involved in growth
Research on M. tuberculosis has yet to answer the questions of how many and what genes are active during normal growth in a standard in vitro environment and how they are related to each other in a global genome-wide sense. To answer these questions, we adopted the Affymetrix GeneChip system, which, based on a specific oligonucleotide array format, could provide the absolute signal intensity in a single condition as well as the signal ratio between two conditions. Furthermore, its built-in statistical algorithm computes the so-called Detection p-value that determines the presence or absence of any given mRNA. It is this feature that we capitalize on to explore the genomic behavior of M. tuberculosis. A gene is active when it is expressed. Gene activity is measured by the expression level (i.e., abundance of corresponding mRNA) detected by microarray hybridization in this study.
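The way a gene can be labeled as active from its Detection p-values may be illustrated with a minimal Python sketch. It uses the thresholds stated in the methods of this article (presence in more than 90% or more than 50% of the RNA samples, with a Detection p-value < 0.001). The function name, the label strings, and the example p-values are hypothetical, and the sketch assumes that "present" can be reduced to the single test p < 0.001.

```python
# Thresholds taken from the methods of this article.
ALWAYS_FRACTION = 0.90    # "always active": present in more than 90% of samples
USUALLY_FRACTION = 0.50   # "usually active": present in more than 50% of samples
P_VALUE_CUTOFF = 0.001    # Detection p-value cutoff used for the activity call


def classify_activity(p_values):
    """Classify one gene from its Detection p-values across RNA samples.

    Assumption: a sample counts as "present" when its p-value is below
    P_VALUE_CUTOFF. The returned label strings are illustrative only.
    """
    if not p_values:
        raise ValueError("at least one sample is required")
    present = sum(1 for p in p_values if p < P_VALUE_CUTOFF)
    fraction = present / len(p_values)
    if fraction > ALWAYS_FRACTION:
        return "always active"
    if fraction > USUALLY_FRACTION:
        return "usually active"
    return "not consistently active"


# Hypothetical p-values for one gene across eleven samples.
example = [0.0001] * 10 + [0.2]
label = classify_activity(example)  # 10 of 11 samples present (about 91%)
```

With eleven samples, a gene must be present in at least ten of them to exceed the 90% threshold and in at least six to exceed the 50% threshold.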
The top 100 most expressed genes of M. tuberculosis in log-phase growth. For other active genes, refer to Additional file 4.
Functional genomic analysis
Representative Gene Class
(CYSH Rv1478 GGTB NIRA Rv2425C ACCD4 Rv3541C Rv0175 Rv2393 MURX Rv3701C)
(PNTAB KASA HEMC LPQI ACCD6 KASB Rv2052C Rv1728C ARGH)
Intermediate and lipid metabolism
(DRRB Rv1251C EFPA Rv2054 MURC Rv1632C Rv2395 Rv1378C QCRA CTAE RHO PARB Rv3321C ATPH Rv3921C ATPD Rv3805C Rv2901C PARA Rv2949C CTAC Rv1576C ATPB Rv0546C Rv2781C Rv0526 Rv3104C POLA PYRH Rv1870C Rv1711 Rv3672C Rv0514 DAPF Rv2554C Rv1869C NUOE NUOC Rv1178 Rv0528 Rv1481 Rv2791C Rv2610C Rv3856C Rv1565C Rv3212 Rv1043C TSNR Rv1324 PGK Rv0525 RUVC KSGA Rv2989 LPRE PURK Rv0412C RODA Rv3725 HEME Rv1339 Rv1797 NRP UREC Rv2852C Rv3781 TRPA Rv2956 TRPB Rv2808 Rv2128 Rv1695 NUOI Rv1312 NUOL NUOD Rv3806C Rv2759C Rv2966C FOLK Rv2879C Rv1780 Rv1271C Rv3693 PLCB DRRC Rv2600 NUOM NUOH Rv0875C Rv3220C Rv3885C MMPL7 Rv2475C LTP1 Rv0236C NUOJ NUOB NUON MMPL9 Rv2752C Rv0177 Rv0176 HEMB PRCA Rv2553C Rv2367C RPLT GID CMK Rv3122 EMBB Rv1303 Rv1907C Rv2792C)
Energy production and respiration
(FTSZ WAG31 Rv0902C HEMK NARL Rv2147C Rv3267 Rv1477 LPPW SIGC Rv2864C RECA Rv2826C Rv1697 LEUC LEUD Rv3909 DNAQ Rv1465 Rv0486 Rv3910 PIRG Rv2574 Rv2360C Rv3816C AROB Rv2827C FTSK Rv3587C FOLE FTSQ Rv3647C Rv3376 RHLE)
(LEUA AMIC RPOA RPSE RPLF RPSH RPLX Rv0203 RPLJ RPLR RPLE RPLB RPLC NUSG)
(RPSS ALR RPSC RPLV PNTB RPLP RPMC Rv2125 PRFA Rv1546 Rv3278C Rv1099C RPLO TPI Rv0299 Rv3677C Rv2258C)
(RIMM Rv1258C RPLL Rv2908C RPLI Rv2822C PCKA Rv1073 Rv0636 Rv0637 CYSA2 Rv0277C RPSP RPSB LPPU HTPG RPLM RPSI SSEC2 Rv0057 SSB ASPC)
Information pathways (replication, transcription, and translation)
(SIGE Rv0516C Rv0846C Rv0991C Rv3334 Rv2628 Rv2020C Rv0968 Rv2517C NARK2 FBPC LPQS Rv2662 Rv1772 Rv0967 Rv0465C Rv1813C Rv2016 HSP Rv1847 Rv0190 Rv1774 RPST ALD Rv0080 Rv2699C Rv2629 Rv0571C Rv2623 Rv0572C Rv2005C CTPF Rv3133C Rv2004C Rv2626C Rv2625C Rv2627C Rv2032 PANB Rv2466C Rv2035 Rv3134C Rv2962C Rv0081 Rv2630)
(WHIB4 Rv2668 Rv1136 Rv3342 Rv1234 PGMA Rv3895C Rv0941C LPPJ Rv0653C Rv3479 Rv1179C Rv2478C Rv0108C Rv2184C PDXH Rv0502 CDH Rv1352 FABG3 Rv3123 Rv1453 THYA LIPH Rv1893 Rv0650 ECHA14 Rv0771 Rv1413 Rv0121C Rv3654C Rv2044C LIPF Rv2670C CPSY Rv2297 Rv0165C PKS11 Rv1362C Rv2799 Rv1363C Rv2255C Rv1931C Rv3501C VIUB APT Rv1861 SECG Rv3860 Rv0149 Rv0269C Rv2639C Rv1151C Rv0230C MOAC3 Rv2722 TRXA Rv3891C Rv0188 Rv1535 Rv2288 Rv2657C Rv3764C Rv1230C Rv3288C LPQH Rv0695 Rv3633 Rv3616C Rv3399 LPRF Rv2638 Rv3615C Rv3614C PAPA3 FRDC Rv2129C Rv1926C SODA Rv2633C Rv2557 LAT Rv3733C Rv2161C PKS4 Rv2558 Rv2632C PKS3 Rv1868 NARG Rv3751 Rv0696 SIGF Rv3679 Rv2160C RSBW Rv2253 Rv2336 FUSA2 PAPA1 Rv2598 RELA PKS2 Rv3241C Rv1639C SUHB Rv1871C Rv3500C EPHA Rv1184C Rv0387C Rv1433 OMPA Rv0171 Rv3496C CDD MOAE2 Rv2024C FADE26 SCOA Rv0657C AMT Rv2348C Rv3750C Rv3491 Rv0137C Rv3887C Rv1037C Rv3874 Rv2137C LLDD2 Rv2311 Rv2205C NARJ NARH Rv2369C Rv0621 Rv1398C FURA Rv1154C Rv2472 Rv3449 FADD16 LPQO Rv0168 Rv0767C Rv0736 INFC FRDB Rv0245 NARI Rv0167 RIBC Rv0258C Rv2765 Rv1425 Rv1968)
(PDHA Rv3802C LPPD GUAB1 Rv1783 Rv1782 Rv0126 GLYS Rv1892 Rv1978 Rv0654 Rv1885C Rv0760C Rv3850 OBG FBPB PRA Rv2958C NRDE Rv1956 Rv0192 Rv1988)
Cell wall, cell processes, and metabolism
The clustering results were further visualized through Eisen's TreeView program to generate a heat map where the brightness of the red color represented the intensity of gene expression (Figure 1). Based on the cluster analysis results and the observation of four conspicuous bright bands, the map was tentatively divided into four zones from top to bottom, called zones 1, 2, 3, and 4 in consecutive order. Each zone contained clusters of genes that were strongly correlated in their expression patterns and, in that sense, functionally related. Each zone in the map was represented by the genes that were most highly expressed (i.e., the brightest in red). Thus, the map zones were labeled according to the functional class (genolist.pasteur.fr/TubercuList/) of their representative genes as "intermediate and lipid metabolism", "energy", "information", and "cell wall, cell processes, and metabolism", respectively.
The two most conspicuous clusters in the first zone of the gene expression map consisted of a cluster represented by genes in the functional category of intermediate metabolism, such as cysH, ggtB, and nirA, and another cluster represented by genes involved in the FAS-II cycle, such as accD6, kasA, and kasB. The formation of the cluster with emphasis on the FAS-II cycle reflects the importance of this pathway in M. tuberculosis growth. As for intermediate metabolism, M. tuberculosis can metabolize many kinds of carbohydrates, hydrocarbons, alcohols, ketones and carboxylic acids. This zone was adjacent to the energy zone, suggesting a close relationship between intermediate metabolism and energy production.
The most prominent genes in the second zone of the gene expression map were the ATP synthase gene complex, which produces ATP from ADP and is critical in energy metabolism. In addition, genes encoding enzymes involved in the respiratory chain, such as the nuo (NADH-ubiquinone oxidoreductase) gene complex, were clustered into this zone. In another study, the ATP synthase and nuo gene complexes were found to be down-regulated together. Thus, they may be co-regulated.
The third zone of the gene expression map was represented by genes involved in the information pathways, e.g., dnaQ (DNA polymerase III), recA (recombinase), rpoA (DNA-directed RNA polymerase), and the 30S and 50S ribosomal protein gene complexes (rpsB, rpsC, rplB, rplC, etc.). As these genes play a vital role in genetic information replication, transcription, and translation, their expression is essential for maintaining bacterial growth. We noticed that the gene Rv2258c (encoding a possible transcriptional regulatory protein) correlated well with a group of ribosomal protein genes. This result is related to a report that the production of ribosomes is increased through the transcriptional regulation of genes encoding ribosomal proteins during the growth phase of yeast.
The last zone of the gene expression map was represented by genes classified under the category of cell wall, cell processes, and metabolism. This category comprises membrane proteins and proteins involved in cell processes, including secreted and transmembrane proteins, as well as enzymes involved in intermediate metabolism. Genes and their protein derivatives located in this zone are related to cell wall synthesis, transportation of organic and inorganic substances across the membrane, and immunological responses, such as narK2 (a nitrate/nitrite transporter), fbpC (mycolyl transferase), hsp (a protein induced by heat stress), ald (a secreted enzyme), ctpF (a metal cation transporter), sodA (superoxide dismutase, which destroys radicals), ompA (an outer membrane protein), amt (an ammonium-transport integral membrane protein), furA (a protein for ferric uptake regulation), fbpB (a protein in the antigen-85 complex), and genes in the lipoprotein family (lpqS, lpqH, lppJ, lprF, lpqO and lppD).
The most important feature of the mycobacterial cell wall is the substantial amount (up to 60% of the total mass) of lipid components, particularly the very long chain mycolic acids, which are combined with surface glycolipids to form a pseudolipid bilayer. Since cell wall synthesis involves lipid metabolism, the main cluster in zone 4 also contains some genes in this class, such as fadE26 (encoding acyl-CoA dehydrogenase) and fadD16 (encoding fatty-acid-CoA ligase).
The distribution (percentage %) of functional categories for genes in the major clusters of each zone on the gene expression map.
Transcriptional regulators associated with major gene clusters based on microarray analysis.
Gene Product http://www.ncbi.nlm.nih.gov
Associated Gene Cluster
Probable transcriptional regulatory protein
Respiration and energy production
Possible transcriptional regulatory protein
Probable transcriptional regulatory protein probably MerR-family
Cell wall, cell processes, and metabolism
Probable transcriptional regulatory protein
Probable transcriptional regulatory protein
Probable transcriptional regulatory protein
Possible transcriptional regulatory protein probably TetR-family
Possible transcriptional regulatory protein probably GntR-family
Probable transcriptional regulatory protein
Probable transcriptional regulatory protein
Possible transcriptional regulatory protein
The Rv2989 gene was associated with the cluster characterized by energy metabolism and respiration in our data. This association appears to be consistent with reports that this gene is up-regulated at high temperatures and down-regulated after starvation.
Rv3334 was up-regulated at high temperatures as well as after starvation. This gene is probably in the MerR (mercury resistance) family and its protein is similar to many regulatory proteins in sequence.
Rv0081 can be induced by hypoxia. Its presence in growing bacterial cells is called into question. However, this gene is only weakly expressed in the present study, and likely to be up-regulated if oxygen is depleted.
The whiB4 gene encodes a protein homologous to a Streptomyces sporulation factor, and the gene is up-regulated after starvation, suggesting a possible link between starvation and sporulation. In addition, the association of whiB4 with the class of cell wall and cell processes is plausible, given that sporulation could potentially involve the cell membrane.
Rv0653c is probably in the TetR family. Its significance is reflected by the fact that proteins in this family are involved in the transcriptional control of multi-drug efflux pumps and pathogenicity. Another regulatory gene, Rv0165c, is probably in the GntR family. In E. coli, GntR regulates gluconate uptake and catabolism as a repressor. Both Rv0653c and Rv0165c are apparently involved in a membrane-associated cellular process. Another gene worth attention is Rv1931c, which regulates genes important for virulence of M. tuberculosis, possibly via a cellular process that translates extracellular stimuli into a transcriptional signal.
The availability of the complete genome sequence of Mycobacterium tuberculosis combined with rapidly emerging microarray technology has catalyzed the process of understanding the bacterial biology and pathogenicity and expedited the development of new diagnostics and therapeutics for tuberculosis. The microarray approach has enabled high-throughput gene expression analysis on a genomic scale in a field known as functional genomics. In particular, elucidation of functional relationships among genes based on genome-wide gene expression data from DNA microarray hybridization has been successfully demonstrated for eukaryotes, notably yeast [12, 23], but it has not been done for M. tuberculosis. To date, the functional classification of genes in M. tuberculosis is mainly based on the biological study of individual genes as well as sequence analysis and comparison with homologous genes in other bacteria. In this study, we provide a comprehensive analysis that addresses this issue from the perspective of functional genomics.
In the application to M. tuberculosis, DNA microarrays have been used for comparing species, detecting drug-resistant mutants, and studying biological behavior under various conditions. In general, there are two computational paradigms for microarray data analysis. The first paradigm is to identify genes differentially expressed across two conditions; the second paradigm is to identify genes expressed in a coordinated manner that share common roles in cellular physiology or metabolism. Most of the applications for M. tuberculosis to date are based on the first paradigm, whereas our study described here is based on the second paradigm.
In this study, active genes were identified by means of the Affymetrix GeneChip. As its unique feature, the Affymetrix system uses multiple oligonucleotide probes for implementing each gene sequence to be interrogated. Furthermore, the system is capable of analyzing the presence or absence of each mRNA. In contrast to the cDNA microarray system that is focused on differential gene expression across two conditions, the Affymetrix system can calculate gene expression in a single condition and compare gene expression across multiple conditions. In this way, the Affymetrix system is more flexible and informative. The flexibility can be attributed to the use of PM/MM probes, instead of two explicitly defined external conditions, for implementing the test/control mechanism in microarray hybridization.
Our method for analysis of in vitro genomic activity of M. tuberculosis can be extended to study functional genomics in vivo. Understanding what genes are switched on or off between in vitro and in vivo conditions would shed light on issues, such as how biological adaptation leads to bacterial latency, why there is discrepancy between laboratory sensitivity and clinical efficacy, and so on. Thus, the functional-genomics data obtained in this work can serve as a reference for interpreting data generated in other contexts. Our genomic analysis was based on multiple RNA samples extracted during log-phase growth in contrast to other coordinated gene expression analyses based on samples collected in a time course or under different conditions. Our experiment design is justified, given the fact that the expression level of any gene has considerable fluctuation from time to time (Table 1) in log-phase growth, as evidenced from the observation that the standard deviation of a gene expression was sometimes greater than 50% of its mean across samples. Variation in gene expression across different time settings enabled the correlation among different genes to be analyzed. The validity of our experiments is supported by reconfirmation of several known growth-related gene clusters. However, our approach is not applicable to samples collected during stationary phase, when little variation in gene expression is expected across samples.
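The variability criterion mentioned above (a standard deviation greater than 50% of the mean) is the coefficient of variation. A minimal Python sketch follows. The function name and the signal values are hypothetical, and the sketch assumes the sample standard deviation (n - 1 denominator), which the text does not specify.

```python
from statistics import mean, stdev


def coefficient_of_variation(signals):
    """Return standard deviation divided by mean for one gene across samples.

    Assumption: the sample standard deviation (n - 1 denominator) is used.
    """
    m = mean(signals)
    if m == 0:
        raise ValueError("mean signal is zero; the ratio is undefined")
    return stdev(signals) / m


# Hypothetical normalized signals of one gene across eleven samples.
gene_signals = [210.0, 950.0, 400.0, 1300.0, 180.0, 760.0,
                520.0, 1500.0, 300.0, 880.0, 640.0]
cv = coefficient_of_variation(gene_signals)
fluctuates_strongly = cv > 0.5  # the 50% criterion mentioned in the text
```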
The in vitro broth culture condition has often been used as the reference condition to study the gene expression of M. tuberculosis in other conditions, such as hypoxia and starvation. However, the present study is the first to explore the functional genomics of this organism grown in log-phase culture. Bacterial growth can be divided into four different phases: lag phase, exponential or log phase, stationary phase, and death phase. We focused our study on the log phase. During this phase, high growth activity is evidenced by our data showing that about half of the genes in the genome were expressed. In contrast, many genes in M. tuberculosis are repressed during the stationary phase, a condition similar to but milder than the non-replicating state of tubercle bacilli in an anaerobic condition. In particular, the dormancy regulatory gene, dosR, is weakly induced during the stationary phase but strongly induced in an anaerobic non-replicating state. An interesting finding based on our work is that dosR is always moderately expressed even in the log phase, suggesting its possible housekeeping role. Our data further showed that an important gene, acr (hspX), which is induced under hypoxia and starvation, was always expressed in the log phase. Global gene expression profiling analysis of M. tuberculosis in mouse and human tissue indicated that lipid metabolism was critical for the bacilli to survive in the host environment. In these conditions, isocitrate lyase (ICL), an enzyme of the glyoxylate shunt (a pathway alternative to the tricarboxylic acid cycle) that is related to mycobacterial persistence in macrophages, is up-regulated. This is consistent with our finding that icl is weakly expressed or absent in the log-phase culture.
The extent to which gene expression profiles across a set of independently collected samples suffice to separate genes into functional clusters in consistency with prior knowledge is attributable to the rigorous statistical model built into the Affymetrix system. Several familiar gene groups with clearly designated functions, such as electron transport, protein synthesis and type II fatty acid synthesis, were observed in the data, lending credence to our analysis. However, genes associated with different functional classes would be placed in the same cluster if they appear to co-express. This implies that genes can share common roles while differing in their functions, as illustrated by the earlier example of energy-dependent transportation across the cell membrane. In fact, comparing gene clusters based on gene expression with gene classes based on sequence analysis would offer new opportunities for re-defining interrelationships among genes in the genome.
Gene clusters built out of expression profiles can be configured as functional linkage networks among genes, but these clusters do not correspond directly to protein networks constructed using a combination of Rosetta stone, phylogenetic profile, conserved gene neighbor, and operon computational methods. Genes that share similar biological functions may operate at different stages of the cell cycle or become active under different conditions, and hence have different expression profiles. However, some gene families, in particular those encoding ribosomal proteins, are closely linked in both gene-expression and protein networks, as seen in our data.
Conserved hypothetical proteins associated with the gene clusters of each zone on the gene expression map.
Conserved Hypothetical Proteins
Zone-1 (Intermediate and lipid Metabolism)
(Rv2425c Rv3541c Rv2393 Rv3701c Rv2052c Rv1728c)
Zone-2 (Energy and Respiration)
(Rv1251c Rv2054 Rv1378c Rv3321c Rv2901c Rv2949c Rv0546c Rv1870c Rv1711 Rv3672c Rv2554c Rv3856c Rv3212 Rv1043c Rv0525 Rv1339 Rv2956 Rv2759c Rv2879c Rv1780 Rv2475c Rv2752c Rv0177 Rv2367c)
(Rv1632c Rv2808 Rv3122 Rv1907c)
Zone-3 (Information Pathways)
(Rv2147c Rv3267 Rv1697 Rv3909 Rv2574 Rv3376 Rv2125 Rv1546 Rv1099c Rv2908c Rv1073 Rv0636 Rv0637 Rv0277c)
(Rv2826c Rv2360c Rv2827c Rv3647c Rv0299 Rv2822c Rv0057)
Zone-4 (Cell Wall, Cell Processes, and metabolism)
(Rv0516c Rv0991c Rv2020c Rv0968 Rv0967 Rv1813c Rv1847 Rv0190 Rv0080 Rv2699c Rv2629 Rv0571c Rv2623 Rv2005c Rv2004c Rv2626c Rv2627c Rv2032 Rv2466c Rv2035 Rv3134c Rv0941c Rv2478c Rv2184c Rv0502 Rv1352 Rv1893 Rv1413 Rv0121c Rv3654c Rv2044c Rv2670c Rv3860 Rv0269c Rv2722 Rv0695 Rv3633 Rv3616c Rv3399 Rv2638 Rv3615c Rv3614c Rv3733c Rv2632c Rv1868 Rv2598 Rv1871c Rv0387c Rv2024c Rv0657c Rv2137c Rv2311 Rv2205c Rv1398c Rv2472 Rv0767c Rv0258c Rv1425 Rv1978 Rv1885c Rv0760c PRA Rv0192)
(Rv2628 Rv2517c Rv2662 Rv1772 Rv2016 Rv0572c Rv2630 Rv1179c Rv0108c Rv3123 Rv2297 Rv2255c Rv1535 Rv2288 Rv3288c Rv2633c Rv2557 Rv2558 Rv2336 Rv3491 Rv2369c Rv1154c Rv3850)
Organized according to correlations in gene expression across samples, the gene expression image created by Eisen's Cluster and TreeView programs (Figure 1) enabled us to visualize four transcriptional profiles, which, named according to the functional classes of the dominant genes in that region and put in a linear order over the image, were "intermediate and lipid metabolism", "energy and respiration", "information pathways", and "cell wall, cell processes and metabolism". The dendrogram was constructed and displayed so that similar clusters were likely to be located in proximate nodes. In the present application, as the similarity measure is based on the correlation in gene expression, physical distance on the tree reflects the degree of correlation among gene clusters, even with no guarantee of their optimal linear ordering in the tree. As the expression of the genetic code lies at the heart of all physiological processes and metabolisms, it is logical that the information gene cluster functionally correlates with other gene clusters, a view supported by the observation that it was situated around the center of the image.
At present, there are more than 100 transcriptional regulatory genes in the M. tuberculosis genome [1, 15]. However, only a fraction of them have been experimentally studied in detail for their functions. A recent survey shows that regulatory proteins account for 9 (20%) out of 45 virulence factors identified in M. tuberculosis. Unraveling the gene regulatory network would allow us to understand both physiological and virulence mechanisms and to develop novel drugs that work at the level of gene regulation. Since elucidating the roles of these genes and their clinical relevance is always time-consuming, it is practically necessary to prioritize them. A reasonable assumption is that a regulatory gene regulates some other genes in the same functional cluster with a high probability. Under this assumption, we have identified several potentially important transcriptional regulatory genes involved in major biological pathways (Table 4). Their significance has been indicated by analysis based on the literature. Further biological investigation of these genes is warranted in future work.
All the microarray data and supplementary materials produced in this study are posted at our web site [see Additional file 1].
Genes involved in the in vitro log-phase growth of M. tuberculosis have been identified. The gene expression map (Figure 1) represents broad patterns of functional concordance of closely related genes, but more importantly, it summarizes the coordinated cellular activities associated with the growth process on the genomic level. At present, hundreds of genes in the genome are annotated as conserved hypotheticals without clearly specified functions. Our data have shown that more than 100 such hypotheticals were actually expressed in the cell medium, and their biological roles can be suggested by their correlation with other known genes. In addition, the roles of most transcriptional regulatory genes predicted in the genome remain to be elucidated. In this study, we have discovered several regulatory genes that may exert regulatory influence on the growth of M. tuberculosis, and their roles may be inferred from the functional clusters they join. The data and information generated here provide an integrated genomic view of gene functions and interrelationships in M. tuberculosis, and can be incorporated in new experiments for research in tuberculosis. This study has not only transcriptionally validated several known gene clusters but also provided insight into a host of unknown hypothetical and regulatory genes.
Bacterial culture of M. tuberculosis
M. tuberculosis strain H37Rv was obtained from the culture collection of the Mycobacteriology Laboratory Branch, Centers for Disease Control and Prevention at Atlanta. A portion of a recently frozen stock was inoculated into 5 ml of complete Middlebrook 7H9 broth (7H9) supplemented with 10% albumin-dextrose-catalase v/v (Difco Laboratories, Detroit, MI) and 0.05% Tween 80 v/v (Sigma, St. Louis, MO) and incubated at 37°C for 5 days. Then the culture was transferred into 50 ml of 7H9 media, incubated at 37°C with 50 rpm shaking, and grown to log phase (0.35 OD600). The cells were harvested by centrifugation for RNA preparation.
Bacterial lysis and RNA isolation were performed on log-phase cultures at the CDC lab (Atlanta) following a previously described procedure. Briefly, cultures were mixed with an equal volume of RNALater™ (Ambion, Austin, TX) and the bacteria harvested by centrifugation (1 min, 25,000 g, 8°C) and transferred to Fast Prep tubes (Bio 101, Vista, CA) containing Trizol (Life Technologies, Gaithersburg, MD). Mycobacteria were mechanically disrupted in a Fast Prep apparatus (Bio 101). The aqueous phase was recovered, treated with Cleanascite (CPG, Lincoln Park, NJ), and extracted with chloroform-isoamyl alcohol (24:1 v/v). Nucleic acids were ethanol precipitated. DNaseI (Ambion) treatment to digest contaminating DNA was performed in the presence of Prime RNase inhibitor (5'-3', Boulder, CO). The RNA sample was precipitated and washed in ethanol, and redissolved to make a final concentration of 1 mg/ml. The purity of RNA was estimated by the ratio of the UV absorbance readings at 260 nm and 280 nm (A260/A280). RNA samples of 20 µl were sent to the UCI DNA core and further checked through a quality and quantity test based on electrophoresis before microarray hybridization.
Microarray hybridization and analysis
In this study, we used the anti-sense Affymetrix M. tuberculosis genome array (GeneChip). The probe selection was based on the genome sequence of M. tuberculosis H37Rv. Each annotated ORF (Open Reading Frame) or IG (Intergenic Region) was interrogated with oligonucleotide probe pairs. The gene chip represented all 3924 ORFs and 738 intergenic regions of H37Rv. Twenty 25-mer probes were selected within each ORF or IG. These probes are called PM (Perfect-Match) probes. For each PM probe, a second probe is made by introducing a single substitution at the middle base; these are called MM (Mismatch) probes. A PM probe and its respective MM probe constitute a probe pair. The MM probe serves as a negative control for the PM probe in hybridization.
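The relationship between a PM probe and its MM probe can be sketched in Python as follows. The text states only that the middle base is substituted; the sketch assumes the substituted base is the complement of the original, and the 25-mer shown is a made-up sequence, not one taken from the actual array design.

```python
# Complement table for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_probe(pm_probe):
    """Return an MM probe that differs from the PM probe only at the middle base.

    Assumption: the middle base is replaced by its complement.
    """
    if len(pm_probe) % 2 == 0:
        raise ValueError("probe length must be odd to have a single middle base")
    middle = len(pm_probe) // 2  # index 12 (the 13th base) for a 25-mer
    swapped = COMPLEMENT[pm_probe[middle]]
    return pm_probe[:middle] + swapped + pm_probe[middle + 1:]


# Hypothetical 25-mer PM probe.
pm = "ATGACCGATCGGTACGTTAGCCGTA"
mm = mismatch_probe(pm)
probe_pair = (pm, mm)  # one PM/MM probe pair
```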
Microarray hybridization followed the Affymetrix protocol. In brief, the assay utilized reverse transcriptase and random hexamer primers to produce DNA complementary to the RNA. The cDNA products were then fragmented by DNase I and labeled with terminal transferase and biotinylated GeneChip DNA Labeling Reagent at the 3' terminus.
Each RNA sample was hybridized with one gene array to produce the expression data of all genes on the array. We performed eleven independent bacterial cultures and RNA extractions at different times, and collected eleven sets of microarray data for this study. A global normalization scheme was applied so that each array's median value was adjusted to a predefined value (500).
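The global normalization step can be sketched in Python as follows. The sketch assumes a simple multiplicative scaling of every signal on an array, which is one common way to bring the median to a target value; the text does not state the exact scaling used. The sample names and signal values are hypothetical.

```python
from statistics import median

TARGET_MEDIAN = 500.0  # predefined value stated in the text


def normalize_array(signals, target=TARGET_MEDIAN):
    """Scale one array's signals so that their median equals `target`.

    Assumption: a single multiplicative factor is applied to the whole array.
    """
    m = median(signals)
    if m <= 0:
        raise ValueError("array median must be positive to be rescaled")
    factor = target / m
    return [s * factor for s in signals]


# Hypothetical raw signals for two arrays (five genes each).
raw_arrays = {
    "sample_01": [120.0, 340.0, 980.0, 55.0, 410.0],
    "sample_02": [60.0, 170.0, 490.0, 27.5, 205.0],
}
normalized = {name: normalize_array(values) for name, values in raw_arrays.items()}
```

After scaling, arrays hybridized at different times share the same median, so signal levels can be compared across the eleven samples.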
The parameter τ controls the sensitivity and specificity of the analysis, and was set to a typical value of 0.015, and the Detection p-value cutoffs, α1 and α2, set to their typical values, 0.04 and 0.06, respectively, according to the Affymetrix system.
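How these parameters are used can be sketched in Python under the commonly documented Affymetrix detection rule, which is an assumption here because the text does not spell it out: each probe pair yields a discrimination score (PM - MM) / (PM + MM), the scores are tested against τ to give a Detection p-value, and the p-value is turned into a Present (P), Marginal (M), or Absent (A) call using α1 and α2. The statistical test itself is omitted; the function names and intensities are hypothetical.

```python
TAU = 0.015     # threshold on the discrimination score (value from the text)
ALPHA1 = 0.04   # p-values below this give a Present call
ALPHA2 = 0.06   # p-values at or above this give an Absent call


def discrimination_scores(pm, mm):
    """Per-probe-pair score (PM - MM) / (PM + MM) for one probe set."""
    return [(p - m) / (p + m) for p, m in zip(pm, mm)]


def detection_call(p_value, alpha1=ALPHA1, alpha2=ALPHA2):
    """Map a Detection p-value to P, M, or A (assumed rule, see lead-in)."""
    if p_value < alpha1:
        return "P"
    if p_value < alpha2:
        return "M"
    return "A"


# Hypothetical intensities for four probe pairs of one gene.
pm_signal = [900.0, 1200.0, 300.0, 750.0]
mm_signal = [300.0, 400.0, 280.0, 250.0]
scores = discrimination_scores(pm_signal, mm_signal)
pairs_above_tau = sum(1 for r in scores if r > TAU)

call = detection_call(0.0005)  # a hypothetical p-value well below ALPHA1
```

Raising τ makes a Present call harder to obtain (higher specificity, lower sensitivity), which is the trade-off the parameter controls.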
In this study, a gene was determined to be always (usually) active if the derived mRNA was present (P-call) in more than 90% (50%) of the RNA samples with a Detection p-value < 0.001. The gene-expression data were further analyzed using Eisen's Cluster and TreeView programs. The whole-genome gene expression map was produced by the hierarchical clustering algorithm based on the average-linkage method in the program with the similarity measure defined by Pearson's correlation coefficient.
This work is supported by the National Institutes of Health under the grant HL-080311. We would like to thank CDC for the use of the facilities and thank UCI for providing service for microarray hybridization. Bacterial culture and RNA isolation were performed by Pramod Aryal.
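The clustering step can be illustrated with a small pure-Python sketch of agglomerative clustering that uses Pearson's correlation as the similarity and averages pairwise similarities between clusters. This is a generic textbook form of average linkage and is not claimed to reproduce the exact computation inside Eisen's Cluster program. Gene names and expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a profile with zero variance has no defined correlation")
    return sxy / sqrt(sxx * syy)


def average_linkage(profiles):
    """Merge the two most similar clusters until one remains.

    Returns a list of (cluster_a, cluster_b, similarity) in merge order,
    where cluster similarity is the mean pairwise Pearson correlation.
    """
    names = list(profiles)
    sim = {(a, b): pearson(profiles[a], profiles[b])
           for a in names for b in names if a != b}
    clusters = [(name,) for name in names]
    merges = []

    def cluster_similarity(c1, c2):
        total = sum(sim[(a, b)] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the pair of clusters with the highest average similarity.
        best, i, j = max(
            (cluster_similarity(c1, c2), i, j)
            for i, c1 in enumerate(clusters)
            for j, c2 in enumerate(clusters) if i < j
        )
        merges.append((clusters[i], clusters[j], best))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical expression profiles of four genes across five samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 11.0],
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],
    "geneD": [5.0, 4.0, 3.0, 1.0, 2.0],
}
merges = average_linkage(profiles)
```

The merge order defines the dendrogram: genes joined early (high correlation) sit on nearby leaves, which is what places co-expressed genes next to each other on the expression map.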
- Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D, Gordon SV, Eiglmeier K, Gas S, Barry CE, Tekaia F, Badcock K, Basham D, Brown D, Chillingworth T, Connor R, Davies R, Devlin K, Feltwell T, Gentles S, Hamlin N, Holroyd S, Hornsby T, Jagels K, Barrell BG: Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature. 1998, 393: 537-544. 10.1038/31159.
- Behr MA, Wilson MA, Gill WP, Salamon H, Schoolnik GK, Rane S, Small PM: Comparative genomics of BCG vaccines by whole-genome DNA microarray. Science. 1999, 284: 1520-1523. 10.1126/science.284.5419.1520.
- Kato-Maeda M, Rhee JT, Gingeras TR, Salamon H, Drenkow J, Smittipat N, Small PM: Comparing genomes within the species Mycobacterium tuberculosis. Genome Res. 2001, 11: 547-554. 10.1101/gr.166401.
- Troesch A, Nguyen H, Miyada CG, Desvarenne S, Gingeras TR, Kaplan PM, Cros P, Mabilat C: Mycobacterium species identification and rifampin resistance testing with high-density DNA probe arrays. J Clin Microbiol. 1999, 37: 49-55.
- Gingeras TR, Ghandour G, Wang E, Berno A, Small PM, Drobniewski F, Alland D, Desmond E, Holodniy M, Drenkow J: Simultaneous genotyping and species identification using hybridization pattern recognition analysis of generic Mycobacterium DNA arrays. Genome Res. 1998, 8: 435-448.
- Wilson M, DeRisi J, Kristensen HH, Imboden P, Rane S, Brown PO, Schoolnik GK: Exploring drug-induced alterations in gene expression in Mycobacterium tuberculosis by microarray hybridization. Proc Natl Acad Sci U S A. 1999, 96: 12833-12838. 10.1073/pnas.96.22.12833.
- Sherman DR, Voskuil M, Schnappinger D, Liao R, Harrell MI, Schoolnik GK: Regulation of the Mycobacterium tuberculosis hypoxic response gene encoding alpha-crystallin. Proc Natl Acad Sci U S A. 2001, 98: 7534-7539. 10.1073/pnas.121172498.
- Betts JC, Lukey PT, Robb LC, McAdam RA, Duncan K: Evaluation of a nutrient starvation model of Mycobacterium tuberculosis persistence by gene and protein expression profiling. Mol Microbiol. 2002, 43: 717-731. 10.1046/j.1365-2958.2002.02779.x.
- Stewart GR, Wernisch L, Stabler R, Mangan JA, Hinds J, Laing KG, Young DB, Butcher PD: Dissection of the heat-shock response in Mycobacterium tuberculosis using mutants and microarrays. Microbiology. 2002, 148: 3129-3138.
- Rachman H, Strong M, Ulrichs T, Grode L, Schuchhardt J, Mollenkopf H, Kosmiadi GA, Eisenberg D, Kaufmann SH: Unique transcriptome signature of Mycobacterium tuberculosis in pulmonary tuberculosis. Infect Immun. 2006, 74: 1233-1242. 10.1128/IAI.74.2.1233-1242.2006.
- Sassetti CM, Boyd DH, Rubin EJ: Genes required for mycobacterial growth defined by high density mutagenesis. Mol Microbiol. 2003, 48: 77-84. 10.1046/j.1365-2958.2003.03425.x.
- Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A. 1998, 95: 14863-14868. 10.1073/pnas.95.25.14863.
- Shi L, Sohaskey CD, Kana BD, Dawes S, North RJ, Mizrahi V, Gennaro ML: Changes in energy metabolism of Mycobacterium tuberculosis in mouse lung and under in vitro conditions affecting aerobic respiration. Proc Natl Acad Sci U S A. 2005, 102: 15629-15634. 10.1073/pnas.0507850102.
- Kraakman LS, Griffioen G, Zerp S, Groeneveld P, Thevelein JM, Mager WH, Planta RJ: Growth-related expression of ribosomal protein genes in Saccharomyces cerevisiae. Mol Gen Genet. 1993, 239: 196-204.
- Camus JC, Pryor MJ, Medigue C, Cole ST: Re-annotation of the genome sequence of Mycobacterium tuberculosis H37Rv. Microbiology. 2002, 148: 2967-2973.
- Sussman M: Molecular Medical Microbiology. 2002, San Diego, Academic Press, 1:
- Hutter B, Dick T: Molecular genetic characterisation of whiB3, a mycobacterial homologue of a Streptomyces sporulation factor. Res Microbiol. 1999, 150: 295-301. 10.1016/S0923-2508(99)80055-2.
- Ramos JL, Martinez-Bueno M, Molina-Henares AJ, Teran W, Watanabe K, Zhang X, Gallegos MT, Brennan R, Tobes R: The TetR family of transcriptional repressors. Microbiol Mol Biol Rev. 2005, 69: 326-356. 10.1128/MMBR.69.2.326-356.2005.
- Tsunedomi R, Izu H, Kawai T, Yamada M: Dual control by regulators, GntH and GntR, of the GntII genes for gluconate metabolism in Escherichia coli. J Mol Microbiol Biotechnol. 2003, 6: 41-56. 10.1159/000073407.
- Frota CC, Papavinasasundaram KG, Davis EO, Colston MJ: The AraC family transcriptional regulator Rv1931c plays a role in the virulence of Mycobacterium tuberculosis. Infect Immun. 2004, 72: 5483-5486. 10.1128/IAI.72.9.5483-5486.2004.
- Schena M, Heller RA, Theriault TP, Konrad K, Lachenmeier E, Davis RW: Microarrays: biotechnology's discovery platform for functional genomics. Trends Biotechnol. 1998, 16: 301-306. 10.1016/S0167-7799(98)01219-0.
- Hieter P, Boguski M: Functional genomics: it's all how you read it. Science. 1997, 278: 601-602. 10.1126/science.278.5338.601.
- Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell. 1998, 9: 3273-3297.
- Voskuil MI, Visconti KC, Schoolnik GK: Mycobacterium tuberculosis gene expression during adaptation to stationary phase and low-oxygen dormancy. Tuberculosis (Edinb). 2004, 84: 218-227. 10.1016/j.tube.2004.02.003.
- Talaat AM, Lyons R, Howard ST, Johnston SA: The temporal expression profile of Mycobacterium tuberculosis infection in mice. Proc Natl Acad Sci U S A. 2004, 101: 4602-4607. 10.1073/pnas.0306023101.
- McKinney JD, Honer zu Bentrup K, Munoz-Elias EJ, Miczak A, Chen B, Chan WT, Swenson D, Sacchettini JC, Jacobs WR, Russell DG: Persistence of Mycobacterium tuberculosis in macrophages and mice requires the glyoxylate shunt enzyme isocitrate lyase. Nature. 2000, 406: 735-738. 10.1038/35021074.
- Smith I: Mycobacterium tuberculosis pathogenesis and molecular determinants of virulence. Clin Microbiol Rev. 2003, 16: 463-496. 10.1128/CMR.16.3.463-496.2003.
- Fisher MA, Plikaytis BB, Shinnick TM: Microarray analysis of the Mycobacterium tuberculosis transcriptional response to the acidic conditions found in phagosomes. J Bacteriol. 2002, 184: 4025-4032. 10.1128/JB.184.14.4025-4032.2002.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.