Skip to main content

The gene expression data of Mycobacterium tuberculosis based on Affymetrix gene chips provide insight into regulatory and hypothetical genes



Tuberculosis remains a leading infectious disease with global public health threat. Its control and management have been complicated by multi-drug resistance and latent infection, which prompts scientists to find new and more effective drugs. With the completion of the genome sequence of the etiologic bacterium, Mycobacterium tuberculosis, it is now feasible to search for new drug targets by sieving through a large number of gene products and conduct genome-scale experiments based on microarray technology. However, the full potential of genome-wide microarray analysis in configuring interrelationships among all genes in M. tuberculosis has yet to be realized. To date, it is only possible to assign a function to 52% of proteins predicted in the genome.


We conducted a functional-genomics study using the high-resolution Affymetrix oligonucleotide GeneChip. Approximately one-half of the genes were found to be always expressed, including more than 100 predicted conserved hypotheticals, in the genome of M. tuberculosis during the log phase of in vitro growth. The gene expression profiles were analyzed and visualized through cluster analysis to epitomize the full details of genomic behavior. Broad patterns derived from genome-wide expression experiments in this study have provided insight into the interrelationships among genes in the basic cellular processes of M. tuberculosis.


Our results have confirmed several known gene clusters in energy production, information pathways, and lipid metabolism, and also hinted at potential roles of hypothetical and regulatory proteins.


Knowledge about the genome sequence of Mycobacterium tuberculosis [1] has contributed to recent advancement in understanding the biology of this organism and its clinical relevance. Concurrent with this development, a high-throughput genome-wide gene expression analysis device in the form of microarrays has rapidly emerged as a seemingly indispensable tool for studying genomics in the modern era. These developments have brought about the revolutionary conception of new prophylactic and therapeutic interventions in the genomic perspective. Its significance should be clear, as tuberculosis is still causing millions of deaths in the world.

DNA microarrays have been applied to analyze M. tuberculosis. The first type of application focuses on genotyping, for example, species identification [2, 3] and detection of drug-resistant mutants [4, 5]. The second type of application seeks to explore altered gene expression and understand biological pathways in terms of up-regulated and down-regulated genes in certain conditions of interest, such as drug challenge [6], hypoxia [7], starvation [8], high temperature [9], and in vivo [10]. However, existing applications do not exploit the full potential of genome-scale microarray analysis in configuring interrelationships among all genes in M. tuberculosis. We pioneered the approach that applied the Affymetrix M. tuberculosis GeneChip to gene expression analysis. Previously, this GeneChip was used for applications related to genotyping.

Our study is aimed to explore the whole-genome behavior of M. tuberculosis during log-phase growth by conducting a bioinformatics analysis on genome-wide gene expression data generated from microarray hybridization. Our results enrich the current understanding of genome functions based on sequence analysis and functional studies of individual genes in such aspects as deduction of possible roles of conserved hypothetical and regulatory proteins.


Active genes involved in growth

Research on M. tuberculosis has yet to answer the questions of how many and what genes are active during normal growth in a standard in vitro environment and how they are related to each other in a global genome-wide sense. To answer these questions, we adopted the Affymetrix GeneChip system, which, based on a specific oligonucleotide array format, could provide the absolute signal intensity in a single condition as well as the signal ratio between two conditions. Furthermore, its built-in statistical algorithm computes the so-called Detection p-value that determines the presence or absence of any given mRNA. It is this feature that we capitalize on to explore the genomic behavior of M. tuberculosis. A gene is active when it is expressed. Gene activity is measured by the expression level (i.e., abundance of corresponding mRNA) detected by microarray hybridization in this study.

We found that about one-half of the genes in M. tuberculosis genome participated in bacterial growth during the log (exponential) phase in standard broth culture. The average and standard deviation of expression signal intensity for each active gene would reflect its relative level of activity and the range of variation (Table 1). It was noted that the extent of gene expression did not necessarily correlated with essentiality; for example, among the ten most expressed genes, only two were essential genes as defined by high density mutagenesis [11].

Table 1 The top 100 most expressed genes of M. tuberculosis in log-phase growth. For other active genes, refer to Additional file 4.

Functional genomic analysis

The microarray data of those genes involved in the in vitro growth of M. tuberculosis during log phase were analyzed through the hierarchical clustering algorithm of Eisen's Cluster program [12]. This cluster-analysis program allowed us to explore the internal structure of the data and derive useful information concerning the coordination and collaboration among the genes. A measure fundamental to clustering is that of similarity. We define similarity between genes by their correlation in terms of gene expression patterns across multiple samples. The dendrogram generated by the algorithm (Figure 1) was organized according to this measure and displayed in a way to optimize the similarity of adjacent elements (genes). In the dendrogram, more related genes were joined earlier, and several highly dense clusters with peaks spreading along were visible.

Figure 1
figure 1

The gene expression map of genes involved in the log-phase growth of Mycobacterium tuberculosis. The map was generated by Eisen's cluster-analysis program called CLUSTER and viewed by the TREEVIEW program. Several clusters representing aggregations of functionally related genes are visible in the dendrogram (on the left of the image) showing how genes are grouped. The detailed image is available at our web site [see Additional file 2] annotated with gene names alongside of the image strip [see Additional file 3].

Based upon the above analysis, the dendrogram was partitioned, using a correlation threshold of 0.9, into a number of disjoint clusters related to functional classes known previously. Each cluster contained functionally related genes that might share similar functions or roles in physiology or metabolism and were likely subject to the same regulatory mechanism. For instance, one cluster was found to be similar to the FAS-II operon, and another cluster comprised largely of genes encoding ribosomal proteins (Table 2). Notice, however, genes of different classes may co-exist in the same cluster if they show correlation in their activity. This situation is exemplified by energy-dependent transportation across cell membrane, which necessitates the coupling between the gene class of cell processes and that of energy production.

Table 2

The clustering results were further visualized through Eisen's TreeView program to generate a heat map where the brightness of the red color represented the intensity of gene expression (Figure 1). Based on the cluster analysis results and the observation of four conspicuous shining bands, the map was tentatively divided into four zones from the top to the bottom, called: zones 1, 2, 3, and 4, in the consecutive order. Each zone contained clusters of genes that were strongly correlated in their expression patterns and in that sense, functionally related. Each zone in the map was represented by genes that expressed most (i.e., the brightest in red), Thus, the map zones were labeled according to the functional class ( of their representative genes as "intermediate and lipid metabolism", "energy", "information", and "cell wall, cell processes, and metabolism", respectively.

The two most conspicuous clusters in the first zone of the gene expression map consisted of a cluster represented by genes in the functional category of intermediate metabolism, such as cysH, ggtB, and nirA, and the other cluster represented by genes involved in the FAS-II cycle [6], such as accD 6, kasA, and KasB. The formation of the cluster with emphasis on the FAS-II cycle reflects the importance of this pathway in M. tuberculosis growth. As for intermediate metabolism, M. tuberculosis can metabolize many kinds of carbohydrates, hydrocarbons, alcohols, ketones and carboxylic acids [1]. This zone was adjacent to the energy zone, suggesting the close relationship between intermediate metabolism and energy production.

The most prominent genes in the second zone of the gene expression map were the ATP synthase gene complex, which produces ATP from ADP and is critical in energy metabolism. In addition, genes encoding enzymes involved in the respiratory chain, such as the nuo (NADH-ubiquinone oxidoreductase) gene complex, were clustered into this zone. In another study [13], ATP synthase and nuo gene complexes were found to be down-regulated together. Thus, they may be co-regulated.

The third zone of the gene expression map was represented by genes involved in the information pathways, e.g., dnaQ (DNA polymerase III), recA (recombinase), rpoA (DNA-directed RNA polymerase), and the 30S and 50S ribosome protein gene complexes (rpsB, rpsC, rplB, rplC, etc.). As these genes play a vital role in genetic information replication, transcription, and translation, their expression is essential for maintaining bacterial growth. We noticed that the gene Rv2258C (encoding a possible transcriptional regulatory protein) correlated well with a group of ribosomal protein genes. This result is related to a report that the production of ribosomes is increased through the transcriptional regulation of genes encoding ribosomal proteins during the growth phase of yeast [14].

The last zone of the gene expression map was represented by genes classified under the category of cell wall, cell processes, and metabolism. This category comprises membrane proteins and proteins involved in cell processes, including secreted and transmembrane proteins [15], as well as enzymes involved in intermediate metabolism. Genes and their protein derivatives located in this zone are related to cell wall synthesis, transportation of organic and inorganic substance across membrane, and immunological responses, such as narK 2 (a nitrate/nitrite transporter), fbpC (mycolyl transferase), hsp (a protein induced by heat stress), ald (a secreted enzyme), ctpF (a metal cation transporter), sodA (superoxide dismutase, which destroys radicals), ompA (an outer membrane protein), amt (an ammonium-transport integral membrane protein), furA (a protein for ferric uptake regulation), fbpB (a protein in the antigen-85 complex), and genes in the lipoprotein family (lpqS, lpqH, lppJ, lprF, lpqO and lppD).

The most important feature of mycobacterial cell wall is the substantial amount (up to 60% of the total mass) of lipid components, particularly, the very long chain mycolic acids, which are combined with surface glycolipids to form a pseudolipid bilayer [16]. Since the cell wall synthesis involves lipid metabolism [6], the main cluster in zone 4 also contains some genes in this class, such as fadE26 (encoding acyl-CoA dehydrogenase) and fadD16 (encoding fatty-acid-CoA ligase).

Each zone was tentatively named according to the most expressed genes it contained. The semantics of each zone would further depend on whether each zone represents a concentration of genes in the same functional categories and that concentration is consistent with the function classification of the most expressed genes in that zone. So we further analyzed the percentage distribution of functional classes for the genes in the major clusters of each zone on the gene expression map (Table 3). The functional classification of a gene is based on current knowledge in the field and can be accessed from the TubercuList Server ( Our analysis indicated that each zone contained clusters of genes of similar functions and the most expressed genes were in the dominant function class, suggesting that functional semantics can be assigned to clusters visible from the gene expression map. Zone-1 featured genes in the functional classes of metabolism (intermediate and lipid). In zone-2, there was only a slight dominance of genes in the class of metabolism and respiration over the class of cell wall and processes, but notably, it contained a big chuck of energy-related genes involved in ATP synthesis and electron transport. Zone-3 was characterized by genes in the information category. Zone-4 had the broadest distribution on the map and a wide functional coverage, and was thus least characteristic among the four zones.

Table 3 The distribution (percentage %) of functional categories for genes in the major clusters of each zone on the gene expression map.

There were about a dozen of regulatory genes associated with the major functional clusters identified in this study (Table 4). Their significance is reflected by their possible roles reported in other studies. These genes likely regulate other genes in the same cluster at the transcriptional level and deserve to be examined in more details. Regulatory genes associated with small clusters are considered less important and hence not examined here.

Table 4 Transcriptional regulators associated with major gene clusters based on microarray analysis.

The Rv2989 gene was associated with the cluster characterized by energy metabolism and respiration in our data. The association relation appears to be consistent with reports that this gene is up-regulated at high temperatures [9] and down-regulated after starvation [8].

Rv3334 was up-regulated at high temperatures [9] as well as after starvation [8]. This gene is probably in the MerR (mercury resistance) family and its protein is similar to many regulatory proteins in sequence.

Rv0081 can be induced by hypoxia [7]. Its presence in growing bacterial cells is called into question. However, this gene is only weakly expressed in the present study, and likely to be up-regulated if oxygen is depleted.

The whiB4 gene encodes a protein homologous to a Streptomyces sporulation factor [17], and the gene is up-regulated after starvation [8], suggesting a possible link between starvation and sporulation. In addition, the association of whiB4 with the class of cell wall and cell processes makes sense from the point that sporulation could potentially involve cell membrane.

Rv0653c is probably in the TetR family. Its significance is reflected by the fact that proteins in this family are involved in the transcriptional control of multi-drug efflux pumps and pathogenicity [18]. Another regulatory gene, Rv0165c, is probably in the GntR family. In E. coli, GntR regulates gluconate uptake and catabolism as a repressor [19]. Both Rv0653c and Rv0165c are apparently involved in a membrane-associated cellular process. Another gene worth attention is Rv1931c, which regulates genes important for virulence of M. tuberculosis [20], possibly via a cellular process that translates extracellular stimuli into a transcriptional signal.


The availability of the complete genome sequence of Mycobacterium tuberculosis [1] combined with rapidly emerging microarray technology [21] has catalyzed the process of understanding the bacterial biology and pathogenicity and expedited the development of new diagnostics and therapeutics for tuberculosis. The microarray approach has enabled high-throughput gene expression analysis on a genomic scale in a field known as functional genomics [22]. In particular, elucidation of functional relationships among genes based on genome-wide gene expression data from DNA microarray hybridization has been successfully demonstrated for eukaryotes, notably yeast [12, 23], but it has not been done for M. tuberculosis. To date, the functional classification of genes in M. tuberculosis is mainly based on the biological study of individual genes as well as sequence analysis and comparison with homologous genes in other bacteria. In this study, we provide a comprehensive analysis that addresses this issue from the perspective of functional genomics.

In the application to M. tuberculosis, DNA microarrays have been used for comparing species, detecting drug-resistant mutants, and studying biological behavior under various conditions. In general, there are two computational paradigms for microarray data analysis. The first paradigm is to identify genes differentially expressed across two conditions; the second paradigm is to identify genes expressed in a coordinated manner that share common roles in cellular physiology or metabolism. Most of the applications for M. tuberculosis to date are based on the first paradigm, whereas our study described here is based on the second paradigm.

In this study, active genes were identified by means of the Affymetrix GeneChip. As its unique feature, the Affymetrix system uses multiple oligonucleotide probes for implementing each gene sequence to be interrogated. Furthermore, the system is capable of analyzing the presence or absence of each mRNA. In contrast to the cDNA microarray system that is focused on differential gene expression across two conditions, the Affymetrix system can calculate gene expression in a single condition and compare gene expression across multiple conditions. In this way, the Affymetrix system is more flexible and informative. The flexibility can be attributed to the use of PM/MM probes, instead of two explicitly defined external conditions, for implementing the test/control mechanism in microarray hybridization.

Our method for analysis of in vitro genomic activity of M. tuberculosis can be extended to study functional genomics in vivo. Understanding what genes are switched on or off between in vitro and in vivo conditions would shed light on issues, such as how biological adaptation leads to bacterial latency, why there is discrepancy between laboratory sensitivity and clinical efficacy, and so on. Thus, the functional-genomics data obtained in this work can serve as a reference for interpreting data generated in other contexts. Our genomic analysis was based on multiple RNA samples extracted during log-phase growth in contrast to other coordinated gene expression analyses based on samples collected in a time course or under different conditions. Our experiment design is justified, given the fact that the expression level of any gene has considerable fluctuation from time to time (Table 1) in log-phase growth, as evidenced from the observation that the standard deviation of a gene expression was sometimes greater than 50% of its mean across samples. Variation in gene expression across different time settings enabled the correlation among different genes to be analyzed. The validity of our experiments is supported by reconfirmation of several known growth-related gene clusters. However, our approach is not applicable to samples collected during stationary phase, when little variation in gene expression is expected across samples.

The in vitro broth culture condition has often been used as the reference condition to study the gene expression of M. tuberculosis in other conditions, such as hypoxia and starvation. However, the present study is the first to explore the functional genomics of this organism grown in log-phase culture. Bacterial growth can be divided into four different phases: lag phase, exponential or log phase, stationary phase, and death phase. It is the log phase that we focused our study on. During this phase, high growth activity is evidenced by our data showing that about half of the genes in the genome were expressed. In contrast, many genes in M. tuberculosis are repressed during the stationary phase, a condition similar to but milder than the non-replicating state of tubercle bacilli in an anaerobic condition [24]. In particular, the dormancy regulatory gene, dosR is weakly induced during the stationary phase while strongly induced in an anaerobic non-replicating state. An interesting finding based on our work is that dosR is always moderately expressed even in the log phase, suggesting its possible housekeeping role. Our data further showed that an important gene, acr (hspX), which is induced under hypoxia [7] and starvation [8], was always expressed in the log phase. Global gene expression profiling analysis of M. tuberculosis in mouse [25] and human tissue [10] indicated that lipid metabolism was critical for the bacilli to survive in the host environment. In these conditions, isocitrate lyase (ICL), an enzyme of the glyoxylate shunt (a pathway alternative to the tricarboxylic acid cycle) and related to mycobacterial persistence in macrophages [26], is up-regulated. It is consistent with our finding that icl is weakly expressed or absent in the log-phase culture.

The extent to which gene expression profiles across a set of independently collected samples suffice to separate genes into functional clusters in consistency with prior knowledge is attributable to the rigorous statistical model built in the Affymetrix system. Several familiar gene groups with clear designated functions, such as electron transport, protein synthesis and type II fatty acid synthesis, were observed in the data, offering credence to our analysis. However, genes associated with different functional classes would be placed in the same cluster if they appear to co-express. This implies that genes can share commons roles while differing in their functions, as illustrated by the earlier example of energy-dependent transportation across the cell membrane. In fact, comparing gene clusters based on gene expression with gene classes based on sequence analysis would offer new opportunities for re-defining interrelationships among genes in the genome.

Gene clusters built out of expression profiles can be configured as functional linkage networks among genes but these clusters do not correspond directly to protein networks [10] constructed using a combination of Rosetta stone, phylogenetic profile, conserved gene neighbor, and operon computational methods. Genes, which share similar biological functions, may operate at different stages of the cell cycle or become active under different conditions, and hence have different expression profiles [10]. However, some gene families, in particular those encoding ribosome proteins are closely linked in both gene-expression and protein networks, as seen in our data.

There remained 1051 so-called conserved hypothetical genes/proteins annotated in the genome sequence of M. tuberculosis [15], among which many were found to be active in our analysis, attesting to the predictive power of the genome annotation program [1]. More importantly, this finding suggests that many hypothetical proteins actively participate in the growth process. Unfortunately, the functions of these hypotheticals remain uncharacterized, reflecting the limitation of the sequence-based approach to genome annotation. Microarray-based functional-genomics offers a fundamentally different approach to gene annotation, in which the roles of uncharacterized genes may be hypothesized if they co-express with known genes (Table 5).

Table 5 Conserved hypothetical proteins associated with the gene clusters of each zone on the gene expression map.

Organized according to correlations in gene expression across samples, the gene expression image created by Eisen's Cluster and TreeView programs (Figure 1) enabled us to visualize four transcriptional profiles, which, named according to the functional classes of the dominant genes in that region and put in a linear order over the image, were "intermediate and lipid metabolism", "energy and respiration", "information pathways", and "cell wall, cell processes and metabolism". The dendrogram was constructed and displayed so that similar clusters were likely to be located in proximate nodes. In the present application, as the similarity measure is based on the correlation in gene expression, physical distance on the tree reflects the degree of correlation among gene clusters, even with no guarantee of their optimal linear ordering in the tree [12]. As the expression of the genetic code lies at the heart of all physiological processes and metabolisms, it is logical that the information gene cluster functionally correlates with other gene clusters, a view supported by the observation that it was situated around the center of the image.

As it is now, there are more than 100 transcriptional regulatory genes in M. tuberculosis genome [1, 15]. However, only a fraction of them have been experimentally studied in detail for their functions. A recent survey shows that regulatory proteins account for 9 (20%) out of 45 virulence factors identified in M. tuberculosis [27]. Unraveling the gene regulatory network would allow us to understand both physiological and virulence mechanisms and to develop novel drugs that work at the level of gene regulation. Since elucidating the roles of these genes and their clinical relevance is always time-consuming; it is practically necessary to set up priority for them. A reasonable assumption is that a regulatory gene regulates some other genes in the same functional cluster with a high probability. Under this assumption, we have identified several potentially important transcriptional regulatory genes involved in major biological pathways (Table 4). Their significance has been indicated by analysis based on the literature. Further biological investigation on these genes is warranted in the future work.

All the microarray data and supplementary materials produced in this study are posted at our web site [see Additional file 1].


Genes involved in the in vitro log-phase growth of M. tuberculosis have been identified. The gene expression map (Figure 1) represents broad patters of functional concordance of closely related genes, but more importantly, it summarizes the coordinated cellular activities associated with the growth process on the genomic level. As it is today, hundreds of genes in the genome are annotated as conserved hypotheticals without clearly specified functions. Our data have shown that more than 100 such hypotheticals were actually expressed in the cell medium, and their biological roles can be suggested by their correlation with other known genes. In addition, the roles of most transcriptional regulatory genes predicted in the genome remain to be elucidated. In this study, we have discovered several regulatory genes that may exert regulatory influence on the growth of M. tuberculosis, and their roles may be inferred by what functional clusters they join. The data and information generated here provide an integrated genomic view about gene functions and interrelationships in M. tuberculosis, and can be incorporated in new experiments for research in tuberculosis. This study has not only transcriptionally validated several known gene clusters but also provided insight into a host of unknown hypothetical and regulatory genes.


Bacterial culture of M. tuberculosis

M. tuberculosis strain H37Rv was obtained from the culture collection of the Mycobacteriology Laboratory Branch, Centers for Disease Control and Prevention at Atlanta. A portion of a recently frozen stock was inoculated into 5 ml of complete Middlebrook 7H9 broth (7H9) supplemented with 10% albumin-dextrose-catalase v/v (Difco Laboratories, Detroit, MI) and 0.05% Tween 80 v/v (Sigma, St. Louis, MO) and incubated at 37°C for 5 days. Then the culture was transferred into 50 ml of 7H9 media, incubated at 37°C with 50 rpm shaking, and grown to log phase (0.35 OD600). The cells were harvested by centrifugation for RNA preparation.

RNA isolation

Bacterial lysis and RNA isolation were performed following the procedure of [28] at the CDC lab (Atlanta) during log-phase growth. Briefly, cultures were mixed with an equal volume of RNALater™ (Ambion, Austin, TX) and the bacteria harvested by centrifugation (1 min, 25000g, 8°C) and transferred to Fast Prep tubes (Bio 101, Vista, CA) containing Trizol (Life Technologies, Gaithersburg, MD). Mycobacteria were mechanically disrupted in a Fast Prep apparatus (Bio 101). The aqueous phase was recovered, treated with Cleanascite (CPG, Lincoln Park, NJ), and extracted with chloroform-isoamyl alcohol (24:1 v/v). Nucleic acids were ethanol precipitated. DNaseI (Ambion) treatment to digest contaminating DNA was performed in the presence of Prime RNase inhibitor (5'-3', Boulder, CO). The RNA sample was precipitated and washed in ethanol, and redissolved to make a final concentration of 1 mg/ml. The purity of RNA was estimated by the ratio of the readings at 260 nm and 280 nm (A260/A280) in the UV. 20 ul RNA samples were sent to the UCI DNA core and further checked through a quality and quantity test based on electrophoresis before microarray hybridization.

Microarray hybridization and analysis

In this study, we used the anti-sense Affymetrix M. tuberculosis genome array (GeneChip). The probe selection was based on the genome sequence of M. tuberculosis H37Rv [1]. Each annotated ORF (Open Reading Frame) or IG (Intergenic Region) was interrogated with oligonucleotide probe pairs. The gene chip represented all 3924 ORFs and 738 intergenic regions of H37Rv. Twenty 25-mer probes were selected within each ORF or IG. These probes are called PM (Perfect-Match) probes. The sequence of each PM probe is perturbed with a single substitution at the middle base. They are called MM (Mismatch) probes. A PM probe and its respective MM probe constitute a probe pair. The MM probe serves as a negative control for the PM probe in hybridization.

Microarray hybridization followed the Affymetrix protocol. In brief, the assay utilized reverse transcriptase and random hexamer primers to produce DNA complementary to the RNA. The cDNA products were then fragmented by DNAase I and labeled with terminal transferase and biotinylated GeneChip DNA Labeling Reagent at the 3' terminal.

Each RNA sample was hybridized with one gene array to produce the expression data of all genes on the array. We performed eleven independent bacterial cultures and RNA extractions at different times, and collected eleven sets of microarray data for this study. A global normalization scheme isapplied so that each array's median value is adjusted to a predefine value (500).

Bioinformatics analysis

The gene expression data were analyzed by the program GCOS (GeneChip Operating Software) version 1.4. In the program, the Detection algorithm determines whether a measured transcript is detected (P Call) or not detected (A Call) on a single array according to the Detection p-value that is computed by applying the one-sided Wilcoxon's signed rank test to test the Discrimination scores (R) against a predefined adjustable threshold τ. The Discrimination score calculated for each probe pair is a function of the PM intensity (PMI) and the MM intensity (MMI), as given by

The parameter τ controls the sensitivity and specificity of the analysis, and was set to a typical value of 0.015, and the Detection p-value cutoffs, α1 and α2, set to their typical values, 0.04 and 0.06, respectively, according to the Affymetrix system.

In this study, a gene was determined to be always (usually) active if the derived mRNA was present (P-call) in more than 90% (50%) of the RNA samples with a Detection p-value < 0.001. The gene-expression data were further analyzed using Eisen's Cluster and TreeView programs [12]. The whole-genome gene expression map was produced by the hierarchical clustering algorithm based on the average-linkage method in the program with the similarity measure defined by Pearson's correlation coefficient.


  1. Cole ST, Brosch R, Parkhill J, Garnier T, Churcher C, Harris D, Gordon SV, Eiglmeier K, Gas S, Barry CE, Tekaia F, Badcock K, Basham D, Brown D, Chillingworth T, Connor R, Davies R, Devlin K, Feltwell T, Gentles S, Hamlin N, Holroyd S, Hornsby T, Jagels K, Barrell BG: Deciphering the biology of Mycobacterium tuberculosis from the complete genome sequence. Nature. 1998, 393: 537-544. 10.1038/31159.

    Article  CAS  PubMed  Google Scholar 

  2. Behr MA, Wilson MA, Gill WP, Salamon H, Schoolnik GK, Rane S, Small PM: Comparative genomics of BCG vaccines by whole-genome DNA microarray. Science. 1999, 284: 1520-1523. 10.1126/science.284.5419.1520.

    Article  CAS  PubMed  Google Scholar 

  3. Kato-Maeda M, Rhee JT, Gingeras TR, Salamon H, Drenkow J, Smittipat N, Small PM: Comparing genomes within the species Mycobacterium tuberculosis. Genome Res. 2001, 11: 547-554. 10.1101/gr.166401.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  4. Troesch A, Nguyen H, Miyada CG, Desvarenne S, Gingeras TR, Kaplan PM, Cros P, Mabilat C: Mycobacterium species identification and rifampin resistance testing with high-density DNA probe arrays. J Clin Microbiol. 1999, 37: 49-55.

    PubMed Central  CAS  PubMed  Google Scholar 

  5. Gingeras TR, Ghandour G, Wang E, Berno A, Small PM, Drobniewski F, Alland D, Desmond E, Holodniy M, Drenkow J: Simultaneous genotyping and species identification using hybridization pattern recognition analysis of generic Mycobacterium DNA arrays. Genome Res. 1998, 8: 435-448.

    CAS  PubMed  Google Scholar 

  6. Wilson M, DeRisi J, Kristensen HH, Imboden P, Rane S, Brown PO, Schoolnik GK: Exploring drug-induced alterations in gene expression in Mycobacterium tuberculosis by microarray hybridization. Proc Natl Acad Sci U S A. 1999, 96: 12833-12838. 10.1073/pnas.96.22.12833.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  7. Sherman DR, Voskuil M, Schnappinger D, Liao R, Harrell MI, Schoolnik GK: Regulation of the Mycobacterium tuberculosis hypoxic response gene encoding alpha -crystallin. Proc Natl Acad Sci U S A. 2001, 98: 7534-7539. 10.1073/pnas.121172498.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  8. Betts JC, Lukey PT, Robb LC, McAdam RA, Duncan K: Evaluation of a nutrient starvation model of Mycobacterium tuberculosis persistence by gene and protein expression profiling. Mol Microbiol. 2002, 43: 717-731. 10.1046/j.1365-2958.2002.02779.x.

    Article  CAS  PubMed  Google Scholar 

  9. Stewart GR, Wernisch L, Stabler R, Mangan JA, Hinds J, Laing KG, Young DB, Butcher PD: Dissection of the heat-shock response in Mycobacterium tuberculosis using mutants and microarrays. Microbiology. 2002, 148: 3129-3138.

    Article  CAS  PubMed  Google Scholar 

  10. Rachman H, Strong M, Ulrichs T, Grode L, Schuchhardt J, Mollenkopf H, Kosmiadi GA, Eisenberg D, Kaufmann SH: Unique transcriptome signature of Mycobacterium tuberculosis in pulmonary tuberculosis. Infect Immun. 2006, 74: 1233-1242. 10.1128/IAI.74.2.1233-1242.2006.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  11. Sassetti CM, Boyd DH, Rubin EJ: Genes required for mycobacterial growth defined by high density mutagenesis. Mol Microbiol. 2003, 48: 77-84. 10.1046/j.1365-2958.2003.03425.x.

    Article  CAS  PubMed  Google Scholar 

  12. Eisen MB, Spellman PT, Brown PO, Botstein D: Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A. 1998, 95: 14863-14868. 10.1073/pnas.95.25.14863.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  13. Shi L, Sohaskey CD, Kana BD, Dawes S, North RJ, Mizrahi V, Gennaro ML: Changes in energy metabolism of Mycobacterium tuberculosis in mouse lung and under in vitro conditions affecting aerobic respiration. Proc Natl Acad Sci U S A. 2005, 102: 15629-15634. 10.1073/pnas.0507850102.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  14. Kraakman LS, Griffioen G, Zerp S, Groeneveld P, Thevelein JM, Mager WH, Planta RJ: Growth-related expression of ribosomal protein genes in Saccharomyces cerevisiae. Mol Gen Genet. 1993, 239: 196-204.

    CAS  PubMed  Google Scholar 

  15. Camus JC, Pryor MJ, Medigue C, Cole ST: Re-annotation of the genome sequence of Mycobacterium tuberculosis H37Rv. Microbiology. 2002, 148: 2967-2973.

    Article  CAS  PubMed  Google Scholar 

  16. Sussman M: Molecular Medical Microbiology. 2002, San Diego, Academic Press, 1:

    Chapter  Google Scholar 

  17. Hutter B, Dick T: Molecular genetic characterisation of whiB3, a mycobacterial homologue of a Streptomyces sporulation factor. Res Microbiol. 1999, 150: 295-301. 10.1016/S0923-2508(99)80055-2.

    Article  CAS  PubMed  Google Scholar 

  18. Ramos JL, Martinez-Bueno M, Molina-Henares AJ, Teran W, Watanabe K, Zhang X, Gallegos MT, Brennan R, Tobes R: The TetR family of transcriptional repressors. Microbiol Mol Biol Rev. 2005, 69: 326-356. 10.1128/MMBR.69.2.326-356.2005.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  19. Tsunedomi R, Izu H, Kawai T, Yamada M: Dual control by regulators, GntH and GntR, of the GntII genes for gluconate metabolism in Escherichia coli. J Mol Microbiol Biotechnol. 2003, 6: 41-56. 10.1159/000073407.

    Article  CAS  PubMed  Google Scholar 

  20. Frota CC, Papavinasasundaram KG, Davis EO, Colston MJ: The AraC family transcriptional regulator Rv1931c plays a role in the virulence of Mycobacterium tuberculosis. Infect Immun. 2004, 72: 5483-5486. 10.1128/IAI.72.9.5483-5486.2004.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  21. Schena M, Heller RA, Theriault TP, Konrad K, Lachenmeier E, Davis RW: Microarrays: biotechnology's discovery platform for functional genomics. Trends Biotechnol. 1998, 16: 301-306. 10.1016/S0167-7799(98)01219-0.

    Article  CAS  PubMed  Google Scholar 

  22. Hieter P, Boguski M: Functional genomics: it's all how you read it. Science. 1997, 278: 601-602. 10.1126/science.278.5338.601.

    Article  CAS  PubMed  Google Scholar 

  23. Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell. 1998, 9: 3273-3297.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  24. Voskuil MI, Visconti KC, Schoolnik GK: Mycobacterium tuberculosis gene expression during adaptation to stationary phase and low-oxygen dormancy. Tuberculosis (Edinb). 2004, 84: 218-227. 10.1016/

    Article  CAS  Google Scholar 

  25. Talaat AM, Lyons R, Howard ST, Johnston SA: The temporal expression profile of Mycobacterium tuberculosis infection in mice. Proc Natl Acad Sci U S A. 2004, 101: 4602-4607. 10.1073/pnas.0306023101.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  26. McKinney JD, Honer zu Bentrup K, Munoz-Elias EJ, Miczak A, Chen B, Chan WT, Swenson D, Sacchettini JC, Jacobs WR, Russell DG: Persistence of Mycobacterium tuberculosis in macrophages and mice requires the glyoxylate shunt enzyme isocitrate lyase. Nature. 2000, 406: 735-738. 10.1038/35021074.

    Article  CAS  PubMed  Google Scholar 

  27. Smith I: Mycobacterium tuberculosis pathogenesis and molecular determinants of virulence. Clin Microbiol Rev. 2003, 16: 463-496. 10.1128/CMR.16.3.463-496.2003.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  28. Fisher MA, Plikaytis BB, Shinnick TM: Microarray analysis of the Mycobacterium tuberculosis transcriptional response to the acidic conditions found in phagosomes. J Bacteriol. 2002, 184: 4025-4032. 10.1128/JB.184.14.4025-4032.2002.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

Download references


This work is supported by National Institutes of Health under the grant HL-080311. We would like to thank CDC for the use of the facilities and thank UCI for providing service for microarray hybridization. Bacterial culture and RNA isolation were performed by Pramod Aryal.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Li M Fu.

Additional information

Authors' contributions

L. Fu developed the method, conducted the experiments, and performed data analysis. C. Fu-Liu interpreted the data. Both authors drafted, read, and approved the manuscript.

Electronic supplementary material


Additional File 1: Microarray data. The microarray data and supplementary materials produced in this study. (HTML 3 KB)


Additional File 2: Gene expression map (detailed full image). M. tuberculosis during log-phase growth. (GIF 838 KB)


Additional File 3: Gene names alongside the gene expression map. in the order as they appear in the map. (TXT 34 KB)


Additional File 4: Active genes. M. tuberculosis during log-phase growth. (HTML 3 KB)


Additional File 5: Gene zones in the gene expression map. M. tuberculosis during log-phase growth. (TXT 19 KB)

Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

Authors’ original file for figure 1

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Fu, L.M., Fu-Liu, C.S. The gene expression data of Mycobacterium tuberculosis based on Affymetrix gene chips provide insight into regulatory and hypothetical genes. BMC Microbiol 7, 37 (2007).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: