Genomic patterns of pathogen evolution revealed by comparison of Burkholderia pseudomallei, the causative agent of melioidosis, to avirulent Burkholderia thailandensis
© Yu et al; licensee BioMed Central Ltd. 2006
Received: 17 March 2006
Accepted: 26 May 2006
Published: 26 May 2006
The Gram-negative bacterium Burkholderia pseudomallei (Bp) is the causative agent of the human disease melioidosis. To understand the evolutionary mechanisms contributing to Bp virulence, we performed a comparative genomic analysis of Bp K96243 and B. thailandensis (Bt) E264, a closely related but avirulent relative.
We found the Bp and Bt genomes to be broadly similar, comprising two highly syntenic chromosomes with comparable numbers of coding regions (CDs), protein family distributions, and horizontally acquired genomic islands, which we experimentally validated to be differentially present in multiple Bt isolates. By examining species-specific genomic regions, we derived molecular explanations for previously-known metabolic differences, discovered potentially new ones, and found that the acquisition of a capsular polysaccharide gene cluster in Bp, a key virulence component, is likely to have occurred non-randomly via replacement of an ancestral polysaccharide cluster. Virulence related genes, in particular members of the Type III secretion needle complex, were collectively more divergent between Bp and Bt compared to the rest of the genome, possibly contributing towards the ability of Bp to infect mammalian hosts. An analysis of pseudogenes between the two species revealed that protein inactivation events were significantly biased towards membrane-associated proteins in Bt and transcription factors in Bp.
Our results suggest that a limited number of horizontal-acquisition events, coupled with the fine-scale functional modulation of existing proteins, are likely to be the major drivers underlying Bp virulence. The extensive genomic similarity between Bp and Bt suggests that, in some cases, Bt could be used as a possible model system for studying certain aspects of Bp behavior.
Identifying the molecular mechanisms and pathways responsible for generating and regulating pathogen virulence is a key challenge of infectious diseases research. Besides increasing our basic understanding of pathogen behavior, such information is also essential for many clinically-relevant areas including the acquisition of drug resistance, vaccine design, and the emergence of new diseases [1, 2]. With the increasing availability of complete genome sequence data from multiple microbial pathogens, comparative genomics has recently emerged as a powerful tool to understand the basic molecular properties of pathogens. A particularly useful analysis in this regard has been to compare the genomes of a virulent species to a closely related but avirulent family member. Such studies have revealed the fundamental importance of horizontal acquisition, gene mutation, genome rearrangements, and bacteriophage mediated recombination in the development of virulence [3–5]. However, of the >60 microbial pathogens sequenced to date, less than a quarter (<15) have been compared in this manner. To achieve a broad understanding of the general mechanisms governing the evolution of pathogens, it is likely that such comparative analysis will be required for several more pathogenic species.
The Gram-negative pathogen B. pseudomallei (Bp) is the causative agent of melioidosis, a serious, often fatal disease of both humans and animals [6, 7]. Endemic to many parts of South East Asia and Northern Australia, Bp is considered a major tropical pathogen and a Category B biowarfare agent. In many countries, Bp can only be experimentally manipulated under biosafety level 3 (BSL3) conditions (ABSA, USA). In contrast to Bp, the related species B. thailandensis (Bt) is nonpathogenic for humans and animals, although it displays several phenotypic characteristics similar to Bp. Indeed, it is often difficult to distinguish the two by routine diagnostic tests. Like Bp, Bt is a soil saprophyte, and until its classification as a distinct species in 1998 it was considered a subtype of Bp [11, 12]. Recently, the genome sequence of one Bt strain, E264, was reported and compared to Bp and B. mallei, another related species. However, in that work, most of the comparative analysis was confined to a set of 716 genes transcriptionally regulated in B. mallei upon mouse infection, representing less than 15% of all genes in Bp. That report also did not address the identification of Bt-specific genetic elements, which might prove important for understanding virulence, as recent reports have suggested that gene loss can also contribute to pathogenesis. To definitively address the roles of horizontal transfer, gene mutation, and transcriptional plasticity in the development of Bp virulence, a comprehensive genome-wide comparison of Bp and Bt is required.
In this work, we report the draft genome sequence of Bt strain ATCC700388, corresponding to the same strain as Bt E264. By comparing the ATCC700388 draft sequence against the finished Bt E264 genome, we corrected certain sequencing errors in the reported Bt E264 genome, and compared the refined Bt strain E264 sequence to the genome of Bp K96243. We find that Bp and Bt are broadly similar at many genomic levels; for example, both species display highly conserved genomic synteny, and appear to share an extensive repertoire of genes involved in core metabolism, accessory pathways, structure-based superfamilies and bacterial virulence factors. Despite these similarities, our analysis also revealed that, in comparison to the rest of the genome, virulence-related genes in Bp appear to have undergone accelerated change, perhaps to better adapt to the challenge of infecting and surviving in a human or animal host. We also defined a series of key large-scale differences between Bp and Bt, some contributing to novel metabolic differences between the two species and others that may be critically required for virulence. Our results raise several testable hypotheses regarding key virulence components in Bp, and enhance our general understanding of pathogen evolution. Finally, the broad similarities between Bp and Bt also raise the possibility that Bt could prove useful as a potential model organism to study certain aspects of Bp biology.
Bt is closely related to Bp by 16S phylogeny
The Burkholderia genus is a large bacterial family containing >30 distinct species. We estimated approximate divergence times between Bt and other species in the Burkholderia genus by phylogenetic comparison of 16S rRNA gene sequences from four Burkholderia species and three outgroup species (Pseudomonas aeruginosa LMG1242T, Escherichia coli K12 and Salmonella typhimurium LT2). From this analysis, we found that Bt occupies a branch in the 16S rRNA phylogenetic tree that is highly related, but not identical, to B. pseudomallei (Bp) and B. mallei (supplementary figure one in Additional file 1). The close phylogenetic similarity between Bp and B. mallei is expected given that recent genome analysis suggests that B. mallei is in fact a derivative or clone of Bp. A calibration of the phylogenetic tree against the E. coli and S. typhimurium divergence, estimated at 140 million years ago (Mya), suggests that Bt diverged from Bp and B. mallei approximately 47 Mya. Similar time-frames were also obtained using groEL, another highly conserved housekeeping gene (Y. Yu, data not shown). This analysis suggests that although Bt is avirulent, it is likely to be highly evolutionarily related to virulent Bp, and a good candidate for comparative genomic analysis. Notably, the Bp/Bt separation time is comparable to those used in other studies comparing pathogenic and non-pathogenic species [19, 20].
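The calibration step can be illustrated with a minimal sketch, assuming a strict molecular clock in which pairwise 16S distance grows linearly with time. Only the 140 Mya calibration point comes from the text; the function name and the two distances are hypothetical placeholders, not values measured in this study.

```python
def calibrated_divergence_mya(target_distance, calibration_distance, calibration_mya=140.0):
    """Convert a pairwise distance to a divergence time using one calibration pair.

    Assumes a strict clock: distance / time is the same for every lineage pair.
    """
    if calibration_distance <= 0:
        raise ValueError("calibration distance must be positive")
    rate = calibration_distance / calibration_mya  # distance units per million years
    return target_distance / rate


# Hypothetical distances, for illustration only.
ecoli_salmonella_distance = 0.04   # calibration pair, dated at 140 Mya in the text
query_pair_distance = 0.01         # pair whose divergence time is being estimated

estimate = calibrated_divergence_mya(query_pair_distance, ecoli_salmonella_distance)
print(f"Estimated divergence: {estimate:.1f} Mya")
```

A single calibration point and a strict clock give only a rough estimate, which is consistent with the text describing the divergence times as approximate.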
Genome sequencing of Bt ATCC700388 and comparison to Bt E264
Genomic DNA from Bt ATCC700388 was sequenced by a shotgun approach to 7x genome coverage, and the computer program ARACHNE was then employed to assemble the shotgun reads into contigs and scaffolds. The median contig length was 40 Kb (86 contigs in total), and 3 scaffolds could be assembled from the contigs, comprising 3.4, 2.9, and 0.35 Mb respectively. A subsequent comparison of these scaffolds to the finished Bt E264 sequence allowed the 1st and 3rd scaffold to be collapsed into a single assembly (Chromosome 1). The amount of genome sequence contained in contigs (sum of contig lengths) was 6.66 Mb (Chr 1 – 3.78 Mb, Chr 2 – 2.88 Mb), compared with a total genome length (sum of scaffold lengths, including estimated gaps between contigs) of 6.7 Mb (Chr 1 – 3.80 Mb, Chr 2 – 2.9 Mb). Thus, approximately 99% of the Bt ATCC700388 genome is contained in contigs, with 44 sequence gaps (29 in Chr 1, 15 in Chr 2), of which 15 gaps are larger than 500 bp (largest gap size being 9995 bp).
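The coverage figure quoted above follows directly from the per-chromosome numbers in the text. The short sketch below simply redoes that arithmetic; the variable names are illustrative.

```python
# Values quoted in the text, in Mb.
contig_mb = {"Chr 1": 3.78, "Chr 2": 2.88}     # sum of contig lengths
scaffold_mb = {"Chr 1": 3.80, "Chr 2": 2.9}    # scaffold lengths, including estimated gaps
gap_counts = {"Chr 1": 29, "Chr 2": 15}

total_contig = sum(contig_mb.values())
total_scaffold = sum(scaffold_mb.values())
fraction_in_contigs = total_contig / total_scaffold
total_gaps = sum(gap_counts.values())

print(f"{total_contig:.2f} Mb of {total_scaffold:.2f} Mb in contigs "
      f"({fraction_in_contigs:.1%}), {total_gaps} gaps")
```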
Although Bt ATCC700388 and Bt E264 represent the same bacterial strain, the specific isolates chosen for sequencing were stored at independent centers (DMERI and USAMRIID), and Bt ATCC700388 exhibits a slightly reduced growth rate compared to Bt E264 (Additional file 2). An analysis of the E264 and ATCC700388 genome sequences revealed that they were nearly identical except for: I) a chromosomal inversion of 2 Mbp on Chr 1, from position 12442442 (BTH_I1099) to 3328461 (BTH_I2895) based on the E264 genome sequence, II) 4 genes that are absent in ATCC700388, and III) 80 genes with putative protein sequence-altering DNA polymorphisms between the two strains (a full list of these genes is presented in Additional file 3). An initial resequencing analysis of genes in category III) revealed that several of the putative DNA 'polymorphisms' were sequencing artifacts, and thus an extensive analysis of genes in this category was not pursued further (Additional file 3). Of the four genes absent in ATCC700388, two (Bth_I1484 and Bth_I1485) were experimentally validated by PCR (Chua HH, data not shown) and encode components of a Type II Oligo-polysaccharide (OPS) synthesis gene cluster. It is possible that the lack of these genes in ATCC700388 contributes to the observed growth differences by affecting Bt cell wall and membrane biogenesis. Although this hypothesis is supported by studies of the Bth_I1484 homolog in B. subtilis (TagO, 22), we emphasize that it is still highly speculative and needs to be further investigated. As the genomes of both strains (ATCC700388 and E264) were largely similar, we decided to adopt the finished E264 genome, after incorporating corrections for a few sequencing errors, as a reference Bt genome to compare against Bp.
Genome features of Bt
Table: Genome features of Bt and Bp. Rows: No. of genes; Average gene length (nt); %G + C content; No. of specific genes; No. of conserved genes.
Genomic synteny and large-scale differences between the Bp and Bt genomes
Table: Five examples of genomic differences from three categories: Bp-specific, Bt-specific, and Bp-Bt divergent. Genes associated with genomic islands are not included in this table. Columns: Bp-Bt divergent genes; Functional notes of Bt gene; Functional notes of Bp gene.
Table entries: biosynthesis; lipid transport and metabolism genes; lipopolysaccharide biosynthesis genes, signal transduction mechanism genes, etc.; energy production and conversion genes, transcriptional regulators, etc.; secondary metabolite biosynthesis, energy production and conversion genes; peptide synthetase (NRPS) cluster; Type III systems 1; RND multi-drug efflux.
The Bp and Bt genomes share similar protein family distributions
The SUPERFAMILY hidden Markov model library is a database representing all proteins of known structure, derived using structural folds defined by the Structural Classification of Proteins (SCOP). We used SUPERFAMILY analysis to classify and compare evolutionarily related groups of domains between the two organisms. We queried the predicted proteomes of Bp and Bt against the SUPERFAMILY HMM library and obtained structural (and hence implied functional) assignments to protein sequences at the superfamily level for both species. This exercise produced 3706 Bt sequences with 721 unique SCOP superfamilies and 3645 Bp sequences with 705 unique SCOP superfamilies. The top 20 structural families assigned to Bp and Bt protein sequences are provided in supplementary table five (see Additional file 1). We found the distribution of structural family assignments to be highly similar between Bp and Bt, supporting our basic conclusion that the genomes of Bt and Bp are highly conserved. The most populated superfamily in both organisms was the P-loop containing nucleoside triphosphate hydrolases (~7.7%), which carry a local sequence motif ([AG]-x(4)-G-K-[ST]) for ATP/GTP binding. The second and third most populated superfamilies are the winged helix DNA-binding domain (~6%) and the NAD(P)-binding Rossmann-fold domains (~5%), which are involved in DNA binding and NAD(P) binding respectively.
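The P-loop motif quoted above is written in PROSITE-style notation, which maps directly onto a regular expression. The sketch below is illustrative only: the function name and the test peptide are assumptions, and a simple motif scan of this kind is not the HMM-based SUPERFAMILY assignment used in the study.

```python
import re

# [AG]-x(4)-G-K-[ST]: A or G, any four residues, then G, K, and S or T.
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(protein_sequence):
    """Return (0-based start, matched residues) for each non-overlapping motif match."""
    sequence = protein_sequence.upper()
    return [(match.start(), match.group()) for match in P_LOOP.finditer(sequence)]


# Short illustrative peptide containing one P-loop-like stretch.
example_peptide = "MTEYKLVVVGAGGVGKSALTIQ"
for start, motif in find_p_loops(example_peptide):
    print(start, motif)
```

Because the pattern is short, it will also match unrelated proteins by chance, which is one reason profile HMMs are preferred for superfamily-level assignment.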
Virulence genes in Bp and Bt exhibit increased diversity
Besides the acquisition of large-scale genomic material, alterations in the amino acid composition of protein homologs could also play an important role in the phenotypic differences between Bp and Bt, by altering protein activities and specificities. To perform a systematic analysis of the Bp and Bt proteomes, we identified and compared 4630 orthologous protein pairs (2826 from Chromosome 1 and 1804 from Chromosome 2) between the two genomes, and mapped them to their respective metabolic pathways. We found strong conservation between the Bp and Bt proteomes in core metabolic pathways such as amino acid metabolism, cofactor and carrier synthesis, and nucleotide and protein biosynthesis, consistent with the ability of Bp and Bt to occupy similar environmental niches. Based upon KEGG annotations, only five out of 1997 genes involved in core metabolic pathways in Bp do not have a clear homolog in Bt: a putative ATP-binding protein (BPSL2860), a putative acetyltransferase (BPSL1417), a putative RNA 2'-phosphotransferase (BPSL0762) and two ABC transporters (BPSL1824, BPSL2849). This result suggests that the central metabolic machineries utilized by Bp and Bt are likely to be highly similar. Unexpectedly, the Bp and Bt proteomes also appear to share significant similarities in their virulence components. Specifically, of 368 known and potential virulence genes in Bp, 275 orthologs (71%) are present in Bt at an average similarity of greater than 80% (see supplementary table four in Additional file 1), including two type three secretion systems (TTS2 and TTS3), antibiotic resistance genes, type IV pili-generating proteins, hemolysin-related genes, and several adhesion factors and proteases. Of the remaining 93 Bp-specific virulence-related genes, 20 are located in Bp GIs or regions of atypical GC content, and 73 within the core Bp genome.
Functional biases in Bp and Bt pseudogenes
Around one third of the pseudogenes in both species corresponded to hypothetical proteins and/or proteins of unknown function. However, among the annotated pseudogenes in the Bt genome, there was a distinct non-random functional trend for proteins exhibiting such inactivation events, as they largely involved membrane-associated and exported proteins (26 Bt pseudogenes, p < 0.001 by Fisher's Test). This bias may reflect strong selective pressures in the natural environments directly contacted by these bacteria. Intriguingly, in the Bp genome, one class of genes frequently subject to inactivation events was genes involved in transcription (21 from Bp; p < 0.001), which could lead to differences in activating upstream stimuli and the selection of target genes. This result suggests that certain aspects of Bp-specific behavior may result from alterations in gene expression, in addition to the other large-scale and protein differences previously discussed.
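The enrichment test mentioned above can be sketched as a one-sided Fisher's exact test on a 2x2 table of pseudogene versus intact genes, inside versus outside a functional category. This is a minimal standard-library sketch, not the authors' analysis: the text does not state whether a one- or two-sided test was used, and all counts below are hypothetical, since the full contingency tables are not given here.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for enrichment in the 2x2 table [[a, b], [c, d]].

    a: pseudogenes in the category      b: pseudogenes outside the category
    c: intact genes in the category     d: intact genes outside the category
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row_total = a + b            # all pseudogenes
    col_total = a + c            # all genes in the category
    grand_total = a + b + c + d
    denominator = comb(grand_total, row_total)
    upper = min(row_total, col_total)
    tail = sum(
        comb(col_total, k) * comb(grand_total - col_total, row_total - k)
        for k in range(a, upper + 1)
    )
    return tail / denominator


# Hypothetical counts, for illustration only.
p_value = fisher_exact_greater(a=12, b=18, c=40, d=330)
print(f"one-sided p = {p_value:.3g}")
```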
Comparison of Bp and Bt intergenic regions
Finally, our finding that a number of transcriptional regulators might be targeted for inactivation in Bp led us to investigate the extent to which the transcriptional regulation machinery in Bp and Bt might be similar or different. As a surrogate for gene expression information, we compared intergenic regions of the orthologous protein pairs, which are likely to contain important cis-acting promoter elements and motifs for transcriptional regulator docking and activity. Using a working criterion that an intergenic region should be greater than 30 bp and lie between two distinct ORFs, we identified 1191 (~58%) orthologous intergenic regions in Chromosome 1 and 634 (~49%) in Chromosome 2 for further analysis. When assessed at a global level, the intergenic regions of Bp and Bt appear to be similar in terms of average sequence identity (93% for intergenic pairs in both chromosomes), mean lengths, and size ranges (p > 0.05). The high levels of similarity between the intergenic regions could mean either that the transcriptional regulatory machineries of Bp and Bt are highly similar, or that insufficient time has passed for conserved promoter motifs to emerge against the background mutation rate. To test the genomes for any evidence of increased conservation in the intergenic regions, we asked if there was any relationship between the extent of protein similarity between homologs and the degree of sequence conservation in their 5' intergenic regions. Comparing the orthologous gene pairs, we identified a weak but significant positive correlation between protein similarity and adjacent 5' intergenic sequences for genes associated with core metabolism (r = 0.134, P < 0.01, Spearman's correlation coefficient). This correlation was stronger for Chromosome 1 when the two chromosomes were treated separately (r = 0.147, P < 0.0001 for Chromosome 1; r = 0.084, P < 0.05 for Chromosome 2, Spearman's correlation coefficient).
In addition, the percentage similarity for promoters of core genes was significantly higher compared to the genome average (p < 0.001, Figure 6b). The finding that orthologs involved in core metabolism already tend to have more conserved 5' intergenic regions suggests that some degree of selection in the intergenic regions has already taken place. In contrast, for the virulence genes, such a relationship was not observed (p > 0.05). Thus, for genes that are present in both species, the high levels of similarity between their intergenic regions suggest that the cis-acting promoter elements of Bp and Bt are likely to be broadly similar.
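The rank correlation used above can be sketched in plain Python as the Pearson correlation of average ranks, which handles tied values. The study itself used GraphPad Prism, so this is an illustration of the statistic rather than the authors' procedure; the function names and the paired values are hypothetical.

```python
def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda index: values[index])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return covariance / (spread_x * spread_y)


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation computed on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-ortholog values: protein similarity (%) and 5' intergenic identity (%).
protein_similarity = [99.1, 97.4, 92.0, 88.5, 95.3, 90.2]
intergenic_identity = [96.0, 94.5, 93.8, 89.0, 91.2, 92.5]
print(f"rho = {spearman_rho(protein_similarity, intergenic_identity):.3f}")
```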
Genomic comparisons between pathogens and related nonpathogenic relatives have played an important role in identifying the mechanisms responsible for acquisition of virulence in the natural environment [3, 36–39]. Through these analyses, a general picture is now emerging in which different pathogens appear to have employed slightly different evolutionary mechanisms to develop virulence. For example, genomic comparisons between pathogenic and laboratory strains of E. coli have revealed evidence of a common core chromosome interrupted by the horizontal introduction of multiple segments of virulence-related genes (pathogenicity islands) [37, 40]. In contrast, the loss of ancestral genomic DNA may play an important role in generating virulence in Listeria and Bordetella species [38, 41]. In the case of Yersinia pestis, it has been proposed that extensive chromosomal rearrangements and massive gene inactivation can also act as a driving force for pathogen evolution.
In the case of Bp, our comparative analysis indicates that gene mutation, gene deletion, and gene acquisition are likely to represent the major evolutionary drivers of Bp virulence, and that other proposed mechanisms of pathogen evolution, including chromosomal rearrangement and bacteriophage-mediated recombination [5, 40], may thus play a less relevant role in the pathogenic evolution of Bp. Our results are broadly consistent with, and support, the findings of a previous study in which genes found to be transcriptionally regulated in B. mallei upon infection were compared to their counterparts in Bp and Bt. In that study, the investigators found that the three organisms all possessed the same genome structure of two chromosomes and high levels of conserved nucleotide identity. However, down-regulated genes, which were related to cell growth, were better conserved, while up-regulated genes potentially encoding virulence functions were less well conserved or absent in Bt. Besides confirming these findings on a genome-wide scale, our study also possesses a number of novel features. Specifically, these include I) the discovery and validation of GIs in the Bt genome as genomic elements of lateral transfer, II) the finding that, unlike B. mallei, the Bp and Bt genomes are highly syntenic, III) the increased divergence of virulence genes, especially those associated with Type III secretion, between Bp and Bt, IV) functional biases in inactivated genes for membrane-associated proteins in Bt and transcription factors in Bp, V) effects of species-specific genes on metabolism and virulence, and VI) evidence that the cis-transcriptional regulatory machineries of Bp and Bt are likely to be broadly similar.
To develop as a successful pathogen, virulent bacteria need to evolve both offensive (e.g. adherence, invasion, toxin, secretion systems) and defensive pathways (e.g. antiphagocytosis, anti-proteolysis, phase variation, serum resistance). Bp and Bt share a large proportion of both offensive and defensive virulence factors (~71%), including adhesion factors, type IV pili, and two Type III secretion systems (TTS2 and TTS3). However, when treated collectively, these virulence genes appear to be significantly more divergent between Bp and Bt compared to the core metabolic genes or the rest of the genome. In contrast to this proteomic comparison, our analysis of the promoters of these genes failed to demonstrate an increased rate of divergence in cis-acting loci that might affect their transcriptional regulation. This result is unlikely to be caused by a lack of sensitivity in our comparison, as we were able to detect significantly increased conservation in the promoters of genes associated with core metabolism. Thus, at present, we favor the possibility that the cis-acting loci responsible for the regulation of these genes are likely to be fairly similar between Bp and Bt. However, we note that a different scenario may pertain to the trans-acting loci, since a significant enrichment of transcription factors appears to have been mutationally inactivated or altered in Bp. A close comparison of the transcriptomes of Bp and Bt, which is currently underway in our laboratory, should prove valuable in addressing this issue.
Our results also support key roles for large-scale gene loss, acquisition, and replacement in the development of Bp virulence. For example, both Bp and Bt share the TTS3 Type III secretion system, which is required for the full virulence of Bp in a hamster model of infection. However, it has recently been shown that arabinose exposure may downregulate TTS3 expression and activity. The absence of an arabinose assimilation operon in Bp might thus have contributed to the increased virulence of this species. Besides gene deletion, several gene clusters related to fimbriae and capsular polysaccharide synthesis have also been horizontally transferred to Bp, potentially contributing to the variation of surface components between the two organisms. It has long been recognized that bacterial surface components can play an indispensable role in the pathogenesis of infectious disease, and surface components of Bp may serve as virulence factors by playing a role in the attachment of bacteria to the host cell surface [43, 44]. One striking result from our analysis was the discovery that the polysaccharide capsule gene cluster, which has been shown to be an essential virulence determinant, is likely to have been non-randomly transferred into the Bp genome, replacing an ancestral gene cluster, still present in Bt, that was already dedicated to polysaccharide synthesis. Polysaccharide coats play an important role in bacterial survival and persistence in the environment and in evasion of the host immune response, but may not constitute offensive attack. The higher invasion, adherence capacity, and resistance to phagocytosis of Bp are thought to be related to its ability to produce exopolysaccharides [28, 45].
We conclude this report by noting that B. pseudomallei is listed as a category B agent on the Centers for Disease Control Bioterrorism Agents/Diseases list, and experimental manipulation of Bp is mandated by law in several countries to be conducted under biosafety level 3 (BSL3) laboratory requirements. As many centers in Bp-endemic areas do not have BSL3 facilities, this requirement has somewhat hampered the progress of research in basic Bp biology. By contrast, Bt is considered a risk group 1 agent, and although not considered clinically significant, Bt has been shown to be lethal to the model system C. elegans. Thus, although it is undoubtedly essential to be cautious when extrapolating findings from one species to another, the high degree of similarity between the Bt and Bp genomes raises the possibility that Bt could be used as a model system for studying certain aspects of Bp biology, similar to B. cereus and B. anthracis. The availability of an easily tractable experimental organism, which can be manipulated under standard laboratory conditions, could thus prove useful in accelerating research in the pathogenesis of melioidosis.
A comparative genomic analysis of Bt to Bp has revealed that their molecular inventories are highly similar in terms of genome structure, gene order, and functional content. Bt contains at least fifteen genomic islands that are variably present across different Bt isolates, which may contribute to the presence of different strain-specific phenotypes. Our results suggest that a limited number of horizontal-acquisition events, coupled with the fine-scale functional modulation of existing proteins, are likely to be the major drivers underlying Bp virulence. The extensive genomic similarity between Bp and Bt suggests that, in some cases, Bt could be used as a possible model system for studying certain aspects of Bp behavior.
16S ribosomal sequences for Burkholderia (B. pseudomallei K96243, B. thailandensis E264, B. mallei ATCC23343 and B. cepacia J2315) and other spp. (Pseudomonas aeruginosa LMG1242T, Escherichia coli K12 and Salmonella typhimurium LT2) were aligned and compared using MEGA version 3.0 software. Sequence divergence rates were calculated using a neighbor joining algorithm and a bootstrap value of 2000. P. aeruginosa LMG1242T was used as an outgroup.
Bacterial strains and genome sequencing
Two independent cultures of B. thailandensis E264, derived from the same original isolate but stored at independent centers (DMERI and USAMRIID), were processed for genomic DNA extraction and whole-genome shotgun (WGS) sequencing at separate facilities (The Institute for Genomic Research (TIGR) and the Broad Institute (BI)). At TIGR, sequencing and assembly followed by genome closure was performed as described in Nierman et al. Identification of B. thailandensis coding sequences (CDs) was performed using GLIMMER (48) with modifications described in Nierman et al. The BI sequencing approach is highly similar and is described in the Main Text and Additional file 4. Our comparative genomic analysis is primarily based on the TIGR E264 genome after incorporating corrections for specific sequencing errors. The TIGR Bt E264 genome sequence has been assigned GenBank accession nos. CP000086 and CP000085. The Whole Genome Shotgun project described in this paper has been deposited at DDBJ/EMBL/GenBank under the project accession AACX00000000. The version described in this paper is the first version, AACX01000000.
Genome alignments and comparative genomics
The genome sequence of virulent B. pseudomallei K96243 was obtained from NCBI. Syntenic regions between the B. thailandensis (Bt) and B. pseudomallei (Bp) genomes were identified and aligned using MUMMER, and visualized using the ARGO Genome Browser (Broad Institute, MIT). Protein homologs between the two species were identified using BLASTP at an E-value threshold of < 10^-10 with subsequent confirmation by reciprocal BLAST. The BLASTP output was also manually curated to generate the final list of orthologous gene pairs between the two species. Pathway comparisons were performed by interrogating the Bp and Bt genomes against the KEGG metabolic database. Fisher's exact test was used to calculate the difference between proportions of functional enrichment. Rates of nucleotide substitution in CDs were calculated by aligning the nucleotide sequences of orthologous gene pairs with ClustalW and determining the number of synonymous nucleotide substitutions per synonymous site (Ks) and the number of nonsynonymous nucleotide substitutions per nonsynonymous site (Ka) by a maximum-likelihood method in PAML. Spearman's rank correlation analysis was performed using GraphPad Prism version 4.0 (GraphPad Inc., Calif.). Intergenic sequences of orthologous gene pairs, defined as the non-coding sequence between the translational start or stop of two successive genes, that were greater than 30 nucleotides in both species were compared by BLASTN. Two-sample t-tests (for normal distributions) and Mann-Whitney tests (for non-normal distributions) were used to determine the difference between means of different populations.
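The ortholog identification step (BLASTP at an E-value threshold of < 10^-10, confirmed by reciprocal BLAST) can be sketched as a reciprocal-best-hit filter over pre-parsed hit tuples. This is a simplified illustration under stated assumptions: the tuple layout, function names, and gene identifiers are hypothetical, best hits are chosen by bit score, and the manual curation described in the text is not modelled.

```python
def best_hits(hits, max_evalue=1e-10):
    """Map each query to its highest-scoring subject among hits with E-value < max_evalue.

    hits: iterable of (query_id, subject_id, evalue, bitscore) tuples.
    """
    best = {}
    for query, subject, evalue, bitscore in hits:
        if evalue >= max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, max_evalue=1e-10):
    """Return sorted (a_id, b_id) pairs that are each other's best hit in both directions."""
    forward = best_hits(a_vs_b, max_evalue)
    reverse = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical hits, for illustration only.
bp_vs_bt = [
    ("bp_gene1", "bt_gene1", 1e-50, 200.0),
    ("bp_gene1", "bt_gene2", 1e-20, 90.0),
    ("bp_gene2", "bt_gene2", 1e-30, 120.0),
    ("bp_gene3", "bt_gene3", 1e-5, 40.0),    # fails the E-value threshold
]
bt_vs_bp = [
    ("bt_gene1", "bp_gene1", 1e-50, 200.0),
    ("bt_gene2", "bp_gene1", 1e-20, 90.0),   # best hit is not bp_gene2, so no pair
    ("bt_gene3", "bp_gene3", 1e-5, 40.0),
]
orthologs = reciprocal_best_hits(bp_vs_bt, bt_vs_bp)
print(orthologs)
```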
Experimental validation of genomic islands
The presence or absence of 15 GIs was assessed across 29 Bt isolates using PCR. The 29 isolates were collected from independent soil locations in two major areas: Northeast Thailand (21 isolates) and Central Thailand (8 isolates). One target gene and one flanking gene were selected for each genomic island based on Bt E264 sequence data. The target genes chosen were predominantly hypothetical conserved proteins; genes such as integrases and bacteriophage genes were avoided. The flanking genes were the genes immediately abutting each putative island. The PCR primers and cycling conditions (supplementary table seven in Additional file 1) for the target and flanking genes were optimized using Bt E264 genomic DNA. PCR amplifications were performed with a PTC-0220 DNA engine (MJ Research, Cambridge, MA) with Platinum Taq polymerase (Invitrogen), and aliquots of reaction mixtures were analyzed by agarose gel electrophoresis.
SUPERFAMILY (release 1.69) is a library of HMMs based on SCOP and contains 1539 known structural superfamilies. The HMM-based assignment tool provides structural assignments to protein sequences at the superfamily level using known structural information. SUPERFAMILY was run in HMMER mode to assign structural superfamilies and domains to Bt and Bp protein sequences.
Identifying pseudogenes in Bp and Bt
To identify pseudogenes, we first used protein sequences of Bp (or Bt) genome to query the nucleotide sequence of the Bt (or Bp) genome using TBLASTN. We then applied the Psi-Phi program suite  on the BLAST outputs to recover candidate pseudogenes in each genome. Briefly, Psi-Phi is designed to identify pseudogenes by a comparative analysis of related genomes, and retrieves pseudogenes resulting from nonsense mutations, frameshifts generated by small insertions or deletions, large insertions (such as those resulting from transposable elements), and truncations of any specified length as well as any incorrectly annotated spacers resulting from gene degradation . Since Bp and Bt are closely related, a stringent cutoff of E-value < 10-15 and a minimal percentage of protein identity of 79% was used to identity pseudogenes. The candidate pseudogenes were manually curated, and the disrupting mutations determined by aligning the nucleotide sequences of putative pseudogenes with their functional counterparts using CLUSTALW 1.8 . The IS elements were searched by blasting genome sequences against the IS nucleotide database .
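The stringency cutoffs described above (E-value < 10^-15 and at least 79% protein identity) can be expressed as a simple filter over parsed TBLASTN hits. The sketch below shows only this thresholding step; it does not reproduce Psi-Phi, and the dictionary keys, function name, and hit records are hypothetical.

```python
def passes_pseudogene_cutoffs(hit, max_evalue=1e-15, min_identity=79.0):
    """Return True if a hit meets both the E-value and percent-identity cutoffs.

    hit: dict with 'evalue' (float) and 'percent_identity' (float, 0-100).
    """
    return hit["evalue"] < max_evalue and hit["percent_identity"] >= min_identity


# Hypothetical hit records, for illustration only.
candidate_hits = [
    {"query": "protein_A", "evalue": 1e-40, "percent_identity": 92.5},
    {"query": "protein_B", "evalue": 1e-12, "percent_identity": 95.0},  # E-value too weak
    {"query": "protein_C", "evalue": 1e-30, "percent_identity": 61.0},  # identity too low
]
retained = [hit["query"] for hit in candidate_hits if passes_pseudogene_cutoffs(hit)]
print(retained)
```

Hits that pass such a filter would still need the alignment-based inspection and manual curation described in the text before being called pseudogenes.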
We thank Lee May Ann of the Defense Medical and Environmental Research Institute (DMERI) for providing strain Bt ATCC700388. We thank Sharon Peacock, Vanaporn Wuthiekanun, and Mongkol Vesaratchavest for the gift of Bt genomic DNAs to assess the variability of the Bt-GIs. The work was supported by the Defense Science Organization (DSO), and a block grant from the Agency of Science, Technology and Research (A-star) to the Genome Institute of Singapore.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.