Tandem repeat regions within the Burkholderia pseudomallei genome and their application for high resolution genotyping

Background: The facultative, intracellular bacterium Burkholderia pseudomallei is the causative agent of melioidosis, a serious infectious disease of humans and animals. We identified and categorized tandem repeat arrays and their distribution throughout the genome of B. pseudomallei strain K96243 in order to develop a genetic typing method for B. pseudomallei. We then screened 104 of the potentially polymorphic loci across a diverse panel of 31 isolates, including B. pseudomallei, B. mallei and B. thailandensis, to identify loci with varying degrees of polymorphism. A subset of these tandem repeat arrays was subsequently developed into a multiple-locus VNTR analysis (MLVA) to examine 66 B. pseudomallei and 21 B. mallei isolates from around the world, as well as 95 lineages from a serial transfer experiment encompassing ~18,000 generations.

Results: B. pseudomallei contains an abundance of tandem repeat loci throughout its genome, many of which are duplicated elsewhere in the genome. Most of these loci have repeat motif lengths of 6 to 9 bp with 4 to 10 repeat units and are predominantly located in intergenic regions. Across geographically diverse B. pseudomallei and B. mallei isolates, the 32 VNTR loci displayed between 7 and 28 alleles, with Nei's diversity values ranging from 0.47 to 0.94. Mutation rates for these loci (> 10^-5 per locus per generation) are comparable to those of the most diverse tandemly repeated regions found in other, less diverse bacteria.

Conclusion: The frequency, location and duplicate nature of tandemly repeated regions within the B. pseudomallei genome indicate that these tandem repeat regions may play a role in generating and maintaining adaptive genomic variation. Multiple-locus VNTR analysis revealed extensive diversity within the global isolate set containing B. pseudomallei and B. mallei, and it detected genotypic differences within clonal lineages of both species that were identical using previous typing methods. Given the health threat to humans and livestock and the potential for B. pseudomallei to be released intentionally, MLVA could prove to be an important tool for fine-scale epidemiological or forensic tracking of this increasingly important environmental pathogen.



Background
The environmental saprophyte Burkholderia pseudomallei is the causative agent of melioidosis, a disease endemic to tropical regions of Southeast Asia and northern Australia. Symptoms range in severity from fatal sepsis and acute community-acquired pneumonia to benign, localized abscesses. Infection in humans and animals generally occurs through direct contact of open wounds or abrasions with contaminated water and soil, by ingestion of contaminated drinking water, or by inhalation of infectious aerosols. Melioidosis is a serious public health threat in Thailand and northern Australia, where it is associated with case fatality rates of approximately 50% and 20%, respectively [1]. In addition, B. pseudomallei has recently attracted attention as a potential biological weapon, and is listed as a Category B biothreat agent by the U.S. Centers for Disease Control and Prevention (CDC) [2].
The close genetic relationship of B. pseudomallei to B. mallei has previously been demonstrated by DNA hybridization studies [3]. More recently, studies have revealed that B. mallei is a clonal lineage of B. pseudomallei, and its recent evolutionary divergence is marked by gene deletions and intra-chromosomal rearrangements [4][5][6][7]. B. mallei, the etiologic agent of glanders, is an obligate parasite of the family Equidae, but can also infect humans through direct contact with infected animals [8] or occupational exposure [9]. Glanders was once a globally distributed disease, but is currently predominant only in the Middle East, Africa, Asia and Central and South America. Due to its highly infectious nature and ability to infect via aerosol, it was used as a biological weapon during World War I and World War II [10,11]. It is also listed as a Category B biothreat agent by the CDC [2].
Because melioidosis is a severe disease, the molecular epidemiology of B. pseudomallei has been investigated using various DNA restriction-based methods, including pulsed-field gel electrophoresis (PFGE) [12,13] and ribotyping [14,15]. PFGE resolves large, potentially polymorphic DNA restriction fragments, while ribotyping uses restriction fragment length polymorphisms associated with rRNA genes [16]. Although both methods have been used successfully for the epidemiological tracking of pathogens [17], they are technically demanding, which makes large datasets harder to generate and compare. Neither method is easily standardized for transfer throughout the scientific and public health community, and both can lack discriminatory power among closely related isolates within a species or between closely related species [18].
Other procedures that have been used for molecular typing of B. pseudomallei involve PCR, such as random amplified polymorphic DNA (RAPD) analysis [19,20] and multilocus sequence typing (MLST) [6]. RAPD detects differences between genomes by amplifying anonymous DNA segments with arbitrary primers. Drawbacks of this technique include the binary (presence/absence) nature of the data and the difficulty of reproducing banding patterns between reactions, attributed to PCR artifacts. MLST uses concatenated nucleotide sequences from seven housekeeping genes that are assumed to be selectively neutral or under purifying selection [21]. This method provides nucleotide data for multiple haplotypes, is easily amenable to phylogenetic analyses and can be standardized across laboratories. The MLST scheme developed for B. pseudomallei is also applicable to B. mallei and B. thailandensis. However, MLST can be time-consuming and expensive and, most importantly, lacks discriminatory power among closely related B. pseudomallei isolates and among the vast majority of B. mallei isolates, which are all close genetic relatives [6].
Recently, a reliable PCR-based method using variable-number tandem repeat (VNTR) loci has become a popular tool for the molecular typing of pathogens [18,22-25]. A VNTR locus consists of tandemly repeated DNA sequences that vary in copy number, creating PCR amplicon size polymorphisms that are easily detected with gel electrophoresis. Because of their elevated mutation rates relative to other regions of DNA and their multi-allelic nature, VNTRs allow superior discrimination between closely related isolates. These loci have been used successfully for forensic, epidemiological and phylogenetic analyses of bacterial pathogens with low genetic diversity, such as Bacillus anthracis, Francisella tularensis and Yersinia pestis [23,26-30].
Given the success of VNTR typing in other pathogens, the primary objective of this study was to develop a high-resolution VNTR typing system for B. pseudomallei that is suitable for epidemiological, forensic, phylogenetic and population genetic studies. Thus the first task was to characterize tandem repeat loci, including their distribution and frequency within the B. pseudomallei genome. Second, in order to develop a comprehensive multiple-locus VNTR typing system that uses loci with varying degrees of polymorphism, we screened loci representative of the tandem repeat loci throughout the genome and examined their levels of polymorphism. Finally, in order to understand how mechanisms such as recombination and mutation generate the high levels of diversity observed in this pathogen, it was essential to examine the mutation rates of the non-duplicated VNTR loci chosen for the typing system, as well as of a representative sample of the duplicated tandem repeat regions. Furthermore, estimates of mutation rates will support future epidemiological studies that model the transmission of melioidosis in natural populations, similar to published studies on plague [26].
In this manuscript we describe a multiple-locus VNTR analysis (MLVA) genotyping system in which 32 independent, tandemly inserted repeated motifs identified in the B. pseudomallei K96243 genome are amplified using fluorescently labeled primers in multiplexed PCRs and separated using capillary electrophoresis. These loci were highly polymorphic across a globally distributed set of 66 B. pseudomallei and 21 B. mallei isolates, as well as a few very closely related B. pseudomallei isolates from an outbreak event and two individual patients.

Tandem repeats within the Burkholderia pseudomallei genome
We observed that, compared with other bacterial pathogens with similarly sized genomes, such as Bacillus anthracis Ames and Yersinia pestis CO92, the Burkholderia pseudomallei K96243 genome harbors a relatively large number of tandem repeat arrays (Figure 1). The large (4,074,542 bp) chromosome of B. pseudomallei contains 285 tandem repeat arrays (69.9 arrays/Mbp), while the small (3,173,005 bp) chromosome contains 324 (102.1 arrays/Mbp) (Table 1). In contrast, the Y. pestis genome contains only 174 arrays and B. anthracis just 66, at densities of 37.4 and 12.6 arrays/Mbp, respectively. In B. pseudomallei, tandem repeat motif sizes on both chromosomes ranged from 3 to 16 bp, with copy numbers ranging from 4 to 21 units (Figure 2, A1 and A2). Non-triplet repeat motifs were more common in intergenic regions than inside genes (Figure 2, B1 and B2).
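The densities above are array counts divided by chromosome length in megabase pairs. A quick check in Python using the counts and lengths given in the text (the function name is ours, for illustration only):

```python
def arrays_per_mbp(n_arrays: int, length_bp: int) -> float:
    """Tandem repeat array density in arrays per megabase pair."""
    return n_arrays / (length_bp / 1_000_000)


# Array counts and chromosome lengths for B. pseudomallei K96243, as stated in the text.
large = arrays_per_mbp(285, 4_074_542)
small = arrays_per_mbp(324, 3_173_005)
print(f"large: {large:.1f} arrays/Mbp, small: {small:.1f} arrays/Mbp")
```

Both values round to the 69.9 and 102.1 arrays/Mbp reported for the large and small chromosomes.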

Distribution and location of tandem repeats
A χ² goodness-of-fit test comparing the observed B. pseudomallei tandem repeat distribution with an expected Poisson distribution, using 10-kb intervals, was significant for both the large (p < 0.001) and small (p < 0.001) chromosomes (Figure 3). These non-random distributions are consistent with a clustered arrangement of arrays along both chromosomes. Additionally, the majority of the tandem repeats were found in intergenic regions: 74.7% (n = 213) of the tandem repeats on the large chromosome and 68.2% (n = 221) on the small chromosome. However, a portion of the arrays (28.1% on the large chromosome and 35.2% on the small chromosome) were located inside predicted ORFs or within 40 bp upstream of them (Table 1). Longer arrays (≥ 11 repeat units), even those with triplet motifs, tended not to be found inside predicted protein-coding regions on the large chromosome (Figure 2A1). Conversely, on the small chromosome, longer arrays with triplet repeat motifs were found in inter- and intragenic locations in almost equal numbers (Figure 2A2). We also observed four times as many degenerate arrays on the small chromosome as on the large chromosome, and the majority of these degenerate arrays were located inside coding regions (Figure 2A1 and 2A2).
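A window-based clustering test of this kind can be sketched with the standard library alone. This is a minimal illustration, not the authors' code: it assumes 0-based array start coordinates, pools the upper tail so the last class has an expected count below 5 folded into it, and removes one degree of freedom for the estimated Poisson mean; the binning rules actually used for Figure 3 are not stated here. The helper `chi2_sf` computes the chi-square upper-tail probability (what `scipy.stats.chi2.sf` would return) via the regularized incomplete gamma function.

```python
import math


def chi2_sf(stat: float, df: int) -> float:
    """Chi-square upper-tail probability Q(df/2, stat/2) (Numerical Recipes style)."""
    a, x = df / 2.0, stat / 2.0
    if x <= 0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1:
        # Series for the lower function P(a, x); Q = 1 - P.
        term = total = 1.0 / a
        n = a
        for _ in range(1000):
            n += 1
            term *= x / n
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz) for Q(a, x).
    tiny = 1e-300
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = d if abs(d) > tiny else tiny
        c = b + an / c
        c = c if abs(c) > tiny else tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def poisson_clustering_test(positions, chrom_length, window=10_000, min_expected=5.0):
    """Chi-square fit of per-window array counts to a Poisson distribution.

    Returns (statistic, degrees_of_freedom, p_value).
    """
    n_windows = math.ceil(chrom_length / window)
    counts = [0] * n_windows
    for pos in positions:  # 0-based start coordinates, assumed < chrom_length
        counts[pos // window] += 1
    lam = sum(counts) / n_windows  # Poisson mean estimated from the data

    observed, expected = [], []
    cdf_below = 0.0  # P(X <= k - 1)
    k = 0
    while True:
        p_k = math.exp(-lam) * lam**k / math.factorial(k)
        if n_windows * (1.0 - cdf_below - p_k) < min_expected:
            # Pool k and every larger count into one ">= k" class.
            observed.append(sum(1 for c in counts if c >= k))
            expected.append(n_windows * (1.0 - cdf_below))
            break
        observed.append(sum(1 for c in counts if c == k))
        expected.append(n_windows * p_k)
        cdf_below += p_k
        k += 1

    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 2  # classes - 1, minus 1 for the estimated mean
    if df < 1:
        raise ValueError("too few classes for a goodness-of-fit test")
    return stat, df, chi2_sf(stat, df)


# Hypothetical toy data: 100 arrays packed into the first 10 windows of a 1-Mbp sequence.
clustered = [w * 10_000 + 500 * j for w in range(10) for j in range(10)]
print(poisson_clustering_test(clustered, 1_000_000))
```

Pooling the tail keeps the chi-square approximation reasonable when most 10-kb windows hold zero, one or two arrays; an exact or permutation test would be an alternative for very sparse data.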
We found that 36.3% of all tandem repeat arrays on the two chromosomes of B. pseudomallei are duplicated, at least partially (≥ 20 bp and ≥ 80% similarity), at other locations on either chromosome (Table 1). Most of these duplications were found in intergenic regions and involved only the repeat motif, not the flanking sequences. Most duplicated tandem repeats on the large chromosome were in fact duplicated on the small chromosome rather than on the large chromosome, whereas arrays on the small chromosome were duplicated in equal numbers on both chromosomes (Table 1). Total array lengths were also typically longer for duplicated tandem arrays: for example, 104 of the 108 duplicated arrays on the large chromosome and 112 of the 114 duplicated arrays on the small chromosome are larger than 200 bp, with the largest almost 6,000 bp in size. Repeat regions containing more than 20 repeat copies were duplicated in some fashion, and repeat motifs of six and seven bp were more often duplicated than not (Figure 2).
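The duplication criterion (a match of at least 20 bp with at least 80% similarity) can be illustrated with a brute-force, ungapped scan of an array against a target sequence on both strands. This is a hypothetical sketch to make the thresholds concrete; the search tool used in the study is not described in this section, and a real genome-scale search would use a seeded, gapped aligner (e.g. BLAST) rather than this quadratic scan.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def ungapped_hits(query, target, min_len=20, min_identity=0.80):
    """Hypothetical helper: every ungapped placement of the whole query on the
    target (either strand) whose identity meets the threshold.

    Returns a list of (offset, identity, strand) tuples.
    """
    hits = []
    if len(query) < min_len:
        return hits  # too short to count as a duplication under this criterion
    for strand, q in (("+", query), ("-", reverse_complement(query))):
        for off in range(len(target) - len(q) + 1):
            window = target[off:off + len(q)]
            matches = sum(a == b for a, b in zip(q, window))
            identity = matches / len(q)
            if identity >= min_identity:
                hits.append((off, identity, strand))
    return hits


# Hypothetical toy example: a 30-bp array and a diverged copy with 3 mismatches.
array = "ACGTTGCAAC" * 3
target = "G" * 5 + "TCGTTGCAAC" * 3 + "G" * 5
print(ungapped_hits(array, target))
```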

MLVA development
To develop an MLVA system for B. pseudomallei, we screened a total of 104 VNTR loci covering a variety of array sizes, from a 2-bp motif repeated seven times (i.e., 2 × 7) to degenerate repeat arrays between 500 and 1,000 bp. Both intragenic and intergenic arrays were screened. Loci were included in the MLVA system if they showed 1) variation within the screening panel (see Methods), in either the globally distributed set or the locally distributed outbreak set, 2) robust PCR amplification (> 80% success), and 3) highly discrete PCR amplicon sizes consistent with the locus repeat unit motif (minimal partial repeat differences). Thirty-two loci met all three criteria and were chosen for MLVA development (Tables 2 and 3).
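Criterion 3 can be made concrete by converting a sized amplicon into a repeat copy number and flagging sizes that fall between whole repeats. A minimal sketch, assuming the non-repeat portion of the amplicon (primers plus flanking sequence, the hypothetical parameter `flank_bp`) is known from the reference sequence and that sizes within 1 bp of a whole repeat count as discrete; real capillary workflows usually also calibrate for fragment- and dye-specific mobility offsets.

```python
def call_allele(amplicon_bp, flank_bp, motif_bp, tolerance_bp=1.0):
    """Hypothetical allele caller for one VNTR locus.

    Returns (copy_number, is_partial): copy_number is the nearest whole number
    of repeats; is_partial flags sizes more than tolerance_bp away from it.
    """
    repeat_bp = amplicon_bp - flank_bp  # length attributable to the repeat array
    if repeat_bp < 0:
        raise ValueError("amplicon is shorter than the non-repeat flanks")
    copies = round(repeat_bp / motif_bp)
    offset_bp = abs(repeat_bp - copies * motif_bp)
    return copies, offset_bp > tolerance_bp


# Hypothetical 7-bp locus with 150 bp of flanking sequence.
print(call_allele(220, 150, 7))    # (10, False)
print(call_allele(223, 150, 7))    # (10, True): 3 bp off a whole repeat
print(call_allele(219.6, 150, 7))  # fractional capillary size, still discrete
```

Loci whose alleles frequently return `is_partial=True` are the ones this criterion would exclude, since partial repeats make allele sizes hard to bin reproducibly.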

B. pseudomallei and B. mallei genetic relationships
The 32-locus MLVA system was used to characterize 66 B. pseudomallei and 21 B. mallei isolates from diverse geographic locations (Table 4). These loci provide high levels of discrimination among B. pseudomallei isolates, with 7 to 28 alleles per locus and Nei's diversity values from 0.47 to 0.94 across all B. pseudomallei and B. mallei isolates (Table 3). Furthermore, the MLVA loci amplified equally well in B. pseudomallei and the closely related B. mallei, and showed variation both within and between the two species. The MLVA loci did not amplify by PCR in the more genetically distant B. thailandensis and B. cepacia.
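Nei's diversity for a single locus is commonly computed as D = n/(n - 1) × (1 - Σ p_i²), where p_i are the allele frequencies among n isolates. Whether the n/(n - 1) small-sample correction was applied to the values in Table 3 is not stated in this section, so the sketch below offers both forms; the allele calls are hypothetical.

```python
from collections import Counter


def nei_diversity(alleles, unbiased=True):
    """Nei's gene diversity for one locus from a list of allele calls.

    Missing calls (None) are ignored; unbiased=True applies the n/(n-1) correction.
    """
    calls = [a for a in alleles if a is not None]
    n = len(calls)
    if n < 2:
        raise ValueError("need at least two scored isolates")
    h = 1.0 - sum((count / n) ** 2 for count in Counter(calls).values())
    return h * n / (n - 1) if unbiased else h


# Hypothetical repeat-copy calls at one locus for eight isolates (one missing).
locus = [10, 10, 12, 12, 13, 15, 15, None]
print(round(nei_diversity(locus), 3), round(nei_diversity(locus, unbiased=False), 3))
# 0.857 0.735
```

A locus where every isolate carries a different allele scores 1.0 with the correction, and a monomorphic locus scores 0.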
Analysis of allelic variation at 23 loci using a neighbor-joining distance algorithm revealed 62 genotypes among the 66 B. pseudomallei isolates and 19 genotypes among the 21 B. mallei isolates. Phylogenetic analysis of these VNTR data provided an extremely high level of strain discrimination, even among B. pseudomallei isolates from single melioidosis patients (Patient 465 and the chronic lung patient) and among isolates from a single B. pseudomallei outbreak focus in Australia (Goat Farms 1 and 2) (Figure 4). The average pairwise genetic distance was 0.86 for B. pseudomallei and 0.61 for B. mallei.
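Genotype counts and an average pairwise distance can be computed from allele profiles as sketched below. The sketch assumes a categorical (allele-mismatch) distance, the fraction of loci at which two isolates differ, which is a common choice for MLVA data; the exact distance behind the reported 0.86 and 0.61 is not specified in this section. Loci missing in either isolate are skipped pairwise, and the profiles are hypothetical.

```python
from itertools import combinations


def mismatch_distance(a, b):
    """Fraction of loci, among those scored in both isolates, with different alleles."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        raise ValueError("no loci scored in both isolates")
    return sum(x != y for x, y in shared) / len(shared)


def mean_pairwise_distance(profiles):
    pairs = list(combinations(profiles, 2))
    if not pairs:
        raise ValueError("need at least two isolates")
    return sum(mismatch_distance(a, b) for a, b in pairs) / len(pairs)


def count_genotypes(profiles):
    """Number of distinct allele profiles (a missing call counts as its own state)."""
    return len({tuple(p) for p in profiles})


# Hypothetical profiles at four loci for four isolates.
isolates = [(10, 7, 12, 5), (10, 7, 12, 6), (10, 7, 12, 6), (14, 9, None, 5)]
print(count_genotypes(isolates), round(mean_pairwise_distance(isolates), 3))
# 3 0.528
```

The resulting distance matrix is the kind of input a neighbor-joining algorithm takes to build a tree such as the one in Figure 4.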
A phylogram depicting this analysis indicates four highly diverse major clusters spanning the two Burkholderia species, although bootstrap support for these branches is below 50% (Figure 4). These major clusters did not reveal any noticeable geographic or temporal relationships, with isolates from the same country or the same time period occurring in all groups. However, in many instances the relationships between closely related isolates show clear geographic correlations with strong statistical support (Figure 4). The tree also indicates that, overall, B. pseudomallei is much more diverse than B. mallei, although this could reflect the narrower geographic origin of the B. mallei isolates. The tree clearly shows that the B. mallei isolates form a monophyletic group derived from a B. pseudomallei ancestor. The split between B. mallei and B. pseudomallei is supported by two MLVA loci (3564 k and 2445 k) that contain multiple alleles specific to B. mallei.
A comparison of a subset of isolates with other typing methods revealed that MLVA is much more discriminating between closely related isolates. MLST data for 37 of the 66 B. pseudomallei and four of the 21 B. mallei isolates used in this study were obtained from the online MLST database [31]. A comparison of MLST and MLVA for these 37 B. pseudomallei isolates revealed seven instances in which isolates shared an MLST sequence type, and in all but two of these instances their MLVA genotypes differed (Figure 4). Of particular note, the four B. mallei isolates shared a single MLST sequence type but had multiple MLVA genotypes. Additionally, a ribotyping study [32] found three ribotypes among seven of the B. mallei isolates (T2, T4, T5, T7, T9, GB5, GB6), whereas MLVA identified a unique genotype for every isolate.
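The MLST-versus-MLVA comparison amounts to grouping isolates by sequence type and asking whether MLVA splits each shared type. A hypothetical sketch of that bookkeeping (isolate names, sequence types and profiles are invented for illustration; how the authors counted "instances" is not detailed here):

```python
from collections import defaultdict


def shared_sts_split_by_mlva(sequence_types, mlva_profiles):
    """For every MLST sequence type shared by two or more isolates, report
    whether their MLVA profiles split them into more than one genotype."""
    isolates_by_st = defaultdict(list)
    for isolate, st in sequence_types.items():
        isolates_by_st[st].append(isolate)
    return {
        st: len({tuple(mlva_profiles[i]) for i in isolates}) > 1
        for st, isolates in isolates_by_st.items()
        if len(isolates) > 1
    }


# Hypothetical isolates, sequence types and two-locus MLVA profiles.
st = {"iso1": 10, "iso2": 10, "iso3": 25, "iso4": 25, "iso5": 3}
mlva = {"iso1": (8, 12), "iso2": (8, 13), "iso3": (5, 9), "iso4": (5, 9), "iso5": (7, 7)}
print(shared_sts_split_by_mlva(st, mlva))  # {10: True, 25: False}
```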

Mutation rates of tandem repeats
Parallel serial passage experiments (PSPE) starting from a single B. pseudomallei isolate yielded an estimated ~18,000 generations of growth, and the resulting lineages were analyzed for variation at all MLVA loci. Mutational events were observed at 12 VNTR loci; the number and type of mutations observed are shown in Table 5. We observed comparable numbers of mutations for loci on each chromosome. There was a strong bias towards single-repeat mutations (p = 0.0001) and a weaker, non-significant trend towards insertion mutations (p = 0.0736) (Table 5). No discernible pattern was observed between loci with and without mutations with respect to array size, repeat motif GC content, or amplification characteristics. The number of successful lineage PCR amplifications for the mutating MLVA loci ranged from 75 to 95 (out of 95 possible), averaging 90.25 ± 5.7, while those for the non-mutating loci ranged from 82 to 95, averaging 92.25 ± 3.1 (data not shown). (The basis of these failures is under investigation, but all mutation rates were corrected for these missing data.) We observed an average of 1.67 mutations per mutating locus, and mutation rates for individual loci ranged from 5.3 × 10^-5 to 1.7 × 10^-4. The combined mutation rate across all 32 loci was 1.113 × 10^-3, which serves as an estimator of the discriminatory power of this MLVA typing system (Table 5). This rate is similar to that of the Y. pestis MLVA system and greater than that of the E. coli system.

Figure 1. Linear repeat array distribution of B. anthracis, Y. pestis and B. pseudomallei chromosomes. Nucleic acid repeat region "icicle" plots were generated with DNAStar GeneQuest software (Madison, WI). The horizontal scale indicates the linear position in base pairs along each chromosome from the start of the GenBank FASTA sequence. The scale bar to the right of each icicle plot indicates 10 possible repeat sequence combinations as found by GeneQuest. The overall length of each icicle (its number of possible repeat combinations) is a measure of the size of the repeated sequence array at that position: in general, the longer the icicle, the larger the repeat array. Both perfect and degenerate repeat arrays are found and displayed by GeneQuest, as indicated by the arrows and notes in panel C. The arrays/Mbp and total array counts include all repeat regions identified by Tandem Repeats Finder that are longer than 30 bp with an internal similarity of at least 80%.
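The per-locus and combined rates can be sketched as below. This rests on two assumptions of ours, not statements from the text: that ~18,000 is the total number of generations summed over all 95 lineages, and that missing PCR results are handled by scaling each locus's generations by the fraction of lineages genotyped. Under those assumptions, ~20 mutations (1.67 per mutating locus × 12 loci) over ~18,000 generations gives about 1.1 × 10^-3, in line with the reported combined rate. Function names are hypothetical.

```python
def locus_mutation_rate(mutations, genotyped, lineages, total_generations):
    """Mutations per generation at one locus, scaling the total generations by
    the fraction of lineages successfully genotyped at that locus."""
    if not 0 < genotyped <= lineages:
        raise ValueError("genotyped must be between 1 and the number of lineages")
    effective_generations = total_generations * genotyped / lineages
    return mutations / effective_generations


def combined_mutation_rate(loci, lineages, total_generations):
    """Sum of per-locus rates over (mutations, genotyped) pairs; for small rates
    this approximates the per-generation chance that any typed locus changes."""
    return sum(
        locus_mutation_rate(m, g, lineages, total_generations) for m, g in loci
    )


# Hypothetical example: one mutation at a locus genotyped in 90 of 95 lineages.
print(f"{locus_mutation_rate(1, 90, 95, 18_000):.2e}")  # 5.86e-05
```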
We also examined mutation rates for 17 tandem repeat loci, not included in the final MLVA system, whose arrays are duplicated in up to four different locations within and/or between chromosomes (Table 6). In contrast to the MLVA loci, all duplicated loci screened had six- or seven-bp repeat motifs, as these motif sizes were most commonly associated with larger duplicated regions in strain K96243. Also, while the number of mutations at the duplicated arrays was equal to that at the MLVA loci, more mutations were observed at large-chromosome loci than at small-chromosome loci. Additionally, there was a non-significant trend towards multiple-repeat mutations (p = 0.5127) and a nominally significant trend towards deletion mutations (p = 0.0495) (Table 6). The multiple-repeat mutations ranged from 2 to 6 repeat units. Two of the duplicated loci (1558 k and 3851 k) had PCR amplification success below 50%, and PCR amplification was highly unpredictable at three others (3166 k, 1343 k and 2646 k). These PCR failures could reflect the difficulty of PCR in a high-GC organism such as B. pseudomallei, or could indicate loss of priming sites through recombination. PCR amplification success rates for the remaining loci were comparable to those of the MLVA loci. The duplicated loci averaged 2.6 mutations per locus, and the combined mutation rate for 15 duplicated tandem repeat loci, 1.23 × 10^-3 over ~18,000 generations, was also comparable to that of the non-duplicated MLVA loci.

Figure 2. Repeat region motif size and total array size distribution. A) Frequencies of arrays consisting of different size repeat motifs in inter-, intragenic and duplicated locations. Degenerate repeats were determined as described in the Materials and Methods section. B) Frequencies of arrays in different total size classes, again in inter-, intragenic and duplicated locations, based upon triplet and non-triplet repeat motif copy number. Degenerate arrays are not included, as consensus repeat motifs were not determined.
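This section does not say which test produced the p-values for these mutation biases. One straightforward option for a bias such as single- versus multiple-repeat changes, or insertions versus deletions, is an exact two-sided binomial (sign) test against an even split, sketched below with the standard library only. Applied to the 19:1 split of single- to multiple-repeat changes reported for the MLVA loci, it gives p ≈ 4 × 10^-5, which need not match the value in Table 5 if a different test was used there.

```python
import math


def binomial_test_two_sided(k, n, p=0.5):
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    that are no more likely than observing k successes out of n."""
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    pmf = [math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    threshold = pmf[k] * (1 + 1e-7)  # tolerance for floating-point ties
    return min(1.0, sum(q for q in pmf if q <= threshold))


# 19 single-repeat vs 1 multiple-repeat change, tested against a 50:50 split.
print(f"{binomial_test_two_sided(19, 20):.1e}")  # 4.0e-05
```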

Discussion
Burkholderia pseudomallei is a distinctive microbial pathogen because of its ability to survive in and exploit a wide variety of environmental conditions, as well as to opportunistically infect animals. It can cause mild, chronic, or rapidly progressing and potentially fatal disease in a range of animal hosts [33], and it has a demonstrated ability to invade the cells of other eukaryotic organisms such as fungi and amoebae [34,35]. It can survive extreme environmental conditions for long periods, including nutrient starvation [36] and chlorine concentrations generally recognized as sufficient for potable water treatment [37]. This level of environmental flexibility and pathogenic potential may require the B. pseudomallei genome to be highly plastic so that it can adapt quickly to different environments. Indeed, while the large chromosome primarily harbors genes essential for growth, the small chromosome contains more diverse genes that are primarily involved in surviving in, or exploiting, variable or contingent environmental conditions. Consequently, it is not biologically surprising that numerous genetic typing methodologies [6,14,15,22], including the MLVA system reported here, find very high levels of genetic diversity within this organism. The high level of genetic diversity and host flexibility of the organism suggest enhanced mechanisms for generating and maintaining adaptive variation through processes such as selection, recombination and mutation.
The unusually high number of tandem repeats in B. pseudomallei (compared with other pathogenic bacteria with similarly sized genomes, such as B. anthracis and Y. pestis, and with other bacteria of similar GC content [5]) is indicative of potentially high genomic diversity, which in turn may facilitate rapid genomic adaptation to a variable environment. While the majority of large VNTRs in B. pseudomallei are located intergenically and thus may have no direct phenotypic effect, it has been observed in other bacteria that such loci can alter important biological functions, through transcriptional regulation when they lie upstream of genes or through amino acid changes when they lie within them [38-41]. Within coding regions we observed fewer tandem repeat arrays. The majority of these intragenic arrays have repeat units whose length is a multiple of three bases, so changes in copy number add or remove amino acids without shifting the reading frame, which indicates the potential for adaptive variation. For example, Nierman et al. [5] observed variation in triplet repeat unit simple sequence repeat (SSR) loci located inside four genes coding for surface or putative virulence proteins in B. mallei and B. pseudomallei. A subsequent serial passage experiment of B. mallei through several mammalian hosts revealed indels in seven intragenic SSR loci, five of which caused frameshift mutations, while the other two were triplet repeats that only added or removed amino acids from the encoded protein [42]. This variation is consistent with the potential for phase variation during the infection cycle and may be a mechanism to avoid host defenses [5,42]. Thus, given the similarity of B. mallei and B. pseudomallei, the unusually high number of tandem repeat loci in B. pseudomallei, as well as their non-random arrangement, as indicated by a deviation from the expected Poisson distribution (Figure 3), may indicate that coding and non-coding genomic regions use different molecular mechanisms to adapt to different selective pressures.
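The reading-frame argument is modular arithmetic: a change of Δ repeat units in a motif of m bp alters the coding sequence by m × Δ bp, and only changes divisible by 3 preserve the downstream frame. A small illustration (the function name is hypothetical):

```python
def coding_indel_effect(motif_bp: int, delta_units: int) -> str:
    """Classify a change of delta_units repeat copies within a coding region."""
    if delta_units == 0:
        return "no change"
    length_change = motif_bp * delta_units
    # Python's % returns a non-negative remainder, so deletions (negative
    # delta_units) are classified the same way as insertions.
    return "in-frame" if length_change % 3 == 0 else "frameshift"


for motif, delta in [(3, 1), (6, -2), (7, 1), (7, 3)]:
    print(motif, delta, coding_indel_effect(motif, delta))
```

The last case shows that a non-triplet motif can still produce an in-frame change, but only for particular step sizes (here a three-unit change of a 7-bp motif), whereas every copy-number change of a triplet-based motif is in-frame.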
In addition to the large number of tandem repeats in B. pseudomallei, duplicated tandem repeats were prevalent throughout the genome: 37.9% of tandem repeats on the large chromosome and 35.2% on the small chromosome are duplicated, at least in part, at other intra- and inter-chromosomal locations. Moreover, in the serial passage experiment the duplicated loci showed a contrasting trend towards deletions, as well as an increased frequency of multiple-repeat changes in comparably sized repeat arrays, while displaying mutation rates comparable to those of non-duplicated loci; this contrasts with the lack of such bias in Y. pestis [43]. This suggests that repeat regions within B. pseudomallei may facilitate large-scale genomic rearrangements through recombination rather than through slipped-strand mispairing [44]. Although this has not been specifically studied in B. pseudomallei, it has been suggested that SSRs in Mycoplasma genomes may in fact facilitate genomic rearrangements via recombination [45], and that long tracts of tandem repeats may facilitate gene transfer [46]. Conversely, tandem repeats may not directly cause recombination, but may instead be associated with regions that are prone to recombination for other reasons. Since recombination frequency is affected by the length of homology between two loci [47], which in turn is controlled by slipped-strand repair, the observed tandem repeat patterns could represent an interesting interaction between slipped-strand expansion and recombination.
During in vitro passage, mutation events were observed at multiple B. pseudomallei VNTR loci, suggesting similar mutation rates at many loci. The combined MLVA mutation rate reported in this study is 1.113 × 10^-3 mutations/generation, compared with combined MLVA rates of 6.4 × 10^-4 and 1.1 × 10^-3 mutations/generation in E. coli and Y. pestis, respectively [26,43,48]. The combined rate is hence comparable to those previously observed in E. coli and Y. pestis and offers similar subtyping discriminatory power. These rate calculations depend on accurate estimation of the population growth parameters during serial passage, which may be particularly problematic for B. pseudomallei because it forms highly mucoid colonies. Experimental serial passage studies in E. coli and Y. pestis have previously identified a positive correlation between in vitro mutation rate and natural locus diversity. This correlation was not detected in B. pseudomallei (analysis not shown), and it is not immediately obvious what differs between these pathogens. Perhaps because of the much larger number of VNTR loci in B. pseudomallei, the current study was based upon an overwhelming number of equally and highly mutable loci, which are not commonly present in other genomes. In other words, the marker loci in the E. coli and Y. pestis MLVA systems are stratified by their mutability, whereas in the Burkholderia MLVA we may be examining a number of loci that are equally mutable; this would also explain the lack of correlation with array size. Another interesting difference is in the mutation products, the majority of which (19:1) were single-repeat changes. This bias was greater than in the E. coli and Y. pestis studies, where single-repeat mutational products were about 80% of the total observed. The scarcity of two- and three-repeat changes needs to be explored in a larger in vitro population to determine whether this trend reflects a real property of this particular genome.
Here we present a rapid PCR-based MLVA typing system using 32 independent VNTR loci. Although the initial development of an MLVA system in this organism was complicated by the number and duplicated nature of repeated regions in B. pseudomallei, and by allelic size variation that was not always consistent with the repeat unit size, we found 23 markers that were useful for phylogenetic analysis because of their high diversity, minimal partial repeat differences and amplification success. An additional nine loci, while showing some partial repeat sizes, are very useful for even finer-scale resolution of closely related B. pseudomallei and B. mallei isolates from outbreak situations [49]. Although no specific effort was made to design MLVA primers for B. mallei, all B. mallei isolates tested amplified well at every locus, as expected given the phylogenetic relationship of the two species [6]. Conversely, B. thailandensis and B. cepacia did not amplify well at any of the loci, suggesting that the MLVA primers will not support amplification in more distantly related bacterial species, although this has not been explicitly tested. Thus, this MLVA system represents a reliable method of identifying B. pseudomallei as well as B. mallei strains. Furthermore, this typing method is an easily transferable approach to high-resolution molecular typing using small amounts of crudely isolated DNA. The unique size and fluorescent label of each allele, together with automated sizing software, allow easy classification of each VNTR allele, and capillary electrophoresis significantly reduces run time.
Because of the combined effects of convergent evolution, reversal mutations, recombination, gene duplications and suggested horizontal gene transfer within Burkholderia pseudomallei, phylogenetic hypotheses have been difficult to establish. For example, neither MLST [6] nor MLVA is able to resolve the deeper relationships among distantly related B. pseudomallei isolates, as illustrated by the poor bootstrap support for deeper branches (Figure 4) and by similar levels of consistency (~0.63) for a subset of the same isolates (data not shown). This lack of resolution results in the absence of a geographic correlation within basal clades, although more derived clades do show geographic associations between isolates (Figure 4). In comparison, an analysis of Thai and Australian isolates using MLST showed no overlap between sequence types for the two countries [50]. However, phylogenetic analysis of these data lacks strong bootstrap support for this geographic differentiation. Also, an analysis of historical B. pseudomallei isolates using MLST revealed a sequence type shared by Australian and Thai environmental isolates, and does not support the genetic distinction of isolates from Australia [51]. Thus, phylogenetic hypotheses based on either MLVA or MLST data are difficult to establish for isolates that are geographically and temporally distant. The present typing system targets VNTR loci over a wide range of diversity levels and consequently provides resolution between B. pseudomallei and B. mallei, while still providing high levels of discrimination between closely related isolates because of the high variability of tandem repeat loci in these bacterial pathogens. Whereas a number of typing methodologies such as PFGE, ribotyping, RAPD and MLST have detected differences between isolates, their resolving power among very closely related isolates is lower than that of MLVA [6,14,15,19].
For example, MLST analysis yielded only a single genotype for the B. mallei cluster. MLVA, in contrast, resolved the B. mallei group into individual genotypes, even among very closely related isolates from Turkey that share the same ribotype [32]. Additionally, B. pseudomallei isolates with the same sequence type often had different MLVA genotypes (Figure 4). High-resolution genotyping of this kind can define patterns of mutation among very closely related isolates from an outbreak, which can then be used to generate phylogenetic hypotheses [49].
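Telling apart isolates that share an MLST sequence type amounts to comparing their MLVA allele profiles locus by locus. The minimal sketch below uses a simple categorical distance: the fraction of loci with different allele calls. This is a common choice for clustering MLVA data, but it is not necessarily the method behind Figure 4. Isolate names and locus labels are hypothetical.

```python
from itertools import combinations
from typing import Dict

Profile = Dict[str, int]  # locus name -> repeat-copy number


def categorical_distance(a: Profile, b: Profile) -> float:
    """Fraction of shared loci at which two isolates carry different alleles."""
    shared = sorted(set(a) & set(b))
    if not shared:
        raise ValueError("no loci in common")
    differing = sum(1 for locus in shared if a[locus] != b[locus])
    return differing / len(shared)


# Hypothetical profiles: iso1 and iso2 could share an MLST sequence type
profiles = {
    "iso1": {"L1": 7, "L2": 12, "L3": 5, "L4": 9},
    "iso2": {"L1": 7, "L2": 13, "L3": 5, "L4": 9},
    "iso3": {"L1": 4, "L2": 10, "L3": 6, "L4": 9},
}
for x, y in combinations(sorted(profiles), 2):
    print(x, y, categorical_distance(profiles[x], profiles[y]))
```

A single-repeat change at one locus is enough to separate iso1 from iso2. This is the kind of fine-scale difference that housekeeping-gene sequencing would miss.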
A recent study by Liu et al. (2006) used six VNTR loci to differentiate B. pseudomallei isolates from an outbreak in Singapore [22]. Four of those six loci were also characterized in the present study. Two of them are included in this MLVA (Table 2). The other two were found to be duplicated within the genome and were therefore excluded from MLVA development. The six-locus MLVA offered insight into the epidemiology of B. pseudomallei in Singapore, but it was limited by the low resolution of agarose gel electrophoresis. Capillary electrophoresis revealed partial repeat differences as small as 3 bp. It is therefore doubtful that all alleles at these loci could be detected on agarose gels, so diversity was probably underestimated. In addition, two of the VNTR loci used are duplicated within the genome. Such loci are not recommended for phylogenetic analysis, because gene duplication allows the copies to follow independent evolutionary trajectories, which confounds the analysis.
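The small partial-repeat differences mentioned above become visible when a capillary-sized fragment is converted into a repeat copy number. The minimal sketch below assumes the flanking length (amplicon size minus the repeat array) is known from the reference genome. The copy number is taken as the nearest whole number of units. Any leftover larger than a sizing tolerance is flagged as a partial repeat. The 150 bp flank, 9 bp unit and 1 bp tolerance are hypothetical values.

```python
from typing import Tuple


def call_allele(size_bp: float, flank_bp: float, unit_bp: int,
                tolerance_bp: float = 1.0) -> Tuple[int, float, bool]:
    """Convert a sized fragment into (copies, residual bp, is_partial).

    copies is the nearest whole number of repeat units in the array;
    residual_bp is the size left over relative to that whole number;
    residuals larger than tolerance_bp are flagged as partial repeats.
    """
    array_bp = size_bp - flank_bp
    copies = round(array_bp / unit_bp)
    residual_bp = array_bp - copies * unit_bp
    return copies, residual_bp, abs(residual_bp) > tolerance_bp


# Hypothetical locus: 9 bp repeat unit, 150 bp of flanking sequence
for size in (195.0, 194.6, 198.3):
    copies, residual, partial = call_allele(size, 150, 9)
    print(size, copies, round(residual, 1), "partial" if partial else "whole")
```

A 3 bp residual on a fragment of about 200 bp is easy to call from capillary sizing. The same difference would likely be lost as a band shift on an agarose gel.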

Conclusion
In summary, our findings suggest that tandemly repeated regions, given their prevalence and location within the B. pseudomallei genome, may generate and maintain adaptive variation in this bacterial pathogen. The intragenic repeat regions, which occur twice as frequently on the "contingency-oriented" small chromosome [4], may allow rapid changes in gene function. Duplicated repeat regions may facilitate genomic rearrangements that can alter gene regulation. The mutation rates of individual repeat regions do not appear to be elevated relative to those in other organisms. Even so, the sheer number of these regions, some of which are quite large, provides great potential for genetic variation within this species.
Epidemiological characterization is important for any pathogen, but especially for emerging global pathogens that may be exploited for biological terrorism, such as B. pseudomallei. No current typing system for B. pseudomallei can reliably establish deep phylogenetic relationships. Nevertheless, the multiplex B. pseudomallei-B. mallei MLVA typing system presented here provides an unsurpassed ability to resolve very closely related isolates, even those from the same patient. Efficient and sensitive genetic typing tools such as this MLVA system are valuable for the increasingly important epidemiological and phylogenetic characterization of emerging pathogens.
(Table 4).

VNTR identification
The complete genome sequence of Burkholderia pseudomallei strain K96243 was obtained from the National Center for Biotechnology Information [GenBank: NC_006350, NC_006351]. It was screened with GeneQuest (Lasergene, Inc., Madison, WI) and Tandem Repeats Finder [53] for potentially polymorphic repetitive sequences meeting three criteria: a repeat unit of at least a dinucleotide, at least 4 copies and a total array size of at least 30 bp. Primers flanking the repeat sequences were designed using Primer Express (Lasergene, Inc., Madison, WI).
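The screening thresholds above can be shown with a naive scanner. The sketch below is an illustration only: it finds exact tandem repeats, whereas Tandem Repeats Finder also detects approximate repeats using a statistical scoring model. The thresholds mirror the text (unit of at least 2 bp, at least 4 copies, array of at least 30 bp). The upper limit of 12 bp on the unit length is a hypothetical choice. Motifs that are themselves repeats of a shorter motif (for example "ATAT") are skipped, so each array is reported once at its shortest period.

```python
from typing import List, NamedTuple


class Repeat(NamedTuple):
    start: int    # 0-based start of the array in the sequence
    unit: str     # repeat motif
    copies: int   # number of consecutive exact copies


def is_primitive(unit: str) -> bool:
    """True if unit is not itself a repeat of a shorter motif."""
    return (unit + unit).find(unit, 1) == len(unit)


def find_exact_tandem_repeats(seq: str, min_unit: int = 2, max_unit: int = 12,
                              min_copies: int = 4,
                              min_array_bp: int = 30) -> List[Repeat]:
    """Naive scan for exact tandem repeats that pass the screening thresholds."""
    seq = seq.upper()
    hits: List[Repeat] = []
    for unit_len in range(min_unit, max_unit + 1):
        i = 0
        while i + unit_len <= len(seq):
            unit = seq[i:i + unit_len]
            copies = 1
            # Extend while the next unit_len bases repeat the motif exactly
            while seq[i + copies * unit_len:i + (copies + 1) * unit_len] == unit:
                copies += 1
            if (copies >= min_copies and copies * unit_len >= min_array_bp
                    and is_primitive(unit)):
                hits.append(Repeat(i, unit, copies))
                i += copies * unit_len  # jump past the reported array
            else:
                i += 1
    return hits


print(find_exact_tandem_repeats("GGG" + "ACGTTGCAT" * 5 + "CC"))
```

A 20 bp dinucleotide run such as "AT" × 10 is ignored because it falls below the 30 bp array threshold, even though it has ten copies.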
To assess the variability of repeated regions among a globally distributed set of isolates and to develop a comprehensive typing system for this organism, we targeted 104 repeated regions (48 from the large chromosome and 56 from the small chromosome) for analysis and subsequent incorporation into a multiple-locus VNTR analysis (MLVA) system. These VNTR loci were selected on the basis of PCR amplicon size, array size, locus duplication and proximity to other arrays. Loci yielding small PCR products (<1000 bp) were favored because they amplified more reliably than larger regions and are better suited to fragment-sizing platforms. Arrays with fewer than five copies of a motif were not selected for screening. Loci duplicated within or between chromosomes were also excluded, because amplifying more than one copy would give multiple alleles per genome and confound the typing system. Lastly, repeat regions lying close (<1000 bp) to other repeat regions were avoided to preserve locus independence. Loci were not excluded on the basis of intragenic or intergenic location. The 104 candidate loci were examined for robust amplification and polymorphism across a screening panel of 29 B. pseudomallei isolates, one B. mallei isolate (ATCC 10399) and one B. thailandensis isolate (ATCC 700388). The B. pseudomallei strains in the screening panel included 15 closely related isolates from two different outbreaks in northern Australia [49] and 14 geographically diverse isolates from seven different countries (Table 4). This tiered screening panel allowed us to identify loci with varying degrees of polymorphism.
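The selection rules above can be summarized as a filter over candidate arrays. In this minimal sketch, the hard exclusions are fewer than five copies, duplication, and another repeat array on the same chromosome within 1000 bp. The text says only that short amplicons were "favored", so amplicon size is used here to rank the survivors rather than as a cutoff; this is an assumption. The RepeatArray record, its fields and all coordinates are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RepeatArray:
    name: str
    chromosome: int     # 1 = large, 2 = small chromosome
    start: int          # array start coordinate (bp)
    end: int            # array end coordinate (bp, exclusive)
    copies: int         # copies of the repeat motif
    amplicon_bp: int    # expected PCR product size
    duplicated: bool    # array found elsewhere in the genome


def gap_bp(a: RepeatArray, b: RepeatArray) -> int:
    """Distance between two arrays (0 if they overlap)."""
    return max(0, b.start - a.end, a.start - b.end)


def select_loci(arrays: List[RepeatArray], min_copies: int = 5,
                min_spacing_bp: int = 1000) -> List[RepeatArray]:
    """Apply the exclusion rules, then rank by amplicon size (shortest first)."""
    kept = []
    for a in arrays:
        if a.copies < min_copies or a.duplicated:
            continue
        # Any other array on the same chromosome within min_spacing_bp
        # disqualifies the locus, even if that neighbour was itself rejected
        crowded = any(o is not a and o.chromosome == a.chromosome
                      and gap_bp(a, o) < min_spacing_bp for o in arrays)
        if not crowded:
            kept.append(a)
    return sorted(kept, key=lambda a: a.amplicon_bp)


arrays = [
    RepeatArray("A", 1, 10_000, 10_045, 5, 300, False),
    RepeatArray("B", 1, 10_500, 10_560, 6, 400, False),   # 455 bp from A
    RepeatArray("C", 2, 50_000, 50_063, 7, 650, False),
    RepeatArray("D", 2, 80_000, 80_036, 4, 220, False),   # too few copies
    RepeatArray("E", 1, 200_000, 200_054, 6, 250, True),  # duplicated
    RepeatArray("F", 1, 300_000, 300_072, 8, 180, False),
]
print([a.name for a in select_loci(arrays)])
```

As in the text, intragenic or intergenic location plays no part in the filter.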

VNTR screening using universal tail PCR and genotype analysis
A high-throughput, five-dye universal tail (UT) amplification and labeling method, originally developed for the low-GC (~35% G+C) bacterium B. anthracis [54], was used to screen the chosen repeat loci for variation among 29 diverse and closely related B. pseudomallei isolates. The optimal Tm for the labeling sequences in B. anthracis is 55°C. Because the B. pseudomallei genome has a high G+C content (68.12%), all PCR reactions here were performed at an annealing temperature of 72°C.
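GC-rich templates need higher annealing temperatures because G·C pairs are held by three hydrogen bonds rather than two. The minimal sketch below illustrates this with two quantities: GC fraction and the Wallace rule of thumb, Tm ≈ 2(A+T) + 4(G+C) °C. The Wallace rule is only a crude estimate for short oligonucleotides, and nearest-neighbor models are more accurate. The sketch does not reproduce how the 72°C used in this study was chosen. The FAM tail is taken from the Methods. The GC-rich 19-mer is a hypothetical primer of the same length.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return sum(seq.count(b) for b in "GC") / len(seq)


def wallace_tm(primer: str) -> int:
    """Rule-of-thumb Tm (degrees C) for short oligos: 2(A+T) + 4(G+C)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


fam_tail = "ACCCAACTGAATAGAGAGC"  # FAM universal tail sequence
gc_rich = "GCCGACGTGGCGATCAGCC"   # hypothetical GC-rich 19-mer
for name, s in (("FAM tail", fam_tail), ("GC-rich", gc_rich)):
    print(name, f"{gc_fraction(s):.2f}", wallace_tm(s))
# FAM tail 0.47 56
# GC-rich 0.74 66
```

At equal length, the GC-rich primer's estimated Tm is about 10°C higher. This is consistent with raising the reaction temperature for the 68% G+C B. pseudomallei genome.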
The UT PCR labeling protocol allows any PCR amplicon to be fluorescently labeled using only four universal fluorescently labeled oligonucleotides. Each labeled universal primer has the same sequence as one of the universal tails attached to the 5' end of the target-specific forward primers (FAM = ACCCAACTGAATAGAGAGC, NED = ATCGACTGTGTTAGGTCAC, PET = CTGTCCTTACCTCAATCTC and VIC = ACGCACTTGACTTGTCTTC). It therefore anneals to the complement of that tail in the amplicon and incorporates its dye. Because labeled primers need not be ordered for each locus, this method substantially reduces the cost of initial screening. PCR amplicons were diluted with double-distilled H2O according to their universal tail sequence (FAM and NED

Authors' contributions
CLCF, RRLD, SDR, BL, SG, MC and LYH generated the molecular genetic data. DD, SPH, RR, DG and MM carried out the genomic DNA preparation. BJC provided epidemiological and clinical data. DW and PK obtained the funding, conceived the study, participated in its design and helped to draft and edit the manuscript. All authors read and approved the final manuscript.