Tandem repeats analysis for the high resolution phylogenetic analysis of Yersinia pestis

Background Yersinia pestis, the agent of plague, is a young and highly monomorphic species. Three biovars, each one thought to be associated with the last three Y. pestis pandemics, have been defined based on biochemical assays. More recently, DNA based assays, including DNA sequencing, IS typing, DNA arrays, have significantly improved current knowledge on the origin and phylogenetic evolution of Y. pestis. However, these methods suffer either from a lack of resolution or from the difficulty to compare data. Variable number of tandem repeats (VNTRs) provides valuable polymorphic markers for genotyping and performing phylogenetic analyses in a growing number of pathogens and have given promising results for Y. pestis as well. Results In this study we have genotyped 180 Y. pestis isolates by multiple locus VNTR analysis (MLVA) using 25 markers. Sixty-one different genotypes were observed. The three biovars were distributed into three main branches, with some exceptions. In particular, the Medievalis phenotype is clearly heterogeneous, resulting from different mutation events in the napA gene. Antiqua strains from Asia appear to hold a central position compared to Antiqua strains from Africa. A subset of 7 markers is proposed for the quick comparison of a new strain with the collection typed here. This can be easily achieved using a Web-based facility, specifically set-up for running such identifications. Conclusion Tandem-repeat typing may prove to be a powerful complement to the existing phylogenetic tools for Y. pestis. Typing can be achieved quickly at a low cost in terms of consumables, technical expertise and equipment. The resulting data can be easily compared between different laboratories. The number and selection of markers will eventually depend upon the type and aim of investigations.


Background
Within the Y. pestis species, strains are separated into three biovars according to their ability to reduce nitrate and to ferment glycerol [1]. Since Y. pestis was connected to plague by Yersin [2], strains of biovar Antiqua have been generally isolated from Asia and from East and Central Africa, Medievalis was found in Central Asia, and Orientalis worldwide. Y. pestis is thought to have recently evolved from Yersinia pseudotuberculosis, some 1,500 to 20,000 years ago [3]. Based on the biochemical assays, Devignat [1] proposed that Antiqua strains, causing probably the first known pandemic, represent the ancestor. Medievalis strains are suggested to be responsible for the second pandemic whereas the third pandemic was associated exclusively with Orientalis strains. This overall scenario, although not formally established, is supported by the observed higher diversity in the Antiqua and Medievalis biovars as measured by IS typing and the geographic origin of strains [3][4][5]. "Pestoides" strains are particular Y. pestis isolates found recently in Russia, and which infect unique species of rodents [5,6].
Several molecular methods have been used for genotyping Y. pestis strains, mostly based on pulse-field gel electrophoresis (PFGE), insertion sequence polymorphism and ribotyping. Recently, multiple locus variable-number-oftandem-repeat (VNTR) analysis (MLVA) was shown to be a promising method for genotyping a number of pathogens [7][8][9][10][11][12][13][14][15]. When applicable, MLVA is of great interest, because data produced in different laboratories can be easily exchanged and merged. This is especially relevant in the case of pathogens such as Y. pestis, for which strains cannot be easily exchanged for security reasons. In this context, the availability of standardized and easy to set-up typing tools to facilitate the research efforts on this pathogen is important. The complete genomic sequences of Y. pestis CO92 [16], biovar Orientalis, and of strain KIM [17], biovar Medievalis, have been determined, facilitating the identification of tandem repeats and consequently the selection of primers for MLVA [8,15].
Until now only small series of Y. pestis strains were typed by MLVA [8,9] and although the method seemed appropriate to genotype reproducibly and accurately, a large scale study seemed necessary. In the present work, a collection of 180 isolates of Y. pestis, from different geographical origins, and including various Y. pestis reference strains were typed using the 25 VNTRs previously described [8].

MLVA genotyping
The 25 loci could be amplified (Figure 1 and Additional file 1) in all 180 isolates (Table 1), with the exception of locus ms06 which failed to yield a PCR amplification product in three strains (corresponding to genotypes 4 and 5, Figure 2) despite numerous attempts. Sixty-one different genotypes were identified in this collection, numbered 1 to 61 ( Figure 2). Clustering analysis correctly separates Y. pestis Orientalis isolates comprising genotypes 9 to 51 from Antiqua (genotypes 2, 3 and 5 to 8) and Medievalis (genotypes 4 and 52 to 61). Antiqua strains of essentially African origin (genotypes 2, 3, 5, comprising 6 different strains) and the four Antiqua strains from Russia and Asia (genotype 6 and 7 originating from Russia [18] and genotype 8, Antiqua strains KUMA and Yokohama originating from Manchuria and Japan) are clearly separated. Within biovar Orientalis, a group of isolates among which CO92 (genotype 15) has been assigned to the IS100 typing group O1 using the PCR-based typing method described by Motin et al. [5] (data not shown; results indicated in Figure 2). This demonstrates that the MLVA clustering correlates well with another molecular typing method. Additional PCR-IS100 typing indicates that the other Orientalis strains, mostly from Vietnam, are of the O2a type ( Figure 2). This is in agreement with the report from Motin et al. [5] suggesting an association of O2a with South-East Asia. These Orientalis isolates are further separated by MLVA into three main branches comprising genotypes 27 to 51. One representative from each Orientalis genotype (genotypes 9 to 51) was assayed by PCR for the presence of the glpD deletion characterised by Motin et al. All strains yield a PCR product of the size corresponding to the deleted allele (data not shown).
Most Medievalis strains are also clustered into one major branch (genotypes 52 to 61) with the exception of the strain representing genotype 4. The Pestoides isolate from Georgia (genotype 1) is (weakly) grouped with the African Antiqua strains. Among Antiqua isolates, the KUMA and Yokohama isolates show the identical MLVA genotype 8, and possess a seemingly specific ms09 allele.
As a complementary analysis, a minimum spanning tree analysis was performed ( Figure 3). This kind of analysis is applicable to categorical data sets (see also [19]). The creation of hypothetical types (open circles) further minimizes the summed distance of all branches of the tree. The numbers indicated in the circles correspond to the genotype numbers listed in Figure 2. The Orientalis strains (genotypes 9 to 51) are closely related, and are grouped into a single complex. The O2a strains from Vietnam constitute a well-defined subgroup. The backbone of the O2a cluster is made by genotypes 28, 40, 46, 48, which altogether represent 97 isolates. The two Antiqua strains KUMA and Yokohama from Asia (genotype 8) are clearly located outside of the Orientalis group in this analysis, in a position intermediate between the Medievalis group (genotypes 52 to 61) and the Orientalis group. All strains classified as Medievalis strains based on the nitrate reductase assay fall within this group, with one exception, the strain from Kenya defining genotype 4 which is very close to Antiqua genotype 5 (two strains from Kenya and Congo). The six genotypes representing Antiqua strains (African strains, genotypes 2, 3, 5; Asian strains, genotypes 6 to 8) are very loosely connected to each other, suggesting a very high diversity of this biovar.

At least two independent types of Medievalis strains
The strain representing genotype 4 is nitrate-reductase negative, and for this reason has been phenotypically assigned to the biovar Medievalis. However, its position in Figures 2 and 3, being very close to genotype 5 (Antiqua strains) and very distinct from the Medievalis cluster, and its geographic origin from Kenya prompted us to investigate the origin of nitrate-reductase deficiency in more detail. The Medievalis strain KIM is nitrate-reductase deficient because of a single point mutation in the napA gene [17]. We have analyzed all Medievalis strains from genotypes 52 to 61 and they showed the same point mutation (data not shown). In contrast the napA gene in the strain representing genotype 4 has been inactivated by a deletion as seen by the absence of amplification of the gene in spite of the use of different primers from this locus, and the absence of an hybridisation signal in a Southern blot experiment (data not shown).
Illustration of the MLVA assay set-up Figure 1 Illustration of the MLVA assay set-up The PCR products of an amplification using primers "ms01" (on the left) or primers "ms04" (on the right) have been loaded on an agarose gel, electrophoresed, and stained with ethidium bromide. Lanes M are a 20 bp ladder molecular weight marker. Samples 1 to 7 correspond to a Yersinia pseudotuberculosis strain (lane 1), Yersinia pestis strains representing genotype 18 in Figure   . From left to right, the columns designate the genotype number, the number of isolates with an identical genotype, and the geographic origin of the isolates from each genotype. In three instances, indicated by asterisks (*), the strain origin was unknown. A number of classical strains available for this investigation are indicated in brackets. The Medievalis strain lacking the napA locus is indicated by **. O1 or O2a refers to the nomenclature defined by Motin et al. [5].

A simple MLVA assay comprising 7 markers
We have evaluated the possibility to define a smaller set of markers for routine typing and comparison of new strains with the data from the present collection. Table 2 lists the main characteristics of the 25 loci. Their name is indicated in the first column. The second column indicates when relevant the name of the corresponding marker in Klevyt-ska et al [9]. The third and fourth columns indicate the repeat size and the number of alleles in our collection of strains and that of Klevytska et al. [9]. The fifth and sixth columns indicate the size range observed in our collection of strains, and the seventh column contains the polymorphism index of each marker. Seven markers (ms01, ms04, ms06, ms07, ms46, ms62, ms70) have a polymorphism Minimum spanning tree analysis

Discussion
In the present report, an MLVA typing assay comprising 25 markers has been applied to a collection of Y. pestis isolates of various origins, but with a strong bias towards Orientalis strains from South-East Asia. One hundred and eighty strains or isolates (Table 1) have been genotyped and 61 different MLVA genotypes were identified (see Additional file 1). Clustering analysis and minimum spanning tree analysis suggest relations between the different strains and biovars which are in excellent agreement with current knowledge. In spite of the very limited number of Antiqua and Medievalis strains which could be investigated here, the data obtained suggest the existence of two groups of Antiqua strains. The first group from Russia and Asia represented by genotypes 6, 7 (from Russia) and 8 (KUMA, Yokohama) holds an intermediate position between the Medievalis and the Orientalis group. The second group comprises the African Antiqua strains. IS100 typing distinguishes the KUMA and Yokohama iso-lates which were typed as A2 and A1b type, respectively [5]. They are very similar in DNA microarrays studies [20].
Medievalis and Orientalis strains derive from Antiqua strains by the loss of metabolic functions, respectively the capacity to reduce nitrate, and to metabolize glycerol. Whereas all Orientalis strains investigated so far (and including this report) are derived from a single ancestor carrying a simple deletion in the glpD gene [5], we report here that the Medievalis phenotype can be associated with at least two independent mutation events in the napA gene. This underlines the fact that the initial biochemical tests should be complemented, or replaced, by direct molecular analyses of the glpD and napA genes.
It is tempting to speculate about possible scenarios suggested by Figure 3 and current knowledge. Pestoides strains originate from Central Asia (reviewed in [6]), and are proposed to be an outgroup of the Y. pestis group. Their genetic composition is relatively distinct from Y. pestis, in terms of plasmids and chromosome, as assayed by DNA array analysis for instance [20]. In Figure 3, the Pestoides strain studied is very distantly related to an hypothetical missing link close to the center of the figure.
Hypothetical missing links (open circles) are created by the Minimum Spanning Tree software if they result in a reduction of the total tree length. Many missing links are suggested by the software in this area. They enable the connection of the three branches made of the African Antiqua group, the Medievalis group, and the Orientalis group. The strains closest to this central position are the Antiqua strains from Asia. This and the position of the Pestoides strain suggest that all three biovars may originate from Asia. Indeed, much more strains from these regions will be needed to test this hypothesis. For instance, strain Nicholinsk51 was shown by [5] to share some features of Orientalis strains while being an Antiqua strain from Asia. The authors favored the hypothesis that this strain was a revertant from the Orientalis phenotype.
We would rather predict that Nicholinsk51 will be placed by MLVA analysis between the central group of Figure 3 and the Orientalis group.
In total, in our study and that of Klevytska et al. [9], 61 VNTRs were characterized, and still more exist in the Y. pestis genome [15] that can be tested for the selection of an optimal set for MLVA if necessary. Different questions can be addressed by different sets of tandem repeat loci. Global phylogenetic investigations will best be done using loci with a low or moderate mutation rate. Forensics, or local outbreaks investigations, may use loci with a higher mutation rate and particularly simple sequence repeats such as the tetranucleotide described by Adair et al. [21]. Mutation rate of a tandem repeat locus and phylogenetic value is very poorly predicted from sequence analysis [15,22] so that it has to be experimentally measured by typing collections of isolates, as done here (Table 2 and Additional file 1).

Conclusion
Once set-up, MLVA is a very powerful and reproducible genotyping method and it is hoped that this simple molecular tool will help unravel the molecular phylogeny of Yersinia pestis when being applied to a larger number of isolates. In comparison, MLST analysis [3] proved to be almost non informative within Y. pestis. DNA array analysis [20,23] shows a very low resolution with only a few different genotypes identified so far. IS typing by Southern blotting has a very high discriminatory power, but the resulting data is not easily comparable between different laboratories. PCR-IS typing developed by Motin et al. [5] provides exchangeable data but with a much reduced resolution as compared to classical IS typing. MLVA typing may thus turn out to be the method of choice for Y. pestis, once more isolates will have been typed, common genotype databases put together, and reference collections of markers selected. As one step in this direction, the data has been made accessible on our Web site http://bacterialgenotyping.igmors.u-psud.fr. This includes not only the full dataset which can be recovered from Additional file 1, but also the possibility to run queries with new MLVA data. This may be of use for investigators lacking the specialized tools or expertise required to run MLVA clustering analyses. A very satisfying typing resolution (but not a robust phylogenetic analysis) can readily be achieved by PCR amplification of only 7 loci.

Data management and analyses
Gel images were analyzed using the bionumerics software package version 3.5 (Applied-Maths, Sint-Martens-Latem, Belgium) as previously described [10]. The number of motifs in each allele was deduced from the amplicon size. The resulting data were analyzed with bionumerics as a character dataset. Clustering analysis was done using the categorical parameter and the Ward coefficient. The mini-mum spanning tree was constructed with the following options: (a) in case of equivalent solutions in terms of calculated distances, the priority rule used was to select the tree with the highest number of branches connecting genotypes differing at only one locus ("Highest number of single locus variants" option).; (b) the creation of hypothetical types (missing links) reducing the total length of the tree was allowed. Polymorphism indexes for each locus were calculated as 1 minus the sum of the squares of the frequency of each allele within the different genotypes identified. The data produced was made accessible from a Web page http://bacterial-genotyping.igmors.u-psud.fr as previously described [10] taking advantage of the BNServer application (Applied-Maths, Sint-Martens-Latem, Belgium).