Multilocus variable-number tandem repeat analysis for molecular typing and phylogenetic analysis of Shigella flexneri

Background Shigella flexneri is one of the causative agents of shigellosis, a major cause of childhood mortality in developing countries. Multilocus variable-number tandem repeat (VNTR) analysis (MLVA) is a prominent subtyping method to resolve closely related bacterial isolates for investigation of disease outbreaks and provide information for establishing phylogenetic patterns among isolates. The present study aimed to develop an MLVA method for S. flexneri and the VNTR loci identified were tested on 242 S. flexneri isolates to evaluate their variability in various serotypes. The isolates were also analyzed by pulsed-field gel electrophoresis (PFGE) to compare the discriminatory power and to evaluate the usefulness of MLVA as a tool for phylogenetic analysis of S. flexneri. Results Thirty-six VNTR loci were identified by exploring the repeat sequence loci in genomic sequences of Shigella species and by testing the loci on nine isolates of different subserotypes. The VNTR loci in different serotype groups differed greatly in their variability. The discriminatory power of an MLVA assay based on four most variable VNTR loci was higher, though not significantly, than PFGE for the total isolates, a panel of 2a isolates, which were relatively diverse, and a panel of 4a/Y isolates, which were closely-related. Phylogenetic groupings based on PFGE patterns and MLVA profiles were considerably concordant. The genetic relationships among the isolates were correlated with serotypes. The phylogenetic trees constructed using PFGE patterns and MLVA profiles presented two distinct clusters for the isolates of serotype 3 and one distinct cluster for each of the serotype groups, 1a/1b/NT, 2a/2b/X/NT, 4a/Y, and 6. Isolates that had different serotypes but had closer genetic relatedness than those with the same serotype were observed between serotype Y and subserotype 4a, serotype X and subserotype 2b, subserotype 1a and 1b, and subserotype 3a and 3b. Conclusions The 36 VNTR loci identified exhibited considerably different degrees of variability among S. flexneri serotype groups. VNTR locus could be highly variable in a serotype but invariable in others. MLVA assay based on four highly variable loci could display a comparable resolving power to PFGE in discriminating isolates. MLVA is also a prominent molecular tool for phylogenetic analysis of S. flexneri; the resulting data are beneficial to establish clear clonal patterns among different serotype groups and to discern clonal groups among isolates within the same serotype. As highly variable VNTR loci could be serotype-specific, a common MLVA protocol that consists of only a small set of loci, for example four to eight loci, and that provides high resolving power to all S. flexneri serotypes may not be obtainable.

A variety of molecular typing tools have been developed to access genetic relatedness among bacterial isolates. In general, molecular markers with low variability can be used to establish phylogenetic relationships among bacterial isolates evolved over longer time spans, and highly variable markers are more useful to resolve closely related isolates for the purposes of outbreak investigation and disease surveillance. MLST, one of these molecular tools, is a sequence-based method that has been successfully applied to establish phylogenetic structure for some bacterial pathogens, such as Neisseria meningitidis and Streptococcus pneumoniae [7]. However, to public health laboratories, MLST is not sufficiently discriminative in distinguishing closely related isolates for the epidemiological investigation of clusters of infection. In contrast, pulsedfield gel electrophoresis (PFGE) is highly discriminatory for many bacterial pathogens and has been adopted as the standard typing method by an international molecular subtyping network, PulseNet International, for foodborne disease surveillance [8]. Although this method has been proven by the PulseNet laboratories to be a powerful tool for the routine subtyping of some foodborne bacterial pathogens in detecting clusters of infection, PFGE is occasionally not discriminatory enough in distinguishing some epidemiologically unrelated S. sonnei isolates. In total, PFGE is suitable to resolve closely related isolates but not an appropriate tool for establishing phylogenetic relationships between bacterial isolates that have evolved over a longer time span.
Multilocus variable-number tandem repeat (VNTR) analysis (MLVA) is prominent typing tool which has been developed for a variety of bacterial pathogens [9][10][11][12][13]. This method is based on variation in the number of repeats at multiple VNTR loci, which can be highly variable or relatively stable. Study has demonstrated that MLVA based on four to eight highly variable VNTR loci can exhibit a discriminatory power parallel to or higher than PFGE [14] and that a method based on combined loci with different variability values can be applied to establish phylogenetic relationships among strains with different evolutionary timescales [15]. In the present study, we aim to develop and evaluate a MLVA method for fine typing and phylogenetic analysis of S. flexneri isolates.

VNTR loci
In total, 36 VNTR loci were identified after testing 67 selected tandem repeat loci on nine S. flexneri isolates of various subserotypes. The locations, copy number, and sizes of amplicons for the 36 VNTRs in S. flexneri sequenced strains 301, 2457T and 8401 and the gene or encoding protein involved are listed in Table S1 (Additional file 1). Fifteen of the loci are located within pseudogenes. The 36 VNTR markers were analyzed on 242 S. flexneri isolates of various serotypes and subserotypes. The resulting data revealed that 28 of the 36 loci exhibited 100% typability (Table 1). Four loci (SF1, SF6, SF23 and SF26) had a low typability rate that was primarily attributed to the absence of PCR amplicon from 103 4a/Y isolates, which had common origin. There were, respectively, 9, 26, 19, 13 and 3 polymorphic loci detected in serotype groups 1a/1b/NT (non-typable serotype), 2a/2b/X/NT, 3a/3b, 4a/4b/Y and 6. There were three loci polymorphic in four of the five serotype groups, nine in three serotype groups, ten in two serotype groups and eleven in one serotype group. Three loci (SF20, SF28 and SF35) were not polymorphic in any of the five serotype groups. Ten loci bore alleles with a high copy number (м 6) of repeats (Table 2). Of the 10 loci, one locus (SF3) was polymorphic in four serotype groups, 2 in three serotype groups, 4 in two serotype groups and 3 in only one serotype group. Of the 10 VNTR loci, one that had low variability in the serotype groups usually bore an allele with a low copy number of repeats. SF12 was an exception; it had no polymorphism in the serotype groups 1a/1b/NT and 2a/2b/ X/NT, but it contained an allele with a high number (eight copies) of repeats.

Phylogenetic relationships established using PFGE patterns
The PFGE analysis with NotI restriction enzyme identified 95 patterns among the 242 isolates. A dendrogram generated using the PFGE patterns is shown in Figure 1. The dendrogram with detail information including isolate code, PFGE code, serotype, year of isolation, MLVA code, outbreak and origin other than Taiwan is shown in Figure  S1 (Additional file 2). Based on the serotypes and levels of genetic similarity among the isolates, six clusters (C1 to C6) and four singletons (S1 to S4) were designated. The groupings were correlated with serotypes. The clusters C1 to C6, respectively, contained isolates of 4a/Y, 3a/3b, 1a/ 1b, 2a/2b/X/NT, 6 and 3a. Isolates of serotype 3 were distributed in two distinct clusters, C2 (3a/3b) and C6 (3a). The four single singletons were respectively assigned for isolates with serotype Y, subserotype 2b, subserotype 4b, and NT. S2 (2b), originated from Egypt, was far distant from cluster C4 (2a/2b/X/NT). S4 (4b), originated from India, was also distantly related from cluster C1 (4a/Y). A serotype Y isolate and 102 subserotype 4a isolates within cluster C1 was recovered from a prolonged shigellosis out-break occurring in a long stay psychiatric center [16]; they shared 76% or higher pattern similarity. These isolates shared only 57% similarity with another subserotype 4a isolate. The isolates within cluster C4 were further classified into three subclusters (C4a, C4b and C4c). Isolates within each of the three subclusters shared 80% or higher similarity. All the imported isolates in cluster C4 fell into subcluster C4a. Subcluster C4c consisted of 10 serotype X and two subserotype 2b isolates; the serotype X and one subserotype 2b isolates were recovered from a shigellosis outbreak.  [25] b Alleles with numbers greater than 100 contain an imperfect copy due to deletion or insertion and are reported instead as the lengths (in bp) of amplicons. c Typability % = (number of isolates with PCR amplicon/number of isolates analyzed) × 100.

Phylogenetic relationships constructed using MLVA profiles
In total, 104 MLVA types were identified in the 242 isolates. A phylogenetic tree was constructed using the MLVA profiles and a minimum spanning tree (MST) algorithm. The groupings established with MLVA profiles and PFGE patterns were highly concordant, with 83.8% congruence.
On the basis of serotypes and the clusters established using PFGE data, a MLVA cluster was defined for genotypes that differed at eight loci or fewer among the 36 total loci. As a result, six MLVA clusters (MC1, MC2, MC3, MC4, MC5 and MC6) and one singleton (MS2) were designated ( Figure 2A). The groupings based on PFGE and MLVA were almost consistent, except that the serotype Y (S1) and subserotype 4b (S3) isolates were grouped into cluster MC1 and the NT (S4) isolate was grouped into cluster MC3. Cluster MC1 consisted of 107 subserotype 4a, two serotype Y and one subserotype 4b isolates. One serotype Y and 106 subserotype 4a isolates were closely related; the serotype Y isolate differed at only one locus from the predominant type for subserotype 4a isolates. The remaining subserotype 4a isolate differed at two loci from a serotype Y (S1) isolate. The subserotype 4b (S3) had a distance of at least seven loci from others within cluster MC1. Cluster MC2 consisted of two subserotype 3a and four subserotype 3b isolates; they were closely related. One subserotype 3a isolate different at only two loci from a subserotype 3b isolate. Cluster MC3 contained three subserotype 1a, eight subserotype 1b and one NT isolates. They formed three tight subclusters; one subcluster consisted of six subserotype 1b isolates, which were recovered from an outbreak. Within this cluster, one subserotype 1a isolate differed at a single locus only from a subserotype 1b isolate. Cluster MC4 consisted of 10 serotype X, 90 subserotype 2a, four subserotype 2b and one NT isolates; they were relatively divergent. The three subclusters, C4a, C4b and C4c, designated on the basis of PFGE similarity were indicated in Figure 2A. The isolates in subcluster C4a showed a considerably diverse phylogenetic pattern; they were separated by subcluster C4b. The imported isolates were distributed more closely than most of others within this subcluster. The isolates in subcluster C4b as well as those in subcluster C4c were grouped tightly. Subcluster C4c consisted of 10 serotype X and two subserotype 2b isolates; one subserotype 2b isolate displayed an identical profile with a serotype X isolate. MC6 consisted of six subserotype 3a isolates; they formed a relatively diverse phylogenetic pattern. The imported isolate had a distance of at least eight loci from others within this cluster.

Discriminatory power of PFGE and MLVA
PFGE, MLVA4 (an MLVA assay based on four most variable loci), MLVA8 and MLVA36 discriminated the total isolates into 95, 82, 90 and 104 genotypes, respectively (Table 3). More genotypes were obtained by PFGE than MLVA4 and MLVA8 but the level of discriminatory power for MLVA4 and MLVA8 was, though not significantly, higher than that for PFGE. For the panel of 90 diverse subserotype 2a isolates, MLVA4, MLVA8 and MLVA36 displayed better resolution than PFGE, though the difference of discriminatory power was not significant. PFGE, MLVA4 and MLVA36 divided the panel of 107 closelyrelated 4a/Y isolates into 19, 14 and 14 genotypes but the two MLVA assays exhibited higher discriminatory power than PFGE, though not significantly.

Subtyping of isolates from outbreaks
A total of 144 isolates recovered from eight shigellosis outbreaks were used to evaluate the usefulness of MLVA in discriminating isolates for disease outbreak investigation. From outbreak A, 102 subserotype 4a and one serotype Y isolates were recovered (Table 4), while 10 serotype X and one subserotype 2b and were recovered from outbreak K. Isolates from four outbreaks (A, B, D and K) displayed two a Number in the parenthesis denotes the amplicon size in base pairs for the allele.
or more PFGE patterns, while the isolates from outbreaks H and I shared a common PFGE pattern.
Isolates from four outbreaks (A, B, E and K) displayed two or more MLVA types ( Figure 2B). The isolates from outbreaks H and I, sharing a common PFGE pattern, had different MLVA types. The isolates from four outbreaks (C, D, H and I) displayed only one MLVA type; notably, the isolates from these outbreaks were recovered over a short time period. The 103 isolates for outbreak A, recovered from a long-stay psychiatric nursing center over a long time period (2001-2007), were chronically divergent; four loci were variable among the isolates (Table 4). Outbreaks B and K, which occurred in a military camp and a longstay psychiatric nursing center, respectively, lasted for only a few days; however, the isolates were considerably diverse. Four MLVA types were found in the six isolates from outbreak B, with variations in two loci. Five MLVA types were found in the 11 isolates from outbreak K, with variations in four loci; a distance of three loci was seen among the five types. The four isolates for outbreak E were recovered over a period of six months; the first three isolates shared the same MLVA type but the one recovered last differed at two loci from the first three.

Discussion
VNTR markers display a wide range of variability; combining VNTR loci with various variability values can be used to discern various levels of genetic relatedness between bacterial isolates. In our previous studies, we identified 26 VNTR markers for S. sonnei and evaluated the usefulness of the MLVA method as a tool for fine typing and phylogenetic analysis of bacterial isolates [14,15]. These studies indicated that MLVA based on four to eight highly variable loci is sufficient to resolve closely related isolates for disease surveillance and investigations of outbreaks. In contrast, combining loci with lower variability values are suitable to establish clear phylogenetic patterns among strains evolved over a longer time span. Theoretically, the more number of loci are used, a higher discriminatory power can be achieved and subtler phylogenetic relationships among bacterial strains can be established. Unlike S. sonnei, which has only one serotype, S. flexneri has eight serotypes with at least 12 subserotypes [3,4]. Since VNTR markers can be serotype-specific, it is necessary to explore more loci at the moment of development.
In order to identify as many VNTR markers as possible, we explored VNTR candidates from the released genomic sequences of S. flexneri and other Shigella species using the VNTRDB computer program [17]. After testing 67 VNTR candidates on nine isolates of different subserotypes, we identified 36 loci that were polymorphic among the isolates. A further evaluation of the 36 loci on a total of 242 S. flexneri isolates from various serotypes and subsero-Dendrogram for Shigella flexneri isolates Figure 1 Dendrogram for Shigella flexneri isolates. The dendrogram is constructed using PFGE patterns for 242 Shigella flexneri isolates. Clusters are designated on the basis of the level of genetic relatedness and serotypes. Isolates in each of the three subclusters of cluster C4 shared at least 80% pattern similarity. C4a types suggested that variability of some of the 36 loci could be serotype-specific. For instance, seven loci (SF12, SF16, SF19, SF23, SF25, SF29, and SF31) were not polymorphic among the diverse isolates within the 2a/2b/X/ NT group; but they were polymorphic among isolates in the 3a/3b, 4a/4b/Y and 6 groups ( Table 1). In addition, SF25 and SF31 are highly variable in the 3a/3b group. However, this observation of serotype specificity could be biased because the number of isolates tested for a serotype group was limited or may have been clonal. For the 2a/ 2b/X/NT and 3a/3b groups, the isolates were relatively divergent and more variable loci were therefore observed. For the 1a/1b/NT, 4a/4b/Y and 6 serotype groups, the number of isolates was either too small or most isolates were closely related; only a small portion of the 36 loci was observed to be variable in these groups. It is expected that more variable loci could be found for the serotypes when a larger number of diverse isolates are analyzed.
Many studies have indicated that MLVA is more powerful than PFGE in discriminating bacterial isolates [12][14] [18]; thus, it can complement or replace PFGE as a routine subtyping tool for outbreak investigation and disease surveillance. In the present study we showed that MLVA was able to discriminate isolates from two outbreaks that shared an indistinguishable PFGE pattern. For the panel of 90 diverse subserotype 2a isolates, the MLVA assays using four and eight most variable loci was able to discriminate the isolates into more genotypes than PFGE, although the discriminatory power was not significantly greater (Table 3). A similar result was observed on a panel of 107 closely related 4a/Y isolates. Although the MLVA assay using four most variable loci discriminated the 107 closely related 4a/Y isolates into fewer genotypes than PFGE, the discriminatory power for the MLVA4 assay was higher than that for PFGE. As the resolving power of MLVA to closely related isolates is primarily attributed to highly variable VNTR markers [15], an MLVA assay based on four to eight highly variable loci can display a resolving power comparable to or higher than PFGE.
VNTR markers have various degrees of variability, making MLVA as a useful molecular tool for phylogenetic study of bacterial isolates. MLVA has been applied to investigate the clonal relationships among isolates of Yersinia pestis [19], N. meningitidis [13,20] and S. sonnei [15]. In the present study, we used MLVA profiles to establish phylogenetic relationships among 242 S. flexneri isolates with various serotypes. By defining a clonal group as one that includes genotypes differing at eight loci or fewer between the two closest genotypes, the grouping based on the MLVA profiles is highly in agreement with that based on the PFGE patterns (Figure 1 and Figure 2A). The analysis based on MLVA profiles establishes clear phylogenetic patterns among different serotype groups. The isolates of serotype 3 are distributed into two distinct clonal groups, suggesting that the MLVA method developed in this study is also able to discern clonal groups among isolates within the same serotype.
The phylogenetic analysis using MLVA profiles as well as PFGE patterns also indicates that isolates with different subserotypes within a serotype can be genetically more closely related than those with the same subserotype. For example, one subserotype 1a isolate in cluster MC3 is more closely related to a subserotype 1b isolate than other subserotype 1a isolates (Figure 2A). A similar result is also observed between one subserotype 3a and one subserotype 3b isolates in cluster MC2. The analysis also indicates Phylogenetic trees for Shigella flexneri isolates that the two serotype Y isolates and the 10 serotype X isolates could be derived from strains of serotype 4 and serotype 2, respectively, whereas the two NT isolates are related to serotype 1 and serotype 2.
Recently, Gorgé et al. [21]  Although MLVA based on the 15 loci could effectively distinguish between the isolates of four Shigella species and pathogenic E. coli, grouping analysis based on the MLVA profiles was not able to distinguish among the four Shigella species. Although MLVA may not be effectively to establish phylogenetic relationships among isolates from different species of the same genus due to the specificity of VNTR markers; it can be a suitable phylogenetic analysis tool for monomorphic bacterial species, such as S. sonnei [15]. Our data indicate that the VNTR markers identified in the present study are applicable to establish clear phylogenetic relationships among serotypes groups and discerning clonal groups among isolates within a serotype.
A total of 144 isolates from eight outbreaks were analyzed using MLVA and PFGE in order to compare both genotyp-ing methods in their ability to characterize isolates for disease outbreak investigation. MLVA was able to distinguish PFGE-indistinguishable isolates from two S. flexneri 2a outbreaks. MLVA also discriminated isolates from four outbreaks into two or more genotypes. Theoretically, isolates collected from a prolonged outbreak, such as outbreak A, could have considerably been divergent, while isolates from an outbreak lasting for a short time period should have little variation. Strikingly, outbreaks B and K do not follow this pattern. These outbreaks started and ended within a few days but the isolates had showed considerably divergent. Several PFGE and MLVA genotypes were detected among the isolates for each of the outbreaks and no predominant type existed. This observation is difficult to explain. The strains could be hypervariable and mutants could emerge very quickly in response to host immunity, or the stains could have been circulating silently in the institutions for a while.

Conclusions
A total of 36 VNTR markers were identified; they exhibited different degrees of variability among various S. flexneri serotype groups. VNTR locus could be highly variable in isolates of a serotype but invariable in other serotypes. The differences could be attributed to the fact that some of the loci were serotype-specific, the isolates for a serotype group analyzed were too clonal, or the number of isolates for a serotype group was limited. MLVA based on four  highly variable loci is able to display a comparable resolving power to PFGE in discriminating isolates. MLVA is also a useful tool for phylogenetic analysis of S. flexneri; it can establish clear clonal patterns among serotype groups and also discerned clonal groups among isolates within a serotype, for example serotype 3. Isolates with different subserotypes within a serotype can be genetically more closely related than those with the same subserotype. Isolates that had different serotype but had closer genetic relatedness than those with the same serotype were observed between serotype Y and subserotype 4a, serotype X and subserotype 2b, subserotype 1a and 1b, and subserotype 3a and 3b. As highly variable VNTR loci could be serotype-specific, a common MLVA protocol that consists of four to eight loci and that provides high resolving power to all S. flexneri serotypes may not be obtainable.

PFGE
The PulseNet PFGE protocol for S. sonnei and other enterobacteria was used for PFGE analysis [22] except that 5 U of NotI instead of XbaI was used for the restriction digestion. Loci with varied sizes among the nine isolates were considered to be VNTR loci and the markers were used to further analyze a total of 242 S. flexneri isolates.

MLVA
The primer sets for PCR amplification of 36 VNTR loci are listed in Table S2 (Additional file 3). The forward primer for each primer set was labelled at its 5' end with an ABIcompatible dye, 6-FAM, NED, VIC or PET by Applied Bio-Systems (Foster City, CA, USA). Nine multiplex PCR combinations were set for the analysis. PCR reactions were performed as described above except that dye-labelled primers were used. Occasionally, no amplicon was detected for some loci by multiplex PCR; in these cases, the loci were amplified individually. If no amplicon was detected by amplification individually, DNA of the isolates prepared by a commercial kit (Geneaid, Taipei County, Taiwan) was used for amplification. The PCR products were analyzed by capillary electrophoresis on an ABI Prism 3130 Genetic Analyzer with GeneScan 500 LIZ Size Standard (cat # 4322682; Applied BioSystems) as described [14].

Data analysis
PFGE images were analyzed using the fingerprint analysis software BioNumerics version 4.5 (Applied Maths; Kortrijk, Belgium). A PFGE genotype was defined as a PFGE pattern with one or more DNA bands different from the others. A dendrogram constructed using the NotI-digested PFGE patterns was generated by the UPGMA algorithm with the Dice-predicted similarity value of two patterns at 1.0% pattern optimization and 0.8% band position tolerance. The number of repeat units for each allele was converted from the length of amplicon, saved as "Character Type" in the BioNumerics database, and subjected to cluster analysis using the MST algorithm and categorical coefficient provided in the BioNumerics software. Creation of hypothetical types (missing links) was allowed to introduce hypothetical types, group minimum size was set at 2 and maximum neighbour distance was set at 8. Alleles, which contained imperfect repeat unit(s) due to deletion or insertion, were designated by the lengths (in bp) of amplicons. To compare the discriminatory power of PFGE and MLVA with various combinations of VNTR loci, Simpson's index of diversity (D) and 95% confidence intervals (CI) were calculated according to the formulas as the described [23,24]. The polymorphism of each locus was represented by Nei's diversity index, calculated as 1-∑ (allelic frequency) 2 .