A multi locus variable number of tandem repeat analysis (MLVA) scheme for Streptococcus agalactiae genotyping

Background Multilocus sequence typing (MLST) is currently the reference method for genotyping Streptococcus agalactiae strains, the leading cause of infectious disease in newborns and a major cause of disease in immunocompromised children and adults. We describe here a genotyping method based on multiple locus variable number of tandem repeat (VNTR) analysis (MLVA) applied to a population of S. agalactiae strains of various origins characterized by MLST and serotyping. Results We studied a collection of 186 strains isolated from humans and cattle and three reference strains (A909, NEM316 and 2603 V/R). Among 34 VNTRs, 6 polymorphic VNTR loci were selected for use in genotyping of the bacterial population. The MLVA profile consists of a series of allele numbers, corresponding to the number of repeats at each VNTR locus. A total of 98 MLVA genotypes were obtained, compared with 51 sequence types generated by MLST. The MLVA scheme generated clusters that corresponded well to the main clonal complexes obtained by MLST. However, it provided a higher discriminatory power. The diversity index obtained with MLVA was 0.960, compared with 0.881 with MLST, for this population of strains. Conclusions The MLVA scheme proposed here is a rapid, cheap and easy genotyping method generating results suitable for exchange and comparison between different laboratories, for the epidemiologic surveillance of S. agalactiae and for analyses of outbreaks.


Background
Streptococcus agalactiae, one of the group B streptococci (GBS), is a leading cause of bovine mastitis [1] and has been implicated in cases of invasive disease in humans since the 1960s and 1970s [2]. GBS have emerged as major pathogens in neonates [3] and in elderly adults, in whom they cause invasive infections, such as meningitis, soft tissue infections, endocarditis and osteoarticular infections [4,5]. There is a considerable body of evidence to suggest a genetic link between bovine isolates and the emerging human isolates [6,7].
GBS isolates were initially distinguished on the basis of differences in capsule polysaccharides, giving rise to 10 different serotypes [8,9]. Serotype III has been identified as a marker of late-onset neonatal disease isolates [10], but serotyping does not have sufficient discriminatory power to distinguish between isolates. Molecular methods have therefore been developed to determine the genetic relationships between isolates: multilocus enzyme electrophoresis [11], ribotyping [12], random amplified polymorphic DNA (RAPD) [13,14] and pulsed-field gel electrophoresis (PFGE) [15]. These methods make it possible to compare isolates and to define particular bacterial genogroups associated with invasive isolates in neonates. These findings were confirmed by multilocus sequence typing, as described by Jones et al. [16]. Other studies have shown that sequence type 17 isolates are associated with invasive behavior [17,18]. Two methods are currently used to explore the genetic links between isolates: PFGE for epidemiological studies, and MLST for both epidemiological and phylogenetic studies.
* Correspondence: philippe.lanotte@univ-tours.fr. 1 Université François-Rabelais de Tours, UFR de Médecine, EA 3854 « Bactéries et risque materno-foetal », Institut Fédératif de Recherche 136 « Agents Transmissibles et Infectiologie », Tours, France. Full list of author information is available at the end of the article.
Analyses of fully sequenced bacterial genomes have revealed the existence of tandemly repeated sequences varying in size, location and the type of repetition [19]. Tandem repeats (TR) consist of a direct repetition of between one and more than 200 nucleotides, which may or may not be perfectly identical, located within or between genes. Depending on the size of the unit, the TR may be defined as a microsatellite (up to 9 bp) or a minisatellite (more than 9 bp) [19]. A fraction of these repeated sequences display intraspecies polymorphism and are described as VNTRs (variable number of tandem repeats). The proportion of VNTRs in the genome varies between bacterial species. Indeed, variation in the number of repeats at particular loci is used by some bacteria as a means of rapid genomic and phenotypic adaptation to the environment [20].
A molecular typing method based on VNTR variability has recently been developed and applied to the typing of several bacterial pathogens [19]. Multiple locus VNTR analysis, or MLVA, is a PCR-based method that was originally developed for the typing of Haemophilus influenzae [21], Mycobacterium tuberculosis [22] and two bacterial species with potential for use in bioterrorism, Bacillus anthracis and Yersinia pestis [23,24]. This method has since been shown to be useful for the genotyping of several other bacterial species causing disease in humans, including Streptococcus pneumoniae [25], Legionella pneumophila [26], Brucella [27,28], Pseudomonas aeruginosa [29] and Staphylococcus aureus [30]. This technique has several advantages. For example, in bacterial species with high levels of genetic diversity, the study of six to eight markers is sufficient for accurate discrimination between strains [26]. Highly monomorphic species, such as B. anthracis, may be typed by MLVA, but this requires the use of a larger number of markers (25 VNTRs for B. anthracis) [31]. The discriminatory power of MLVA may also be increased by adding extra panels of more polymorphic markers [28] or by sequencing repeated sequences displaying internal variability [26]. Conversely, the evaluation of differences in the number of repeats only, on the basis of MLVA, is a cheap and rapid method that is not technically demanding. The work of Radtke et al. showed the relevance of MLVA for S. agalactiae genotyping [32].
Our aim in this study was to develop an MLVA scheme for the genotyping of a population of S. agalactiae strains of various origins previously characterized by MLST.

Strains
Our collection consisted of 186 epidemiologically unrelated S. agalactiae strains, isolated from humans and cattle between 1966 and 2004 in France. Five of the 152 human strains were isolated from the gastric fluid of neonates, 71 were isolated from cases of vaginal carriage, 59 were isolated from cerebrospinal fluid and 17 were isolated from cultures of blood from adults presenting confirmed endocarditis according to the modified Duke criteria [33]. The 34 bovine strains were isolated from cattle presenting clinical signs of mastitis. We also studied three reference strains: NEM316, A909 and 2603 V/R. Each strain had previously been identified on the basis of Gram-staining, colony morphology, beta-hemolysis and Lancefield group antigen determination (Slidex Strepto Kit®, bioMérieux, Marcy l'Etoile, France). The capsular serotype was identified with the Pastorex® rapid latex agglutination test (Bio-Rad, Hercules, USA) and by molecular serotyping, as described by Manning et al. [34]. We were unable to determine the serotype for 20 strains.

DNA extraction
The bacteria were lysed mechanically with glass beads and their genomic DNA was extracted with an Invisorb® Spin Cell Mini kit (Invitek, Berlin, Germany).

MLST and assignment to clonal clusters
MLST was carried out as previously described [16]. Briefly, PCR was used to amplify small (≈ 500 bp) fragments from seven housekeeping genes (adhP, pheS, atr, glnA, sdhA, glcK and tkt) chosen on the basis of their chromosomal location and sequence diversity. The seven PCR products were purified and sequenced and an allele number was assigned to each fragment on the basis of its sequence. A sequence type (ST), based on the allelic profile of the seven amplicons, was assigned to each strain. The sequences of all new alleles and the composition of the new STs identified are available from http://pubmlst.org/sagalactiae/. Strains were grouped into clonal complexes (CCs) with eBURST software [35]. An eBURST clonal complex (CC) was defined as all allelic profiles sharing six identical alleles with at least one other member of the group. The term "singleton ST" refers to a ST that did not cluster into a CC.
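The clonal complex definition given above (each member shares six of the seven alleles with at least one other member) amounts to single-linkage grouping of allelic profiles. The following is a minimal Python sketch of that grouping rule only, not the eBURST program itself; the function names and the toy profiles are hypothetical and do not correspond to real S. agalactiae STs.

```python
from itertools import combinations

def shared_alleles(p, q):
    """Count the loci at which two allelic profiles carry the same allele."""
    return sum(a == b for a, b in zip(p, q))

def clonal_complexes(profiles, min_shared=6):
    """Group STs so that each member shares at least `min_shared` alleles
    with at least one other member (single linkage).
    Returns (list of complexes as sets, list of singleton STs)."""
    parent = {st: st for st in profiles}

    def find(st):
        # Follow parent links to the group representative (with path halving).
        while parent[st] != st:
            parent[st] = parent[parent[st]]
            st = parent[st]
        return st

    for a, b in combinations(profiles, 2):
        if shared_alleles(profiles[a], profiles[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for st in profiles:
        groups.setdefault(find(st), set()).add(st)
    complexes = [g for g in groups.values() if len(g) > 1]
    singletons = [next(iter(g)) for g in groups.values() if len(g) == 1]
    return complexes, singletons

# Hypothetical seven-locus profiles, for illustration only.
toy = {
    "ST-A": (1, 1, 1, 1, 1, 1, 1),
    "ST-B": (1, 1, 1, 1, 1, 1, 2),  # differs from ST-A at one locus
    "ST-C": (1, 1, 1, 1, 1, 3, 2),  # differs from ST-B at one locus
    "ST-D": (5, 6, 7, 8, 9, 4, 3),  # unrelated profile
}
complexes, singletons = clonal_complexes(toy)
```

In this toy set, ST-C joins the complex through ST-B even though it shares only five alleles with ST-A, which is the behavior implied by the "at least one other member" wording.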

Identification of VNTR loci
Tandem repeats were identified in the sequenced genomes of the three reference strains, NEM316, A909 and 2603 V/R, with the Microbial Tandem Repeats Database (http://minisatellites.u-psud.fr) [36] and the Tandem Repeats Finder program [37]. We determined the size of the repeat sequence and the number of repeat units for the three reference strains. BLAST analysis was carried out to determine whether the repeats were located within or between genes and to identify a hypothetical function for the open reading frame involved. The TR locus name was defined according to the following nomenclature: common name_size of the repeat sequence_size of the amplicon for the reference strain_corresponding number of repeats (Table 1). The primers used for amplification targeted the 5' and 3' flanking regions of selected loci and matched the sequences present at these positions in the genomes of strains NEM316, A909 and 2603 V/R. We initially selected and evaluated 34 tandem repeats with repeat units of more than 9 bp in length. Some TRs were not present in all the strains, some were present in all strains and displayed no polymorphism, and others were too large for amplification in standard conditions. Six TRs were retained for this study, selected on the basis of their greater stability and discriminatory power for four of the six (Table 1).
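The locus nomenclature described above can be illustrated with a small helper. This is only a sketch: the "bp" and "U" suffixes are an assumed formatting convention not stated in the text, the function names are hypothetical, and the amplicon size and repeat number in the example are placeholders rather than values from Table 1 (only the 48 bp repeat unit of SAG21 comes from the text).

```python
def tr_locus_name(common_name, repeat_bp, amplicon_bp, n_repeats):
    """Build a locus name of the form
    commonname_repeatsize_ampliconsize_repeatnumber.
    The 'bp' and 'U' suffixes are an assumed convention."""
    return f"{common_name}_{repeat_bp}bp_{amplicon_bp}bp_{n_repeats}U"

def parse_tr_locus_name(name):
    """Split a locus name built by tr_locus_name back into its four fields."""
    common_name, repeat, amplicon, units = name.split("_")
    return {
        "common_name": common_name,
        "repeat_bp": int(repeat[:-2]),      # strip trailing 'bp'
        "amplicon_bp": int(amplicon[:-2]),  # strip trailing 'bp'
        "n_repeats": int(units[:-1]),       # strip trailing 'U'
    }

# Placeholder amplicon size (600 bp) and repeat number (10) for illustration.
example_name = tr_locus_name("SAG21", 48, 600, 10)
example_fields = parse_tr_locus_name(example_name)
```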

Multiple locus VNTR analysis (MLVA)
The primers used for VNTR amplification are presented in Table 2. Three loci have already been described by Radtke et al. in a contemporary study but were amplified here with other primers [32] (Table 2). For the SAG7 locus, no amplification was observed with primers directly flanking the TR for 14% (26/189) of the strains. A second primer pair targeting larger consensus flanking regions was designed to confirm the absence of the locus. PCR was performed in a final volume of 25 μl containing 10 ng DNA, 1× PCR Reaction Buffer, 2 mM MgCl2 (Applied Biosystems), 5% DMSO (dimethyl sulfoxide), 1 unit of Taq DNA polymerase (Applied Biosystems), 200 μM of each dNTP and 0.5 μM of each flanking primer (Eurogentec, Belgium). Amplification was performed in a 2720 Thermal Cycler (Applied Biosystems) under the following conditions: initial denaturation for 5 min at 94°C, followed by 30 cycles of denaturation for 30 s at 94°C, annealing for 30 s at 50°C and elongation for 60 s at 72°C, plus a final elongation step for 7 min at 72°C. We separated 10 μl of PCR product by electrophoresis in a 2% agarose gel (Eurogentec, Belgium), which was also loaded with a 100 bp DNA size ladder (New England BioLabs). Electrophoresis was performed in 20 cm-long gels, in 1× TBE buffer (89 mM Tris-Borate, 2.5 mM EDTA) containing 1 μg/ml ethidium bromide, at 10 V/cm. In each run, at least one lane was loaded with PCR product from one of the reference strains, NEM316, A909 or 2603 V/R. The gels were photographed under ultraviolet illumination, with Vision-Capt® software (Vilber-Lourmat, Marne la Vallée, France). The number of repeats for each VNTR was deduced from amplicon size, by comparison with the reference strain, for which the number of repeats was known. The allele number corresponded to the number of repeats.
For the SAG7 locus, the lack of a VNTR was revealed by the absence of amplification with the first primer pair and the amplification of a fragment of the expected size with the second primer pair, which targeted larger consensus flanking regions. In this case, an allele number of 0 was given. For the SAG21 locus, a 117 bp PCR product was obtained, demonstrating deletion of the inserted sequence and, thus, the absence of a VNTR. An allele number of 0 was also assigned in this case. The MLVA genotype of a strain was expressed as its allelic profile, corresponding to the number of repeats at each VNTR, listed in the order SAG2, SAG3, SAG4, SAG7, SAG21, SAG22.
Table 1 footnotes: 2603 V/R, A909 and NEM316 number of repeats (-: lack of VNTR); 5, expected size of PCR product for the A909 reference strain; 6, HGDI: Hunter and Gaston's diversity index, 95% confidence intervals are noted in brackets; 7, 69 bp upstream from the ribosomal protein S10 sequence; *, locus name described by Radtke.
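The deduction of repeat number from amplicon size, and the assembly of the allelic profile in the stated locus order, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the procedure actually used for gel reading: the function names, the "-" separator, the reference amplicon size and reference repeat number are hypothetical; only the locus order and the 48 bp repeat unit of SAG21 come from the text.

```python
# Locus order used for the MLVA allelic profile (from the text).
LOCI = ("SAG2", "SAG3", "SAG4", "SAG7", "SAG21", "SAG22")

def repeats_from_amplicon(size_bp, ref_size_bp, ref_repeats, unit_bp):
    """Estimate the number of repeats from an amplicon size by comparison
    with a reference strain whose amplicon size and repeat number are known.
    The size difference is rounded to the nearest whole repeat unit."""
    delta_units = round((size_bp - ref_size_bp) / unit_bp)
    return ref_repeats + delta_units

def mlva_profile(alleles):
    """Format the allelic profile in the fixed locus order.
    An allele number of 0 denotes the absence of the VNTR."""
    return "-".join(str(alleles[locus]) for locus in LOCI)

# Hypothetical reference: a 600 bp amplicon carrying 10 repeats of 48 bp.
n_sag21 = repeats_from_amplicon(size_bp=696, ref_size_bp=600,
                                ref_repeats=10, unit_bp=48)

# Hypothetical strain profile, for illustration only.
profile = mlva_profile({"SAG2": 3, "SAG3": 3, "SAG4": 3,
                        "SAG7": 0, "SAG21": n_sag21, "SAG22": 2})
```

Rounding to the nearest unit reflects the limited precision of size estimates on agarose gels; it is more reliable for large repeat units than for small ones.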

Data analysis
The polymorphism index of individual or combined VNTR loci was calculated with the Hunter-Gaston diversity index [38], an application of Simpson's index of diversity [39]. Confidence intervals (CI) were calculated as described by Grundmann et al. [40]. The categorical coefficient (also called Hamming's distance) and unweighted pair group method with arithmetic mean (UPGMA) clustering approaches were run within BioNumerics. A cutoff value of 50% similarity was applied to define MLVA clusters. The minimum spanning tree (MST) was generated with BioNumerics. Each circle represents an MLVA genotype and its size is proportional to the number of strains. A logarithmic scale was used when drawing branches. The thicker branches link MLVA genotypes differing by only one allele; the thinner branches link MLVA genotypes differing by more than one allele.
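For reference, the Hunter-Gaston index for N strains falling into types of sizes n_j is D = 1 - [sum of n_j(n_j - 1)] / [N(N - 1)], and the categorical coefficient compares two profiles locus by locus. A minimal Python sketch of both calculations is given below; it is not the BioNumerics implementation, the function names are hypothetical, and the toy profiles are illustrative rather than taken from the strain collection.

```python
from collections import Counter

def hunter_gaston_index(genotypes):
    """Hunter-Gaston diversity index:
    D = 1 - sum(n_j * (n_j - 1)) / (N * (N - 1)),
    where n_j is the number of strains of type j and N the total."""
    counts = Counter(genotypes)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("at least two strains are needed")
    return 1.0 - sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

def categorical_similarity(profile_a, profile_b):
    """Fraction of loci with identical alleles
    (1 minus the normalized Hamming distance)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same number of loci")
    matches = sum(a == b for a, b in zip(profile_a, profile_b))
    return matches / len(profile_a)

# Toy six-locus profiles, for illustration only.
toy_genotypes = [
    (3, 3, 3, 5, 0, 2),
    (3, 3, 3, 5, 0, 2),
    (3, 3, 2, 5, 6, 2),
    (3, 3, 3, 5, 6, 2),
]
diversity = hunter_gaston_index(toy_genotypes)
similarity = categorical_similarity(toy_genotypes[0], toy_genotypes[2])
```

With six loci, the 50% similarity cutoff corresponds to two profiles sharing at least three alleles.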

MLST genotyping
MLST was performed on the 189 S. agalactiae strains, identifying a total of 51 individual STs. eBURST analysis clustered the STs into five clonal complexes (CC17, CC19, CC10, CC23 and CC7), two groups with only two STs and six singletons (Table 3). Two of the CCs, CC17 (73 strains) and CC19 (63 strains), accounted for 72% (136/189) of the strains. CC23 accounted for 8% (15/189) of the strains. The various serotypes of S. agalactiae were distributed between multiple CCs and singleton STs. STs were characterized by a predominant serotype: serotype V in ST-1, serotype III in ST-17 and ST-19, serotype Ib in ST-10 and ST-12. ST-23 contained two serotypes (serotypes Ia and III; Table 3). The population was therefore representative of S. agalactiae diversity in terms of anatomic origin, serotypes and clonal complexes (Table 3).

Description of the MLVA scheme
The six VNTRs were amplified from all 189 strains. MLVA was carried out with individual PCRs and agarose gel electrophoresis of the amplicons, as shown in Figure 1, for a subset of VNTRs. The repeat unit size of the six VNTRs was between 18 bp and 159 bp, making it straightforward to estimate the size of amplicons on agarose gels. For SAG2, SAG3, SAG4 and SAG7, amplicons were between 114 and 573 bp in size and were readily resolved by 2% agarose gel electrophoresis (Table 1). For SAG21 (48 bp repeat unit) and SAG22 (159 bp repeat unit), a few amplicons exceeded 1,000 bp and extensive electrophoretic separation was required for precise estimations of size. For SAG21, three strains gave rise to amplicons of more than 1,500 bp in size. This made it difficult to assess the number of repeats with any degree of precision, and an arbitrary allele number of > 30 was assigned in these cases. For SAG7, no amplification with the first primer pair was observed for 14% of strains. This locus is part of a genomic island, and a second primer pair targeting consensus flanking regions beyond the borders of this genomic island was designed to confirm the deletion of the VNTR locus. The number of alleles was between two for SAG3 and 26 for SAG21. Thus, this MLVA method combined markers with a low discriminatory power (Hunter and Gaston's index of diversity or HGDI < 0.5) with highly discriminant markers, such as SAG21. With the exception of SAG2, the VNTRs used in this MLVA method were located within open reading frames (Table 1). SAG2 is located upstream from the gene encoding the ribosomal protein S10; SAG3 is located within dnaJ, encoding a co-chaperone protein (Hsp40). SAG21 is located within fbsA, encoding a protein involved in adhesion. SAG4, SAG7 and SAG22 are located in a "predicted coding region" of unknown function.

Comparison of MLVA and MLST clustering
MLVA clustering showed a clonal distribution of the population similar to that obtained by MLST (Figure 4). All human strains of MLST CC17 clustered together in MLVA cluster 9, whereas the bovine strains of MLST CC17 belonged to several MLVA clusters, suggesting greater heterogeneity of this population (Figure 4).

Discussion
In this study, we applied the multi locus VNTR analysis (MLVA) typing method to S. agalactiae. VNTR analysis, a method based on tandem repeat polymorphisms at multiple loci, has been successfully applied to many other bacterial species [30,41]. We investigated the relevance of this tool for the genotyping of S. agalactiae, by testing this method on six VNTR loci in 189 strains previously characterized by MLST and serotyping. The MLVA-6 scheme is inexpensive and can be carried out with the equipment routinely used for PCR amplification and agarose gel electrophoresis. For the six VNTR loci, amplification was achieved with all the strains tested. For SAG7, a second PCR targeting a larger flanking region was required for 14% of the strains, which lacked the 16 kb genomic island encompassing the VNTR. The repeat sizes of the six VNTRs were sufficiently large for evaluation of the number of repeats on agarose gels. Moreover, the conversion of results into allelic profiles should make it possible to construct databases for exchange between laboratories. The MLVA-6 scheme includes a set of markers with different diversity indices, making it suitable for epidemiological studies. Markers with a moderate diversity and small number of alleles (presumably reflecting their slow rate of evolution) define clusters, whereas markers displaying more rapid evolution reflect variability within clusters. The MLVA-6 method described here is a rapid, reproducible and epidemiologically meaningful typing tool. Three loci studied in the present MLVA scheme are in common with the MLVA scheme proposed by Radtke et al. [32]. The three additional loci studied here lend more weight to the clusters while maintaining a high discriminatory power. Moreover, in the MLVA scheme proposed here, only one locus (SAG7) was missing in some strains (14%), and another primer pair targeting a larger consensus flanking region confirmed the absence of this locus with a specific amplification.
Unlike Radtke et al., we sought to develop an MLVA scheme in which a PCR product was amplified in all strains, whether the VNTR was present or absent. In fact, negative amplification may result from the lack of a VNTR locus or from modification of the flanking regions, especially as some VNTRs are close to transposases or insertion sequences, such as SAG4 (alias SATR1), which is close to IS1381. Thus, the possibility of negative amplification for 3 out of 5 VNTR loci in the Radtke et al. MLVA analysis could be a real problem in terms of resolution and reproducibility of the genotyping method. Nevertheless, cumulative work makes it possible to define the best set of VNTR loci, as has already been done for other bacterial species such as Mycobacterium tuberculosis [22,42-46] and Staphylococcus aureus [30,47-49]. Finally, the study of 34 isolates of bovine origin provided information about their distribution, especially those belonging to MLST CC17.
Population analysis by MLVA revealed a clonal distribution of the strains similar to that obtained by MLST. The greater discriminatory index of MLVA (0.96) made it possible to distinguish between strains within the clonal complexes defined by MLST. Thus, MLVA divided CC23 into two groups: one associated with serotype III and the other associated with serotype Ia. Moreover, MLVA also separated CC17 into two groups: one corresponding to strains of human origin and the other, containing several related STs (ST-61, ST-64, ST-301 etc.), corresponding to strains of animal origin only. A previous study analyzing 75 strains of S. agalactiae of human and animal origin by whole-genome DNA-array hybridization also separated ST-23 strains into two clusters, one associated with serotype III and the other with serotype Ia [50]. Each of these two clusters was associated with a particular pattern of surface protein expression. This previous study also separated the bovine and human CC17 strains [50]. These results are consistent with an ancient divergence of these clusters, whereas other observations based on MLST analysis suggest that ST-17 strains may have arisen from a bovine ancestor [6].
The lack of a strict correlation between the results of MLST and MLVA may be accounted for by differences in the markers used for MLST (targeting housekeeping genes) and MLVA (targeting a set of diverse regions that may or may not be conserved). Unlike MLST, MLVA targets several types of markers: genes involved in metabolism, genes associated with virulence and a genomic island. Indeed, SAG2 is located upstream from the gene encoding the ribosomal protein S10, which is involved in transcription and translation, and SAG3 is located within dnaJ, which encodes Hsp40, a co-chaperone of the Hsp70 chaperone system. The SAG21 locus lies within the gene encoding a surface protein involved in virulence, FbsA. The SAG7 locus is located on a genomic island and belongs to a gene encoding a hypothetical protein whose function has not yet been identified, like most of the genes of genomic islands [51].
Clustering based on MLVA data was almost identical with the UPGMA and MST algorithms except for cluster 1. The differences in mathematical calculation between the two methods may account for the observed differences in strain clustering. This phenomenon has been previously observed in MLVA studies [52]. Some VNTRs for the alpha C protein have already been described in S. agalactiae [41,53,54]. One of these VNTRs is involved in regulating gene expression: a pentanucleotide repeat located upstream from the promoter regulates expression in vitro by phase variation. Another is an intragenic VNTR that modifies the size of the alpha C protein, thereby altering its antigenicity and strain virulence [53]. These two VNTR loci were not included in the MLVA method proposed here, in one case because the small size of the repeat unit (5 bp) complicates the assessment of PCR fragment size [19]. The amplicons of the second VNTR locus not included were more than 2000 bp in size, again making it difficult to evaluate repeat number. Tandem repeats were also found in the gene encoding another surface protein, FbsA, which interacts with epithelial cells and is involved in invasion of the central nervous system of colonized neonates. Its ability to bind to fibrinogen depends on the number of repeats of a unit of 16 amino acids present at its N-terminus [55]. A particular number of repeats is associated with the greater potential of the ST-17 strains implicated in neonatal meningitis to adhere to fibrinogen [56]. This major marker was included in our MLVA method and corresponds to SAG21.

Conclusions
The MLVA method proposed here is a simple genotyping method producing results that can be exchanged between laboratories. MLVA generated major clusters that corresponded well to the main clonal complexes obtained by MLST. However, its discriminatory power was greater than that of MLST. MLVA could therefore also be used as an epidemiological tool, given its high discriminatory power, making it possible to distinguish between strains of homogeneous lineages. The specificities of the VNTRs for each phylogenetic lineage raise questions about the role of VNTRs in the adaptation of S. agalactiae to its environment and in virulence. Further studies are required to clarify these issues.