Evaluation and selection of tandem repeat loci for a Brucella MLVA typing assay

Background The classification of Brucella into species and biovars relies on phenotypic characteristics and sometimes raises difficulties in interpreting results because the typing reagents are not standardized. In addition, biotyping has only moderate resolution and requires manipulation of the living agent. More efficient DNA-based methods are needed, and this work explores the suitability of multiple locus variable number tandem repeat analysis (MLVA) for both typing and species identification. Results Eighty tandem repeat loci predicted to be polymorphic by analysis of the three available Brucella genome sequences were tested for polymorphism by genotyping 21 Brucella strains (18 reference strains representing the six 'classical' species and all biovars, as well as 3 marine mammal strains currently recognized as members of two new species). The MLVA data efficiently cluster the strains according to their species and biovar, as expected. For practical use, a subset of 15 loci preserving this clustering was selected and applied to the typing of 236 isolates. The clusters generated with this MLVA-15 assay correspond to the classical biotyping scheme of Brucella spp. The 15 markers have been divided into two groups: 8 user-friendly minisatellite markers with good species identification capability (panel 1) and a complementary group of 7 microsatellite markers with higher discriminatory power (panel 2). Conclusion The MLVA-15 assay can be applied to large collections of Brucella strains with automated or manual procedures, and can be proposed as a complement to, or even a substitute for, classical biotyping methods. This is facilitated by the fact that MLVA is based on non-infectious material (DNA), whereas the biotyping procedure itself requires manipulation of the living agent. The data produced can be queried on a dedicated MLVA web service site.


Background
Brucellosis is a zoonosis affecting animals and humans worldwide. Brucella infections may cause significant economic losses due to abortion and slaughtering of infected animals. Humans are mainly infected through consumption of contaminated dairy products or by direct contact with infected animals. In addition, certain Brucella spp. must be considered potential biowarfare agents. Six species are currently recognized: B. abortus (8 biovars), B. melitensis (3 biovars), B. suis (5 biovars), B. ovis, B. canis and B. neotomae [1]. More recently, Brucella strains have been isolated from marine mammals [2], suggesting the existence of additional species [3,4].
The genus Brucella is highly homogeneous (more than 90% DNA/DNA homology [5]). Brucella classification is mainly based on differences in pathogenicity, host preference, and conventional microbiological tests used for phenotyping (biotyping) [6]. Routine identification of Brucella species and biovars still relies on biotyping (reviewed in [7]). Only a few tools exist for further molecular subtyping, and none has proven fully satisfactory for epidemiological investigations or for tracing strains back to their origin. Tandem repeat (TR) sequences may be an interesting class of markers, since multiple alleles can be present at a single locus and size differences are easily resolved by electrophoresis (reviewed in [8,9]). Tandem repeats are often classified as microsatellites (repeat units of up to 8 bp) or minisatellites (longer repeat units) [10,11]. Tandem repeat typing has proven well suited to typing pathogenic bacterial species with high genetic homogeneity, including the Mycobacterium tuberculosis complex, Bacillus anthracis and Yersinia pestis [12][13][14][15]. Recently, a family of tandem repeats located within a repeated sequence and present at multiple loci in the Brucella genome was used for strain typing [16,17]. The proposed set of eight microsatellite loci is extremely discriminatory and highly efficient at distinguishing strains within a local outbreak, but it cannot correctly predict the biovar or even the species of an isolate, possibly because of the high mutation rate of these loci. Consequently, this MLVA assay cannot replace classical biotyping methods.
The availability of the whole genome sequences of B. melitensis 16 M, B. suis 1330 and B. abortus strain 9-941 [18][19][20] greatly facilitates the search for polymorphic DNA sequences [21]. In this report, we evaluated most tandem repeats showing at least two alleles among the three sequenced strains [22]. Eighteen reference strains and 3 strains isolated from marine mammals [23] were typed with these TR candidates to evaluate their polymorphism. For routine typing, we selected a subset of 15 markers that clusters the isolates according to their biotype. This set of markers was further evaluated on a collection of 236 isolates representing the major biovars affecting terrestrial mammals (Table 1) to produce a first reference data set [see Additional file 1], which can be queried via the internet [21,24].

Evaluation of tandem repeats polymorphism
Comparison of the three genome sequences [21,22] identifies 107 TRs with a repeat unit larger than 5 bp and predicted to display size polymorphism. Eighty of them were evaluated for polymorphism among 21 reference and marine mammal strains (Table 1). Twenty-two TRs (numbered Bruce01 to Bruce22 in Table 2) have three predicted alleles. Twelve of the 22 are octamers, five of which have been previously characterized [16].
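The selection criterion described above can be sketched as a simple filter. This is a minimal illustration, not the search pipeline actually used [21,22]: the `CandidateTR` record, its fields and the example values are hypothetical, and each candidate is assumed to carry its repeat unit length and its predicted copy numbers in the three sequenced genomes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CandidateTR:
    """Hypothetical record for one TR found by genome comparison."""
    name: str
    unit_bp: int
    # Predicted copy numbers in B. melitensis 16M, B. suis 1330 and B. abortus 9-941.
    copies: Tuple[int, int, int]


def n_alleles(tr: CandidateTR) -> int:
    """Number of distinct alleles (copy numbers) among the three genomes."""
    return len(set(tr.copies))


def is_candidate(tr: CandidateTR, min_unit_bp: int = 6) -> bool:
    """Keep TRs whose unit is larger than 5 bp and that show size polymorphism,
    i.e. at least two distinct alleles among the three sequenced strains."""
    return tr.unit_bp >= min_unit_bp and n_alleles(tr) >= 2


# Made-up examples for illustration only.
trs = [
    CandidateTR("tr_a", 8, (3, 5, 4)),   # octamer, three alleles -> kept
    CandidateTR("tr_b", 12, (2, 2, 2)),  # identical in all genomes -> dropped
    CandidateTR("tr_c", 5, (6, 7, 7)),   # unit not larger than 5 bp -> dropped
]
selected = [tr.name for tr in trs if is_candidate(tr)]
print(selected)  # ['tr_a']
```

A locus such as `tr_a`, with three distinct predicted alleles, corresponds to the "three predicted alleles" category of Bruce01 to Bruce22; loci with two alleles would also pass the filter.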
Typing was done by PCR using the set of primers listed in Table 2, as described [13]. Six markers were excluded from further study: three generated multiple-band profiles (bruce20-BRU329_8bp_148bp_7u; bruce38-BRU1116_18bp_108bp_2u; bruce71-BRU337_12bp_394bp_3u), one was not amplified with the selected primers (bruce79-BRU163_12bp_141bp_4u), and for two (bruce76-BRU243_21bp_2u; bruce77-BRU195_21bp_2u, not listed in Table 2) no appropriate primers could be designed in the flanking regions because of the presence of repeated elements.
Three markers (bruce44-BRU256_12bp_110bp_3u; bruce65-BRU824_41bp_182bp_2u; bruce69-BRU488_57bp_181bp_1u) turned out to be monomorphic among the 21 reference strains. The results of the clustering analysis using the 71 remaining markers fit very well with current knowledge of the relationships between Brucella species [25] (Figure 1). We then looked for a subset of markers providing discriminatory power similar to that of the whole set for the collection of reference strains evaluated. Although extremely informative, the octamer family, which includes the eight tandem repeats previously investigated [16,17], is not appropriate for species/biovar discrimination because of its hypervariability; more stable markers are needed. Among the other markers, a set of the ten most polymorphic loci clusters the different species as expected. Two of these ten markers display allele size ranges unsuitable for analysis on currently available automated DNA fragment sizing machines such as capillary electrophoresis sequencers (Bruce02 and Bruce15 have alleles of up to 2 kb and 5 kb, respectively). The amplification patterns of the 21 reference strains for the other eight TRs are shown in Figure 2. These 8 markers (Bruce06, 08, 11, 12, 42, 43, 45 and 55) will subsequently be called MLVA typing panel 1. These are minisatellite loci with repeat

Evaluation of a MLVA assay comprising 15 markers
The set of 15 TR markers (panels 1 and 2, marked with one or two asterisks in Table 2

Discussion
The genus Brucella has long been divided into species and biovars, but this classification has been controversial since DNA-DNA hybridization was first applied. The genus proved to be highly monomorphic, with a level of relatedness among all species higher than 90% [5]. This homogeneity has complicated the development of molecular assays able to recognise species-specific entities efficiently. It also led to the proposal of a single-species genus, B. melitensis, in which the classical species would be considered biovars only. However, most bacteriologists did not accept this concept, which has recently been rejected by the Subcommittee on the Taxonomy of Brucella [26]. The purpose of the present study was, first, to investigate the polymorphism of tandem repeat loci predicted to be polymorphic by comparing the sequences of the three Brucella strains already sequenced and, second, to evaluate to what extent tandem repeat typing and classical biotyping clusterings agree. We evaluated most of these loci with a repeat unit of 5 bp or more.
Polymorphism has been confirmed at 71 loci. DNA was amplified at every locus from all 21 reference strains, including the 3 marine mammal strains (except for Bruce04 in the B. melitensis bv 3 reference strain Ether and Bruce01 in the B. ovis reference strain BOW63/290), indicating that the primer-binding flanking sequences are conserved and confirming the very high genetic homogeneity of the genus Brucella.

Figure 5. Clustering analysis of 257 strains and isolates with the two panels of markers (MLVA-15), genotypes 140 to 204. The column contents are as indicated in the Figure 3 legend. The corresponding genotyping data can be found in the additional file [see Additional file 1]. This third part of the clustering (genotypes 140 to 204) comprises the remaining B. abortus isolates and the B. melitensis isolates. The RB51 vaccine strain tested here is genotype 159, S19 is genotype 161, and a number of Rev.1 isolates share genotype 201. The colour code is as shown in Figure 1.
An MLVA typing assay relies on combining markers which, individually, would not provide a relevant clustering. Taken separately, the TR markers are either not informative enough, too variable, or show a high level of homoplasy. However, a combination of well-selected independent loci may be highly discriminatory and, to some extent, phylogenetically relevant, as shown previously for other species [9] and demonstrated here for Brucella. We propose a selection of 15 markers for a Brucella MLVA assay consisting of two complementary panels, panel 1 (8 markers) and panel 2 (7 markers). The fifteen markers combine moderately variable (minisatellites, panel 1) and highly discriminatory (microsatellites, panel 2) loci (Table 2).
The strain clustering achieved is consistent with well-established phenotypic and molecular characteristics (Figures 3, 4 and 5). B. abortus biovars 1, 2 and 4 cluster together, in agreement with (i) their sensitivity to thionin and (ii) the PCR-RFLP pattern of the omp2a gene specific to these biovars [27]. B. abortus biovar 3 strains form a separate group, except for 2 strains originating from Africa (BCCN 93-26 and the reference strain Tulya). Strains isolated in Africa often show distinct phenotypes [28], so it is not surprising that these two strains are separated. The two strains do not require CO2 for growth. Their closest MLVA neighbours are two B. abortus biovar 6 strains also isolated in Africa. Assignment to biovar 3 or 6 depends on H2S production, which is the only phenotypic criterion differentiating these two biovars. The MLVA assay confirms that some African strains differ significantly from isolates of other origins and that B. abortus biovar 3 is a heterogeneous group.
The B. melitensis group is very heterogeneous with either panel 1 alone or both panels (MLVA-15), and comprises four main subgroups. Biovar 2 and 3 strains are mixed in two groups, together with a few biovar 1 strains. The other biovar 1 isolates form 2 groups, one including the 16M reference strain and the other (genotypes 173 and 174, Figure 5) comprising 3 isolates from the United Arab Emirates. B. melitensis strain BCCN 84-3 (MLVA-15 genotype 20), isolated from a dog in Costa Rica, was biotyped as B. melitensis biovar 2 but appears to be distantly related to other B. melitensis strains. This strain is smooth, as shown by agglutination with anti-A serum, and its oxidative metabolism profile is typical of B. melitensis. Panel 1 analysis (not shown) does associate this strain with B. melitensis, but the full MLVA-15 analysis suggests a position closer to the B. canis group (Figure 3).
B. suis strains are clearly differentiated into three groups (Figures 3 and 4). A first group includes all biovar 1, 3 and 4 strains, and a second group all biovar 2 strains. The two rare biovar 5 strains are very distantly related. The correlation with biovars is good, with some interesting exceptions. The five B. suis biovar 3 isolates from Croatia share the same genotype (MLVA-15 genotype 36, Figure 3 [see Additional file 1]) and cluster with B. suis biovar 1 strains but not with the reference B. suis biovar 3 strain. This may suggest that the biovar 3 phenotype has appeared independently more than once; more B. suis strains phenotypically identified as biovar 3 from other geographic origins are required to test this hypothesis. Biovar 1 and biovar 3 strains are distinguished by sensitivity to fuchsine and the ability to produce H2S. Atypical fuchsine-resistant biovar 1 strains have already been described [6], as well as atypical fuchsine-sensitive B. melitensis strains [29,30]. Thus both fuchsine sensitivity and H2S production (as suggested above for B. abortus) may be phylogenetically weak markers with some degree of homoplasy. Among biovar 2, strains isolated from Spain and Portugal are related and can be distinguished from the other European strains investigated. Biovar 4 strains cluster right beside B. canis. Meyer [31] previously proposed a model for the evolutionary derivation of Brucella organisms based on phenotypic characteristics, including a close relationship between B. suis biovars 3/4 and B. canis. PCR-RFLP analyses of the porin genes agree with this finding [27].
Three classical vaccine strains were included: Rev.1 (genotype 201), S19 (genotype 161) and RB51 (genotype 159). Six other isolates, from Israel, share genotype 201. These streptomycin-resistant isolates were confirmed as Rev.1 vaccine strains using a previously described assay [32] (data not shown). This is not unexpected, since vaccination is used in that country, and it illustrates the stability of the MLVA assay in this case.
In almost all instances where two isolates share an identical MLVA genotype, the available epidemiological data are indeed compatible with a common source of infection. The rare exceptions may suggest that some strains travel efficiently. For example, MLVA-15 genotype 132 was observed in Germany in 1972 and in central France (department 87) in 1994, and MLVA-15 genotype 1 (B. canis) was observed in Greece and Germany. More epidemiological data will be needed to draw precise conclusions on the circulation of strains.
The MLVA-15 results support the current classification of the genus Brucella. In addition, differences found by phenotypic identification and/or molecular studies are also detected by MLVA. One major advantage of MLVA is the ease of data exchange: the data can be summarized in a very simple flat text file containing the repeat copy number for each locus and each strain, and can be made accessible and queried across the internet, as shown [21,24].
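The flat-file exchange format mentioned above might look like the following sketch. The column layout (strain name first, then one column per locus, empty cell for an untyped locus), the helper names and the copy numbers are assumptions made for illustration; they are not the actual format of Additional file 1 or of the web service [21,24].

```python
from __future__ import annotations

import csv
import io


def write_mlva_table(profiles: dict[str, dict[str, int | None]], loci: list[str]) -> str:
    """Serialize repeat copy numbers as tab-separated text: one row per strain,
    one column per locus, empty cell for a locus that was not typed."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["strain", *loci])
    for strain, alleles in profiles.items():
        cells = ["" if alleles.get(locus) is None else alleles[locus] for locus in loci]
        writer.writerow([strain, *cells])
    return buf.getvalue()


def read_mlva_table(text: str) -> dict[str, dict[str, int | None]]:
    """Parse the tab-separated text back into {strain: {locus: copy number or None}}."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    loci = next(reader)[1:]
    table: dict[str, dict[str, int | None]] = {}
    for row in reader:
        strain, values = row[0], row[1:]
        table[strain] = {locus: (int(v) if v else None) for locus, v in zip(loci, values)}
    return table


# Hypothetical copy numbers, for illustration only.
loci = ["Bruce06", "Bruce08"]
profiles = {
    "isolate_A": {"Bruce06": 3, "Bruce08": 5},
    "isolate_B": {"Bruce06": 4, "Bruce08": None},  # Bruce08 not typed
}
text = write_mlva_table(profiles, loci)
assert read_mlva_table(text) == profiles  # lossless round trip
```

A plain tab-separated layout like this can be opened in a spreadsheet, diffed, or merged with another laboratory's table by strain name, which is what makes MLVA data easy to exchange.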
Another advantage is that MLVA typing depends only on measuring DNA amplicon sizes, so that a number of electrophoretic techniques can be used, ranging from manual, low-cost agarose gels to high-throughput capillary electrophoresis sequencing machines.
It is tempting to speculate that, in the near future, international databases containing MLVA data for thousands of strains will be produced and MLVA will become a routine assay for any new isolate. We believe that the MLVA-15 assay is a step in this direction. A first clinical application of the assay was recently described [33].

Bacterial strains
The 257 strains and isolates used for MLVA typing are listed individually or described collectively in Table 1. In total, 117 B. suis, 43 B. melitensis, 52 B. abortus, 24 B. ovis, one B. neotomae, 17 B. canis and 3 marine mammal strains [2] were investigated. This collection includes the 18 classical reference strains representing the different species and biovars of Brucella. Most strains were isolated from animals, and a few from humans or from unknown host species (Figures 3, 4 and 5). They were identified by phenotypic tests based on agglutination with monospecific antisera (serotyping), phage typing, dye sensitivity, CO2 requirement and H2S production [6].

Identification of variable number tandem repeats by genomic sequence comparison
The methods previously described [10,12,21,22] and the genome sequence data for B. suis strain 1330, B. melitensis strain 16 M and B. abortus strain 9-941 [18][19][20] were used to identify TRs that may help to differentiate closely related genomes.
The TRs are designated using the nomenclature previously described [13]. For instance, BRU211_63bp_257bp_2u (bruce11) is a TR at position 211 kb in the B. melitensis 16 M genome. Its common laboratory name (alias) is Bruce11. It has a 63 bp motif and a total PCR product length of 257 bp in B. melitensis strain 16 M with the primer set indicated in Table 2. This allele size corresponds to 2 units.
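The naming scheme above can be decoded mechanically, and the reference allele it records gives a way to convert a measured band size into a copy number. The sketch below is an illustration under stated assumptions, not the Bionumerics procedure used in this study: `parse_tr_name` and `size_to_units` are hypothetical helpers, the flanking (non-repeat) part of the amplicon is assumed constant and equal to the reference size minus the reference repeats, and alleles are assumed to differ by whole units. For Bruce11 the flanking part is 257 - 2 × 63 = 131 bp, so a hypothetical 322 bp band is read as 3 units.

```python
import re
from typing import NamedTuple


class TRName(NamedTuple):
    position_kb: int      # position in the B. melitensis 16M genome (kb)
    unit_bp: int          # repeat unit (motif) length
    ref_amplicon_bp: int  # PCR product length in 16M with the Table 2 primers
    ref_units: int        # copy number of that reference allele


_NAME = re.compile(r"^BRU(\d+)_(\d+)bp_(\d+)bp_(\d+)u$")


def parse_tr_name(name: str) -> TRName:
    """Decode a name such as 'BRU211_63bp_257bp_2u'."""
    m = _NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognized TR name: {name!r}")
    return TRName(*(int(g) for g in m.groups()))


def size_to_units(tr: TRName, band_bp: float) -> int:
    """Convert an estimated band size into a repeat copy number.

    Assumes the non-repeat (flanking) part of the amplicon is constant and equal
    to the reference size minus the reference repeats; rounds to whole units.
    """
    flank_bp = tr.ref_amplicon_bp - tr.ref_units * tr.unit_bp
    return round((band_bp - flank_bp) / tr.unit_bp)


bruce11 = parse_tr_name("BRU211_63bp_257bp_2u")
print(bruce11)                      # TRName(position_kb=211, unit_bp=63, ref_amplicon_bp=257, ref_units=2)
print(size_to_units(bruce11, 257))  # 2
print(size_to_units(bruce11, 322))  # 3
```

Names lacking the amplicon size field, such as BRU243_21bp_2u, are rejected by this pattern. Rounding to the nearest unit absorbs small gel sizing errors, but it would also hide genuine partial repeats, so a real pipeline would need to flag bands that fall far from a whole number of units.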

PCR amplification and genotyping
Brucella DNA was prepared as previously described [27]. PCR amplification was performed in a total volume of 15 µl containing 1 ng of DNA, 1× PCR Reaction Buffer, 1 U of Taq DNA polymerase (Qbiogen, Illkirch, France), 200 µM of each deoxynucleotide triphosphate, and 0.3 µM of each flanking primer, as described previously [15].
Amplifications were performed in an MJ Research PTC200 thermocycler. An initial denaturation step at 96°C for 5 minutes was followed by 30 cycles of denaturation at 96°C for 30 s, primer annealing at 60°C for 30 s, and elongation at 70°C for 1 min. The final extension step was performed at 70°C for 5 min.
Two to five microliters of the amplification product were loaded on a 3% standard agarose gel for tandem repeats with a unit length shorter than 10 bp and on a 2% standard agarose gel for all others, and run at 8 V/cm until the bromophenol blue dye had reached the 20 cm position. Gels were stained with ethidium bromide, visualized under UV light, and photographed (Vilber Lourmat, Marnes-la-Vallée, France). A 100-bp and a 20-bp ladder (EZ Load 100 bp or 20 bp PCR Molecular Ruler, Bio-Rad, Marnes-la-Coquette, France) were used as molecular size markers depending on the tandem repeat unit length. Gel images were managed using the Bionumerics software package (version 4.0, Applied-Maths, Belgium).

Data analysis
Band size estimates were converted into numbers of repeat units within a character dataset using Bionumerics version 4.0 (Applied-Maths, Belgium) [see Additional file 1]. Clustering analyses used the categorical coefficient and UPGMA (unweighted pair group method using arithmetic averages). Using the categorical coefficient implies that character states are treated as unordered: the same weight is given to a large or a small difference in repeat number at a locus. Maximum parsimony analysis was performed in Bionumerics with 200 bootstrap replicates, treating the data as categorical.
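The categorical coefficient and UPGMA clustering can be illustrated with a short, dependency-free sketch. It is not the Bionumerics implementation: here the categorical distance is assumed to be the fraction of loci, among those typed in both strains, at which the repeat numbers differ (so a one-repeat and a ten-repeat difference count the same), and missing data are simply skipped, which may differ from Bionumerics' handling. The strain names and profiles are made up for illustration.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

Profile = Sequence[Optional[int]]  # repeat numbers per locus; None = not typed


def categorical_distance(a: Profile, b: Profile) -> float:
    """Fraction of commonly typed loci whose repeat numbers differ (unordered states)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        raise ValueError("no locus typed in both profiles")
    return sum(x != y for x, y in shared) / len(shared)


def _key(i: int, j: int) -> Tuple[int, int]:
    return (i, j) if i < j else (j, i)


def upgma(profiles: Dict[str, Profile]):
    """Return the UPGMA tree as nested tuples of strain names, plus the merge heights."""
    names = list(profiles)
    trees = {i: name for i, name in enumerate(names)}  # cluster id -> subtree
    sizes = {i: 1 for i in trees}                       # cluster id -> number of strains
    dist = {_key(i, j): categorical_distance(profiles[names[i]], profiles[names[j]])
            for i, j in combinations(trees, 2)}
    heights: List[float] = []
    next_id = len(names)
    while len(trees) > 1:
        (i, j), d = min(dist.items(), key=lambda kv: kv[1])  # closest pair of clusters
        new, next_id = next_id, next_id + 1
        trees[new] = (trees.pop(i), trees.pop(j))
        sizes[new] = sizes[i] + sizes[j]
        heights.append(d)
        del dist[(i, j)]
        for k in list(trees):
            if k == new:
                continue
            # UPGMA: size-weighted average of the distances to the two merged clusters.
            d_ik, d_jk = dist.pop(_key(i, k)), dist.pop(_key(j, k))
            dist[_key(k, new)] = (sizes[i] * d_ik + sizes[j] * d_jk) / sizes[new]
    return next(iter(trees.values())), heights


# Hypothetical 4-locus profiles.
profiles = {
    "s1": (3, 5, 2, 7),
    "s2": (3, 5, 2, 8),
    "s3": (4, 6, 2, 7),
    "s4": (4, 6, 3, 7),
}
tree, heights = upgma(profiles)
print(tree)     # (('s1', 's2'), ('s3', 's4'))
print(heights)  # [0.25, 0.25, 0.75]
```

Because UPGMA averages over all pairs of members, every strain carries equal weight, so large groups of identical genotypes pull the averages toward themselves. Ties between equally close pairs are broken here by input order, one reason why dendrograms produced by different software can differ in detail.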

Authors' contributions
MG, IJ, SAD, KN, HN were in charge of strain selection, collection and checking of related data, preparation and provision of DNAs. PLF did the MLVA genotyping work. GV was in charge of the Bionumerics database, error checking, clustering analyses. FD and PB did the genome sequence analyses for polymorphic tandem repeat searches and the genotyping page. GV wrote the report. IJ and MG helped to draft the manuscript. All authors read, commented and approved the final manuscript.