Genotyping of Bacillus anthracis strains based on automated capillary 25-loci Multiple Locus Variable-Number Tandem Repeats Analysis

Background The genome of Bacillus anthracis, the etiological agent of anthrax, is highly monomorphic which makes differentiation between strains difficult. A Multiple Locus Variable-number tandem repeats (VNTR) Analysis (MLVA) assay based on 20 markers was previously described. It has considerable discrimination power, reproducibility, and low cost, especially since the markers proposed can be typed by agarose-gel electrophoresis. However in an emergency situation, faster genotyping and access to representative databases is necessary. Results Genotyping of B. anthracis reference strains and isolates from France and Italy was done using a 25 loci MLVA assay combining 21 previously described loci and 4 new ones. DNA was amplified in 4 multiplex PCR reactions and the length of the resulting 25 amplicons was estimated by automated capillary electrophoresis. The results were reproducible and the data were consistent with other gel based methods once differences in mobility patterns were taken into account. Some alleles previously unresolved by agarose gel electrophoresis could be resolved by capillary electrophoresis, thus further increasing the assay resolution. One particular locus, Bams30, is the result of a recombination between a 27 bp tandem repeat and a 9 bp tandem repeat. The analysis of the array illustrates the evolution process of tandem repeats. Conclusion In a crisis situation of suspected bioterrorism, standardization, speed and accuracy, together with the availability of reference typing data are important issues, as illustrated by the 2001 anthrax letters event. In this report we describe an upgrade of the previously published MLVA method for genotyping of B. anthracis and apply the method to the typing of French and Italian B. anthracis strain collections. The increased number of markers studied compared to reports using only 8 loci greatly improves the discrimination power of the technique. An Italian strain belonging to the B branch was described, and two new branches, D and E, are proposed. Owing to the upgrading achieved here, precise genotyping can now be produced either by automated capillary electrophoresis, or by the more accessible but slower and for some markers slightly less accurate agarose gel methodology.

Bams30, is the result of a recombination between a 27 bp tandem repeat and a 9 bp tandem repeat. The analysis of the array illustrates the evolution process of tandem repeats.

Conclusion:
In a crisis situation of suspected bioterrorism, standardization, speed and accuracy, together with the availability of reference typing data are important issues, as illustrated by the 2001 anthrax letters event. In this report we describe an upgrade of the previously published MLVA method for genotyping of B. anthracis and apply the method to the typing of French and Italian B. anthracis strain collections. The increased number of markers studied compared to reports using only 8 loci greatly improves the discrimination power of the technique. An Italian strain belonging to the B branch was described, and two new branches, D and E, are proposed. Owing to the upgrading achieved here, precise genotyping can now be produced either by automated capillary electrophoresis, or by the more accessible but slower and for some markers slightly less accurate agarose gel methodology.

Background
Bacillus anthracis, a spore forming Gram positive bacteria, is the etiological agent of anthrax, a zoonosis with a worldwide distribution. The disease can be transmitted to humans by contact with infected animals or contaminated animal products. In addition to being an occupational disease, limited to farmers and veterinarians, anthrax has received considerable attention and is one of the most feared microorganism with respect to bioterrorism. In 2001, letters containing spores were mailed causing the death of five persons and several cases of cutaneous anthrax [1]. B. anthracis is a member of the Bacillus cereus group, containing Bacillus cereus, Bacillus thuringiensis, Bacillus mycoides and B. anthracis [2]. B. anthracis is characterized by an extremely low genetic variability, making strains differentiation very challenging [3,4]. The importance of strain differentiation of biothreat agents including B. anthracis is increasingly recognized as a way for identifying the source of the attack, as illustrated not only by the 2001 events but also, earlier on, by the Sverdlovsk [5] and Tokyo [6] events. Such events together suggested how crucial is the development of microbial forensics for biosecurity. Measures aimed at limiting the risk of deliberate release of dangerous pathogens, would require that isolates kept in different institutions around the world are precisely genotyped, and that the genotype profiles are shared by all countries having accepted to follow these rules, for strain accountability purposes.
Because the molecular methods classically used to differentiate between strains in other species failed to discriminate B. anthracis isolates, new approaches were developed. Using Amplified Fragment Length Polymorphism in 79 B. anthracis isolates, 31 polymorphic chromosomal regions were observed out of 1,000 fragments, and eventually, the most interesting polymorphisms turned out to be the result of tandem repeat variations [7]. Tandem repeats variability was studied most extensively in eukaryotes, and was shown to be associated with different aspects of DNA replication and recombination, including replica-tion slippage and double-strand breaks repair [8]. They do not vary by the same mechanisms which are inducing point mutations, and the mutation rate at some Variable Number of Tandem Repeats (VNTR) loci may be considerably higher than nucleotide substitutions [9][10][11] in an as yet not predictable way [12,13]. The level of intraspecific polymorphism in tandemly repeated sequences varies from one locus to another and needs to be experimentally measured by typing representative strain collections. Approaches based on VNTR analysis (called MLVA for Multiple Locus VNTR Analysis) are now increasingly used to characterize in particular recently emerged, highly monomorphic pathogens including B. anthracis [12,14], Mycobacterium tuberculosis [15], Yersinia pestis [16]Brucella [17] (see [18,19] for a review). Multiple alleles can exist in the population for each tandem repeat locus. One advantage of MLVA typing as compared for instance to SNP (Single Nucleotide Polymorphism) typing is that a new isolate can be compared to previously characterised strains in an unbiased way. One weakness is that the phylogenetic value of MLVA clustering must be carefully checked and validated by using independent approaches with less homoplastic markers [20]. In previous reports, a multilocus VNTR analysis (MLVA8) method was proposed for genotyping of B. anthracis strains, using six chromosomal and two plasmid marker loci (vrrA, vrrB1, vrrB2, vrrC1, vrrC2, CG3, pXO1, pXO2 [14]). In the initial work, an automated fluorescent DNA sequencer was used to determine the size of the PCR fragments and this set of markers is now largely employed for B. anthracis genotyping [21][22][23][24][25][26]. However, this MLVA8 assay is not sufficient for molecular forensics approaches, the resolution achieved is too limited. For instance the vast majority of French and Italian strains have been assigned to MLVA8 genotypes (GT) 1 and 3 which differ only at one marker, located on one of the two virulence plasmids [14,22,26]. For this reason, additional markers were subsequently developed to produce an MLVA15 panel [11], and the initial clustering nomenclature [14] was revised [20]. Unfortunately none of the corresponding data has been published so far. Le Flèche et al. [12] extended the MLVA8 assay by proposing 14 additional markers (Bams1, 3,5,7,13,15,21,22,23,24,25,28,30,31), which greatly improved the resolution power of the assay. DNA fragment sizing was achieved on standard agarose gels stained by ethidium bromide, a technique utilising widely accessible, and very basic, equipment.
In order to have a more rapid and accurate genotyping system for B. anthracis we propose an automated capillary based method using essentially the panel of loci described by Le Flèche et al [12], with the addition of 4 new markers, and apply the method to collections of strains from Italy and France. We compare the clustering achieved with available epidemiological data for B. anthracis, and introduce two new B. anthracis groups, D and E. The present report deals with the standardization and further improvement of currently available genotyping methods for B. anthracis, the typing of a larger collection of strains, the comparison with available published data, and the enlargement of the publicly accessible databases.

MLVA25 genotyping by capillary electrophoresis
A collection of 160 B. anthracis isolates, comprising strains from Italy and France and a few reference strains were analyzed to generate MLVA typing data. DNA was amplified in 4 multiplex PCR reactions, comprising 25 loci. The loci are 21 markers described previously [12,14] (Bams7 is not currently amenable to capillary electrophoresis typing because of its allele size range and was not included), and the four new markers Bams34, 44, 51, 53 (Table 1). All twenty-five markers can be amplified in only four PCR reactions by taking into account the allele size range for each locus ( Table 2). The labelled amplicons were separated on a CEQ 8000 Beckmann DNA analysis system and the electropherograms were analysed by CEQ Fragment Analysis System software to determine the length of the fragments by reference to a 75-1000 bps size ladder (in red on Figure 1). After testing several primer concentrations and ratios, a balanced level of fluorescence peak intensity was obtained by using between 5 to 10 ng of template DNA per multiplex PCR reaction and the concentration of primers indicated ( Table 2).
The assay was highly reproducible as tested by performing duplicate typings. The intra-experimental (within the same run) as well as the inter-experimental (between runs) variation of fragment sizing showed variations lower than one base over the whole range of PCR fragments investigated here. The accuracy of the data was determined by comparison of observed PCR fragment size to the exact values calculated by direct sequencing and available genome sequence data [27]. In some instances the amplicon size determined after electrophoresis separation did not match the expected values (Table 3). This is well illustrated for instance with Bams30 (Table 3) for which the offset increases with allele size, i.e. repeat copy number. In any case, the offset is conserved, reproducible, and the differences in calculated allele size are consistent with a 9 bp repeat unit variation.
Because Bams30 was initially identified as a 27 bp repeat unit marker [12], we investigated representative Bams30 alleles by sequencing (EMBL accession numbers AM182499 to AM182509) in order to clarify this discrepancy. Sequence data indicates that Bams30 alleles are composed of 27 base-pair units at one end of the array, and 9 bp units at the other end. Table 4A provides the allele codes of Bams30 alleles. Repeat units are not identical, different letters are used to code the different variant units which are observed [28]. Uppercase letters code for 27-base-pair repeat units whereas lowercase letters code for 9 base-pair repeat unit. The correspondence between letters and repeat units is indicated in Table 4B. The two regions are variable, and as a result the locus behaves like a 9 bp repeat unit tandem repeat.
To better understand the origin of this behaviour, we investigated the structure of this locus in the closely related B. cereus-B. thuringiensis group by taking advantage of available genome sequences. Figure 2 shows the organization of the locus in the genomes of B. anthracis (accession number AE016879), B. cereus ATCC14579 (AE016877), B. cereus G9241 [29], B. cereus ATCC10987 (AE017194), B. cereus E33L (NC_006274), and B. thuringiensis CEB97/027 (AE017355) also called Bt9727. The gene bearing Bams30 in B. anthracis (Ba2450) possess a collagen-like helix repeat and is flanked by two genes Ba2449 and Ba2451. Their homologues in the other strains were depicted using the same code. The organization in B. cereus G9241 is similar to B. anthracis, but in all four other strains, the situation is less simple. In B. cereus ATCC 14579, two genes possessing a collagen-like helix repeat protein are observed, instead of one in B. anthracis: Both genes possess a tandem repeat with similarity to Bams30, encoding the collagen-like repeat. Whereas in ATCC 10987, the two tandem repeats have nine base-pair repeat units, Bt9727-2241 contains a 27 bp repeat unit presumably derived from an ancestral 9 bp unit. Bt9727 is by far the closest B. anthracis neighbor among the 5 The number of diverse allele and new allele are described for the 25 markers. Primer concentration for each set of marker in the multiplex is also reported. * HGDI (Hunter-Gaston Diversity Index) calculated as in [36] The alleles are identified with the number of repeats versus the former nomenclature [12]. Locus a , locus b , locus c are loci amplified by a forward primer respectively labeled with cy5.5, cy5 and WellRed dyeD2.
sequenced strains [30]. The composite organization of the Bams30 tandem repeat in B. anthracis can be explained by a rearrangement occurring in a Bt9727-type Bams30 locus. Strain "A1055", which is closest to the last common B. anthracis ancestor [20] shows the same rearrangement, which strengthens the validity of Bams30 as an interesting complementary tool for the identification of B. anthracis (the presence of the virulence plasmids, pXO1 and pXO2 is not sufficient since for instance "A1055" is lacking pXO1 [20]) even if the rearrangement is not strictly limited to B. anthracis as shown by the G9241 strain situation. For markers included in the MLVA8 assay [14] which could be typed on agarose (i.e. with the exception of pXO1 and pXO2) the data obtained here were in agreement with results generated by agarose gel electrophoresis [12]. Published MLVA8 [14,[21][22][23][24][25][26] data contains some allele size assignments which do not correspond to the actual size. This is due to clerical errors in the initial report [14], as noticed previously [24] and data for some loci must be converted as described in the material and methods section in order to be fully compatible.

3: Comparison between product sizes inferred by CEQ Fragment Analysis System software (observed) and the actual sizes obtained by sequencing data coming from direct sequencing of the PCR product (expected) or available in Genbank. (Continued)
Four new markers have been included. For Bams34, a 39mer repeat marker, five alleles were described ranging from 238 to 581 bps (ranging from 4 to 13 repeat units). Bams44 (39-mer) and Bams53 (12-mer) show two alleles (respectively 339/417 bps or 8 and 12 repeat units, and 212/236 bps (6 and 8 repeat units)). A third allele is predicted from genome sequence data for the "Western" strain, but was not experimentally observed among the collection investigated here. For Bams51 (45-mer) four alleles were observed ranging from 338 to 538 bps (6 to 10 units).

Data analysis
When the eight markers comprising the MLVA8 assay are used, 30 different genotypes are resolved among the 160 isolates. With only few exceptions, these genotypes correspond to previously published profiles. Most exceptions are minor variations of previous types. The vast majority of French and Italian isolates belong to GT1 and GT3 within cluster A1a [14] in agreement with previous reports [22,26]. When the whole MLVA25 dataset is used, 67 genotypes are resolved. Isolates which are not resolved in this analysis have the same epidemiological history (when known). The clustering analysis is shown in Figure 3. Although MLVA is not by its very nature a tool with a high phylogenetic value per se, especially when the available strains do not fully cover the species diversity, the clustering achieved is in excellent agreement with the known relationship between the ancestral C type (genotype 67), the B cluster (genotypes 59 to 66, with the two well-separated subgroups, B1 and B2), and the A cluster (represented on one end by the "Ames" strain, initially assigned to A3b and more recently to A2 [20], and on the other end by the "Western" strain, initially assigned to A1a and more recently to A1 [20]). More precisely, the "Vollum" strain (genotype 54; strain "9" in [20]) and the "Australia94" strain (genotype 45; presumably strain "20" in [20]) are in the expected position with respect to the A group.
Genotypes 55-56, and 57-58 are loosely connected to the others, and are suggested by clustering analysis as being intermediates between the B cluster and the A cluster. This position is reminiscent of the position occupied by strains "6", "7" and "8" investigated by Pearson et al. [20]. It may be of interest to note that "6" which branched out in an ancestral position compared to the whole "A" cluster in the SNP analysis [20], originates from Zambia, whereas genotypes 57 and 58 in Figure 3 are from Cameroon. We propose to call these very distinct MLVA groups types D and E (Figure 3).

Discussion
Bacillus anthracis genotyping is essential for epidemiological studies and for biosecurity related (including microbial forensics) issues [21]. Current methodologies such as MLVA8 are not sufficient to discriminate two geographically close isolates in natural outbreaks, since for instance most French and Italian isolates will be classified as GT1 or 3 [22,26]. Molecular characterization needs to be as precise as possible in order to discriminate isolates. The rational of this work was to update the MLVA method for quick (single-day analysis) and accurate discrimination of Bacillus anthracis isolates. The markers are stable, this is indirectly illustrated by the typing of "Sterne" strains from different origins. Genotypes 46 and 48 correspond to respectively two and five "Sterne" strains. Genotype 46 are the CIP (Collection Institut Pasteur) reference strains 7700 and 7702, genotype 48 covers "Sterne" vaccine strains from different countries, and the sequenced genome. The only difference in the two genotypes is for locus Bams28 [see additional file 1].
The number of VNTR loci was increased from 22 to 25 by addition of four new markers (and removal of Bams7, because allele sizes up to 2 kilobases are not adapted to typing on current capillary machines). The overall assay was converted from 25 monoplex PCR reactions, run on agarose gels and checked for size by eye or by using gel image analysis software, to 4 multiplex PCR analysed by automated capillary based fluorescent electrophoresis and internal size standard measurement for allele calculation. For most loci, accurate size estimate is achieved directly by the capillary electrophoresis machine. In other instances however, the sizing proposed by the machine, although quite reproducible, does not correspond to the real size. This could be due to the nature of the gel matrix, or to differences during migration between labelled PCR fragments and the ladder used, but most probably by a slightly biased flanking sequence or repeat unit-specific mobility pattern. Usually, the size estimate is shifted by a constant value (offset) but in some instances, the offset increases with increasing allele size (increasing repeat copy number). For this reason, allele numbering must be done carefully, using a correspondence table as presented here in Table 3. If different equipments, or reagents, are used, the correspondence indicated here must be checked by typing a number of reference strains (such as the widely available "Sterne" vaccine strain), before alleles can be confidently called and data can be merged.
Size measurement from multiplexed, automated PCR products allows for single-day isolate genotyping which is of importance in a crisis situation. It also provides a slightly increased discrimination power as compared to the MLVA20 assay run on agarose gels for the same analysed markers. This difference is essentially due to the lack of resolution of agarose for the larger, 9 bp repeat unit loci, such as Bams30 and for Bams3, Bams13, Bams15 and Bams31 typing, where new alleles were identified. For Bams15 and Bams30, capillary resolution was able to show a 9 bps repeat unit length variation instead of the respectively 18 bps and 27 unit length proposed initially.
The present investigation suggests that a recombination event at the Bams30 locus may have occurred prior to the emergence of B. anthracis. The event directly affects genes coding for constituents of the exosporium. Bc2381 is related to BclA, and Bc2382 corresponds to the exosporium H gene exsH [31]. In B. cereus ATCC 10987, B. thuringiensis 97/027 (Bt9727) and B. cereus genome E33L, an additional ORF is located within the region. The Bams30containing gene in B. anthracis is the result of a recombination between Bt9727-2241 (coding for the BclA-like protein) and Bt9727-2244 (exsH). The beginning and the end of the protein sequence coded by Ba2450 are respectively identical to the beginning of Bt9727-2241 and to the end of Bt9727-2244. Interestingly, a similar rearrangement is observed independently in the B. cereus strain G9241 which caused an anthrax like disease in a human patient [29]. As a result of this fusion, the first part of the tandem repeat itself contains a 27 bp repeat unit whereas the second part is a 9 bp repeat tandem repeat. It may be of interest to observe that two of the most polymorphic tandem repeats in B. anthracis, Bams13 and Bams30, are associated with genes coding for components of the exosporium (Bams13 is part of the BclA gene [32]).    thuring. ABCDEBGIIIGGGIBJ and ff**ff***fgdgfcafgh**abcdfabciefcfgfjka*

Conclusion
In this report we propose an improved automated genotyping system of B. anthracis by MLVA, extending the range of VNTR containing markers to 25, which increases the resolution power as compared to previous methods. The MLVA25 assay was used to genotype Italian and French isolates in a very fast and highly reproducible manner. We also better defined previously characterized alleles and describe new allelic variants. MLVA25 is a very accurate and discriminatory genotyping method and could be a promising technique for future epidemiological studies as well as differentiation of deliberated vs. naturally occurring outbreaks. The improved assay and resulting data remains compatible with the initial agarose gel based assay. The genotyping data can be queried on the web page MLVA web-service [33]. This site allows the comparison of local typing data to the data generated in this report.

Bacterial strains and isolates
The strains and DNA samples are from the collection maintained by the French Ministry of Defence at Centre d'Etude du Bouchet (CEB), from Istituto Superiore di Sanità (ISS) Italian collections and from the Italian Reference Center for Anthrax (ISZ, Foggia). Total genomic DNA was extracted as previously reported [30].

VNTR amplification and analysis
Twenty-five loci were used, 21 from previous studies [12,14]  A Clustering analysis of genotyping data Figure 3 A Clustering analysis of genotyping data. The 67 genotypes are numbered from 1 to 67 (left-most column). The origin of the strains (when known) is indicated; ND, unknown. "Cluster" refers to the clustering and "group" refers to the genotype (GT) number based upon MLVA8 as proposed in [14]. Genotypes with no group assigned are usually minor variations of known, neighbouring groups. A number of reference strains are indicated in ().