Multi-locus variable number tandem repeat analysis of 7th pandemic Vibrio cholerae

Background Seven pandemics of cholera have been recorded since 1817, with the current and ongoing pandemic affecting almost every continent. Cholera remains endemic in developing countries and is still a significant public health issue. In this study we use multilocus variable number of tandem repeats (VNTRs) analysis (MLVA) to discriminate between isolates of the 7th pandemic clone of Vibrio cholerae. Results MLVA of six VNTRs selected from previously published data distinguished 66 V. cholerae isolates collected between 1961–1999 into 60 unique MLVA profiles. Only 4 MLVA profiles consisted of more than 2 isolates. The discriminatory power was 0.995. Phylogenetic analysis showed that, except for the closely related profiles, the relationships derived from MLVA profiles were in conflict with that inferred from Single Nucleotide Polymorphism (SNP) typing. The six SNP groups share consensus VNTR patterns and two SNP groups contained isolates which differed by only one VNTR locus. Conclusions MLVA is highly discriminatory in differentiating 7th pandemic V. cholerae isolates and MLVA data was most useful in resolving the genetic relationships among isolates within groups previously defined by SNPs. Thus MLVA is best used in conjunction with SNP typing in order to best determine the evolutionary relationships among the 7th pandemic V. cholerae isolates and for longer term epidemiological typing.


Background
Diarrhoeal diseases have been and continue to be a cause of mortality and morbidity, especially in developing countries. Of particular note is the disease cholera, a severe watery diarrhoeal disease caused by Vibrio cholerae. V. cholerae is a diverse species of Gram negative bacilli. Serological testing has enabled strains of V. cholerae to be divided into over 200 serogroups based on the O-antigen present [1]. However, only the O1 and O139 serogroups have been known to cause pandemic and epidemic level disease [2].
Since 1817, seven pandemics of cholera have been recorded [3]. The ongoing epidemic started in 1961 and has affected almost every continent, particularly countries of Southeast Asia, Africa, and South America. Cholera remains endemic in developing countries and outbreaks still pose a significant public health issue [4].
The developments of DNA based typing methods have allowed epidemiological studies of cholera. Methods such as Pulse Field Gel Electrophoresis [5,6], Amplified Fragment Length Polymorphism [7] as well as population structure studies including Multi-Locus Sequence Typing [8][9][10] have all been applied to V. cholerae isolates. Such methods have all been able to distinguish between environmental and clinical strains of V. cholerae [6,8,11], but they have had limited success in drawing evolutionary relationships between 7th pandemic strains. Previously, we investigated the evolution of V. cholerae using Single Nucleotide Polymorphism (SNP) analysis and found that 7th pandemic V. cholerae isolates could be distinguished into groups with a stepwise accumulation of SNPs. The 7th pandemic SNP relationships were confirmed by a large genome sequencing based study by Mutreja et al. [12]. SNP Groups were correlated with the spread of pandemic cholera into Africa and were also able to separate the O139 isolates into a distinct SNP profile [13]. However, further resolution of isolates within each group is required.
Multilocus variable number tandem repeat analysis (MLVA) is a PCR based typing method based on regions of tandemly repeated short DNA sequence elements.
Variations in the number of copies of repeat DNA sequences form the basis of differentiation [14]. Recent studies have shown that MLVA is a highly discriminating method for the typing of environmental and clinical isolates of V. cholerae and is able to differentiate closely related isolates from outbreak situations [15,16]. In this report, we applied MLVA to isolates spanning the 7th pandemic to further determine the genetic and evolutionary relationships within the 7th pandemic clone and to evaluate the potential of MLVA as a long term epidemiological typing tool.

VNTR variation and discriminative power
The MLVA data of 61 7th pandemic isolates including its O139 derivative and 5 genome sequenced strains from Grim et al. [17] are presented as repeat numbers for each locus (Table 1). Additionally, 3 pre-7th pandemic isolates were included for comparison but were excluded from the calculation of diversity statistics below. The 66 7th pandemic isolates were distinguished into 60 MLVA profiles. All MLVA profiles were represented by a single isolate except for 4 MLVA profiles that were represented by 4, 2, 2 and 2 isolates respectively. Two of these profiles belonged to SNP group II and had allelic profile of 9-6-4-7-26-14 and 9-6-4-7-25-13. Note that an MLVA profile is made up of the repeat numbers for the following loci (in order): vc0147, vc0437, vc1457, vc1650, vca0171 and vca0283. The remaining profiles were within SNP group VI and differed at vca0171 by only one repeat, with the profiles 10-7-3-9-(22/23)-11.
The level of variation differed across the six VNTRs analysed. In total, 7, 6, 3, 5, 19 and 24 alleles were observed for vc0147, vc0437, vc1457, vc1650, vca0171 and vca0283 respectively. It is also interesting to note that the 2 most variable VNTRs are located in the small chromosome while the other 4 less variable VNTRs are on the large chromosome. Additionally, one isolate (M542) amplified two products that differed by one repeat for vc1457 which has been observed previously [16]. However, for phylogenetic analysis and scoring of alleles, only the fragment with the strongest signal was recorded. This VNTR is located within the cholera toxin subunit A promoter region which may have contributed to the decreased variation [18].
The discriminatory power of each VNTR and all 6 VNTRs combined was measured by Simpson's Index of Diversity (D). The highest D value was 0.957 and was recorded for vca0283. Except for vca0283 and vca0171, all D values were lower than previously reported. Our focus on 7th pandemic isolates which have been shown to be highly homogeneous may have contributed to these lower D values. VNTR vc1457 had the lowest D value of 0.437, which was lower than previously reported (D value = 0.58) [16]. The combined D value of 7th pandemic isolates for all 6 VNTRs in this study was 0.995. We also calculated D values from previous studies by excluding MLVA data of environmental and non-7th pandemic isolates [19][20][21][22] and found that the D values were similar and ranged from 0.962 to 0.990 [19][20][21][22], when only 7th pandemic isolates were analysed. Analysis using the two most variable VNTRs, vca0171 and vca0283, produced comparable D values, which could potentially reduce the need to use the other markers. This would be particularly useful in outbreak situations where there is limited time and resources available to type isolates. However, typing the isolates in this study using only two loci would not reveal any useful relationships.

Phylogenetic analysis using MLVA
We analysed the MLVA using eBURST [23]. Using the criteria of 5 out of 6 loci identical as definition of a clonal complex, 26 MLVA profiles were grouped into 7 clonal complexes with 37 singletons. For the 7 clonal complexes, a minimal spanning network (MSN) was constructed to show the relationships of the MLVA profiles (Figure 1 A). Many nodes in the 2 largest clonal complexes showed multiple alternative connections. There were 27 possible nodes differing by 1 locus, 4 nodes were due to the difference in vc0147 and 23 others were due to VNTR loci in chromosome II. Out of the 23 single locus difference in the 2 chromosome II VNTRs, the majority (57%) also differed by gain or loss of a single repeat unit. Thus 1 repeat change was the most frequent for the VNTRs on both chromosomes. It has been shown previously that it is more likely for a VNTR locus to differ by the gain or loss of a single repeat unit as seen in E. coli [24] and we have also found this was the case in V. cholerae. We then used the MLVA data for all 7th pandemic isolates to construct a minimal spanning tree (Additional file 1 Figure S1A). For nodes where alternative connections of equal minimal distance were present we selected the connection with priority rules in the order of: between nodes within the same SNP group, between nodes differing by 1 repeat difference and between nodes by closest geographical or temporal proximity. The majority of isolates differed by either 1 or 2 loci, which is attributable to vca0171 and vca0283 being the 2 most variable loci. It should be noted that node connections differing by more than one VNTR locus are less reliable as there were more alternatives.
Since the 2 VNTRs on chromosome II were highly variable, exclusion of these 2 VNTRs may increase the reliability of the minimum spanning tree MST (Kendall et al [21]). The number of unique MLVA profiles was reduced from 60 to 32. Nine profiles had multiple   Figure 1 eBURST analysis and minimum Spanning Networks of 7th pandemic V. cholerae isolates based on MLVA. A) MLVA using 6 VNTR loci and B) MLVA using 4 VNTR loci from chromosome I. Each circle represents a unique MLVA profile, with the isolate number/s belonging to the MLVA type within the circles. The colour of each circle denotes the group to which each isolate belongs according to Single Nucleotide Polymorphism (SNP) typing [13] (see Figure 2). Singletons are arranged by SNP groups while members of clonal complexes are connected using minimum spanning network. Thick connecting lines represent differences of one repeat unit with red lines indicating connections chosen in the minimum spanning tree shown in Additional file 1 Figure S1 based on priority rules described in the text and thin solid lines represent one locus difference with more than one repeat difference. The size of each circle reflects the number of isolates within the circle.   connections of the nodes ( Figure 1B). Using the same principle as above to resolve alternative nodes with equal minimum distance, an MST was constructed to display the relationships of these MLVA profiles and the 4 more distantly related MLVA profiles as shown in Additional file 1 Figure S1B.
A previous SNP analysis with the same isolates had shown that 7th pandemic cholera had undergone stepwise evolution [13]. None of these groups were clearly distinct from the either the 4 loci or 6 loci MLVA MST aside from SNP group VI which consists of O139 isolates ( Figure 1). However, a distinctive pattern can be seen when the consensus alleles within a SNP group are compared as shown in Table 1. We allocated a consensus allele if more than half of the MLVA profiles carried a given allele in the SNP group and if there was no consensus, the consensus allele was represented by an x for discussion below. The 2 most variable VNTRs (vca0171 and vca0283) had no consensus alleles within any of the SNP groups except vca0171 in group VI. The allelic profile that initiated the 7th pandemic was likely to be 8-6-4-7-x-x based on the allelic profiles of the prepandemic stains which is also consistent with the profile of the earliest 7th pandemic isolate M793 from Indonesia. Group I had an 8-6-4-7-x-x allelic profile which evolved into 9-6-4-7-x-x in group II. By changing the 2 nd VNTR allele from 6 to 7, groups III and IV had consensus profiles of 9-7-4-7-x-x and 9-7-4-x-20-x respectively, with the latter being most likely a 9-7-4-8-20-x profile (see Table 1). Group V had the first VNTR allele reverted back to 8 and had an 8-7-4-8-x-x profile. SNP group VI showed the most allele changes with a 10-7-3-9-23-x profile compared with 8, 7,-, 8, 21/22, 23/16 from Stine et al. [15]. Although vca0171 and vca0283 offered no group consensus alleles, it is interesting to note that the trend for vca0171 increased in the number of repeats while vca0283 decreased in the number of repeats over time (Table 1). Each SNP group was most likely to have arisen once with a single MLVA type as the founder, identical VNTR alleles between SNP groups are most likely due to reverse/parallel changes. This has also contributed to the inability of MLVA to resolve relationships. The comparison of the SNP and MLVA data allowed us to see the reverse/parallel changes of VNTR alleles within known genetically related groups. However, the rate of such changes is difficult to quantitate with the current data set.
In order to resolve isolates within the established SNP groups of the 7th pandemic, all 6 VNTR loci were used to construct a MST for each SNP profile containing more than 2 isolates. Six separate MSTs were constructed and assigned to their respective SNP profiles as shown in Figure 2. The largest VNTR difference within a SNP group was 5 loci which was seen between two sequenced strains, CIRS101 and B33. In contrast, there were several sets of MLVA profiles which differed by only one VNTR locus within the MSTs which showed that they were most closely related. The first set consisted of 5 MLVA profiles of six isolates within SNP group II, all of which were the earlier African isolates. The root of group II was M810, an Ethiopian isolate from 1970 which was consistent with previous results using AFLP [7] and SNPs [13]. However, the later African and Latin American isolates were not clearly resolved. We previously proposed that Latin American cholera originated from Africa based on SNP analysis, which was further supported by the clustering of recently sequenced strain C6706 from Peru [25]. Note that C6706 is not on Figure 2 as we cannot extract VNTR data from the incomplete genome sequence. M2314 and M830 from Peru and French Guiana were the most closely related, with 2 VNTR differences, however the remainder of isolates in this subgroup were more diverse than earlier isolates. The second set of MLVA profiles differing by one locus consisted of all O139 isolates in SNP group VI except M834, which was separated by two VNTR loci. This finding is similar to a study by Ghosh et al. [26], who found that isolates collected within a year differed at only one locus, while isolates from later years differed at more than one locus. A similar trend was also seen between closely related samples taken from the same household or same individual [21].
Isolates from SNP group V were collected from Thailand and 3 regions of Africa and contained 3 genome sequences, MJ-1236, B33 and CIRS101, from Mozambique and Bangladesh [17]. These isolates were shown to be identical based on 30 SNPs [13]. The genetic relatedness of these isolates was also reflected by their MLVA profiles, which differ by only 2 loci. The consensus alleles for SNP group V was 8, 7, 4, 8, x, x, which was identical to the consensus alleles of MLVA group I (8, 7,-, 8, x, x) according to a 5-loci study by Choi et al. [19].
No other consensus alleles of MLVA groups matched the current SNP group consensus alleles. However, there (See figure on previous page.) Figure 2 Composite tree of 7th pandemic V. cholerae isolates. Isolates were separated into six groups according to Single Nucleotide Polymorphism (SNP) typing. Isolates with identical SNP profiles were further separated using Multilocus Variable number tandem repeat Analysis (MLVA). A minimum spanning tree (MST) was constructed for each group and combined with the original parsimony tree. Numbers at the node of each between groups indicate the number of SNP differences, whereas numbers at the node of each branch within a group indicate the number of VNTR differences between isolates. were 2 isolates from Africa (M823 and M826) with the profiles 10, 6, -, 7/8, x, x from this study, which matched 2 MLVA profiles of isolates from MLVA group III Vietnam from Choi et al. [19]. These African isolates were collected in 1984 and 1990 while isolates from Choi et al. [19] were collected between 2002-2008. It is unlikely that the isolates from these two studies are epidemiologically linked. This further highlights the need for SNP analysis to resolve evolutionary relationships before MLVA can be applied for further differentiation.
Based on a 5-loci MLVA study performed by Ali et al. [27] the ancestral profile of the 2010 Haitian outbreak isolates was determined to be 8, 4, -, 6, 13, 36. Nine MLVA profiles differing by 1 locus were found in total and were mapped against our SNP study. A previous study showed that 2010 Haitian cholera outbreak strain belong to SNP group V [25]. However, based on the ancestral profile of the Haitian isolates, only the first locus was shared with our group V consensus allele and no other Haitian alleles were found in any of the group V isolates. Thus, no relationships could be made between group V isolates and the Haitian outbreak strains. Similarly, in another 5-loci MLVA study of 7th pandemic isolates sampled from 2002 to 2005 in Bangladesh [21], no MLVA profiles were found to be identical at more than 2 loci to our MLVA profiles. Therefore, while MLVA may be highly discriminatory, it may not be reliable for longer term epidemiology and evolutionary relationships. Our studies of Salmonella enterica serovar Typhi also reached a similar conclusion [28]. However, it should be noted that although our isolates are representative of the spread of the 7th cholera pandemic, our sample size is relatively small. A study with a much larger sample may be useful to affirm this conclusion.

Conclusions
We have shown that MLVA of 6 VNTR loci is highly discriminatory in differentiating closely related 7th pandemic isolates and shown that SNP groups share consensus VNTR patterns. We have also shown that relationships among isolates can only be inferred if they differ by 1 to 2 VNTRs. MLVA is best used for outbreak investigations or tracing the source of outbreaks, such as the recent outbreak in Haiti [27]. The advantage of MLVA is that there is no phylogenetic discovery bias as is the case with SNPs [13]. However, VNTRs alone are too variable to be used for longer term epidemiological studies as they were unable to resolve relationships of the isolates over a 40 year span. MLVA needs to be used in combination with SNPs for evolutionary or longer term epidemiological studies. The SNP and MLVA analyses of the Haitian outbreak and its possible Nepal origin illustrate well the usefulness of this approach [27,29].

Strain selection and DNA extraction
In total, 66 isolates of 7th pandemic V. cholerae collected between 1961 and 1999 were used in this study, including 14 isolates of the O139 Bengal biotype (Table 1). Three pre-7th pandemic isolates were also included for comparative purposes. Isolates were grown on TCBS (Oxoid) for 24 hr at 37°C and subcultured for single colonies. Genomic DNA was extracted using the phenol-chloroform method. Where available, VNTR data from sequenced V. cholerae genomes was also included in the analysis.

VNTR selection and MLVA typing
The details of 17 VNTR loci was previously identified and studied by Danin-Poleg et al. [16]. Six VNTR loci with D values >0.5 (vc0147, vc0437, vc1457, vc1650, vca0171 and vca0283) were selected and amplified by PCR using published primer sequences which were modified to include a 5' universal M13 tail as done previously [28]. An additional M13 primer with a fluorescent dye attached was added to the PCR mix to bind to the modified tail. Fluorescent dyes were FAM, VIC, NED and PET for blue, green, black and red fluorescence, respectively.
PCR conditions included a touchdown cycling profile as follows: 95°C for 5 min; 96°C for 1 min, 68°C for 5 min (−2°C/cycle, a decrease of 2°C after each cycle) and 72°C for 1 min for 5 cycles; 96°C for 1 min, 58°C for 2 min (−2°C/cycle) and 72°C for 1 min for 5 cycles; 96°C for 1 min, 50°C for 1 min and 72°C for 1 min for 25 cycles; and final extension at 72°C for 5 min.
The fluorescence labelled PCR products of vc0147 (FAM), vc0437 (VIC), vc1457 (PET), vc1650 (NED) in one sample and vca0171 (PET) and vca0283 (NED) in a second sample were pooled for capillary electrophoresis on an Automated GeneScan Analyser ABI3730 (Applied Biosystems) at the sequencing facility of the School of Biotechnology and Biomolecular Sciences, the University of New South Wales. The fragment size was determined using the LIZ600 size standard (Applied Biosystems) and analysed using GeneMapper v 3.7 software (Applied Biosystems). Sequencing was performed to confirm the number of repeats for representative alleles.

Phylogenetic analysis
A Minimum spanning tree (MST) using pairwise difference was generated using Arlequin v. 3.1, available from http://cmpg.unibe.ch/software/arlequin3, in which if alternative connections of equal distance were present, the connection between isolates with closest geographical or temporal proximity was selected. The Simpson's Index of Diversity (D value) [30] was calculated using an inhouse program, MLEECOMP package [31].

Additional file
Additional file 1: Figure S1. Minimum Spanning trees of 66 V. cholerae isolates using MLVA of A) 6 VNTR loci and B) 4 VNTR loci from chromosome I. Each circle represents a MLVA profile, with the isolate number/s belonging to the MLVA type within the circles. The colour of each circle denotes the group to which each isolate belongs according to SNP typing [12] (see Figure 2). If isolates from different SNP groups shared a MLVA profile, the circle was divided to reflect the proportion of isolates in each SNP group. Thick solid connecting lines represent differences of one repeat unit, thin solid lines and dashed lines represent 1 and 2 loci differences respectively, and longer dashed lines represent more than 2 loci differences. The size of each circle reflects the number of isolates within the circle.

Authors' contributions
Experimental work and data collection were carried out by CL. CL, SO and RL contributed to data analysis and interpretation. The study was conceived and designed by RL. The manuscript was drafted by CL and SO, and revised by PR and RL. All authors have read and approved the final manuscript.