A novel multi-locus sequence typing (MLST) protocol for Leuconostoc lactis isolates from traditional dairy products in China and Mongolia

Background Economically, Leuconostoc lactis is one of the most important species in the genus Leuconostoc. It plays an important role in the food industry including the production of dextrans and bacteriocins. Currently, traditional molecular typing approaches for characterisation of this species at the isolate level are either unavailable or are not sufficiently reliable for practical use. Multilocus sequence typing (MLST) is a robust and reliable method for characterising bacterial and fungal species at the molecular level. In this study, a novel MLST protocol was developed for 50 L. lactis isolates from Mongolia and China. Results Sequences from eight targeted genes (groEL, carB, recA, pheS, murC, pyrG, rpoB and uvrC) were obtained. Sequence analysis indicated 20 different sequence types (STs), with 13 of them being represented by a single isolate. Phylogenetic analysis based on the sequences of eight MLST loci indicated that the isolates belonged to two major groups, A (34 isolates) and B (16 isolates). Linkage disequilibrium analyses indicated that recombination occurred at a low frequency in L. lactis, indicating a clonal population structure. Split-decomposition analysis indicated that intraspecies recombination played a role in generating genotypic diversity amongst isolates. Conclusions Our results indicated that MLST is a valuable tool for typing L. lactis isolates that can be used for further monitoring of evolutionary changes and population genetics.


Background
Leuconostoc species are usually found in association with plant, dairy, meat or other food products. Like other lactic acid bacteria (LAB), Leuconostoc species are important industrial starter microbes that are used in several industrial and food fermentation processes, such as the production of cheese, butter, buttermilk, kefir, sourdough and kimchi [1,2]. These species are closely related to heterofermentative species in the genus Lactobacillus [3]. Phenotypically, the genus Leuconostoc and Lactobacillus are often isolated from the same habitats and share many characteristics [4].
The genus Leuconostoc was first described by Van Tieghem [5]. In recent years, several species have been reclassified within the genus; some new species have been added and new genera have been erected from species previously considered to belong to Leuconostoc. For example, the species L. mesenteroides was reclassified into three subspecies: L. mesenteroides subsp. mesenteroides, L. mesenteroides subsp. dextranicum and L. mesenteroides subsp. cremoris [6]. A new species, L. fallax was identified from sauerkraut [7] and subsequently a number of L. fallax isolates have been found in the heterofermentative stage of sauerkraut fermentation [7,8]. The L. paramesenteroides group of species have been reclassified into a new genus, Weisella [8]; L. oenos has been reclassified into the genus Oenococcus as O. oeni [9] and L. durionis, L. ficulneum, L. pseudoficulneum and L. fructosum have been assigned to a new genus, Fructobacillus [10]. Furthermore, L. argentinum has been reclassified as a synonym of L. lactis following numerical analysis of repetitive extragenic palindromic-PCR patterns, whole-cell protein profiles (SDS-PAGE) and fluorescent amplified fragment length polymorphism (FAFLP) band patterns [11]. New species, including L. holzapfelii, L. palmae and L. miyukkimchii, have also been identified from wine and kimchi [12][13][14].
Typing methods for intraspecies identification of pathogens are essential epidemiological tools in infection prevention and control [15] and have also been applied to LAB. Typing methods are divided into two major categories i.e., phenotypic and genotypic methods. Traditional phenotyping methods, such as the use of serotypes, biotypes, phage-types and antibiograms, have been used for many years to isolate and characterise LAB and, sometimes, to distinguish between species and subspecies. Compared with phenotypic typing methods, genotypic typing methods have some advantages as they have more general applicability and greater discriminatory power. Currently, several molecular typing approaches, such as random amplified polymorphic DNA (RAPD)-PCR, pulsed-field gel electrophoresis (PFGE), restriction fragment length polymorphism (RFLP), protein fingerprinting, and repetitive element palindromic PCR (Rep-PCR), have been used to characterise Leuconostoc species [16][17][18][19][20][21][22][23].
Multilocus sequence typing (MLST) is a technique for distinguishing accurately between different isolates within a species. MLST is based on the principles of phenotypic multi-locus enzyme electrophoresis (MLEE). MLEE is a typing method that relies on differences in electrophoretic mobility of different enzymes present within a bacterium [15]. Maiden et al., [24] first used the MLST method to identify virulent lineages of 107 isolates of Neisseria meningitides, a naturally transformable Gram-negative pathogenic bacterium [24]. Shortly thereafter, the method was used to analyse nonpathogenic food production bacteria including LAB. For example, Tanigawa and Watanabe [25] used MLST to compare seven housekeeping genes in 41 isolates of Lactobacillus delbrueckii and demonstrated that MLST was efficient for identification of isolates to subspecies level [25]. De Las Rivas et al. [26] compared the genetic diversity and genetic relationships amongst 18 O. oeni isolates using the gyrB, pgm, ddl, recP and mleA genes and MLST [26]. Bilhère et al. [27] found that MLST and pulsed-field gel electrophoresis (PFGE) were both useful for identifying 43 isolates of O. oeni, although the MLST method was more efficient [27]. Although the population biology of some LAB species has been characterised by MLST methods, to date, there is no MLST protocol available for Leuconostoc species.
The aim of the present study was to develop an effective MLST protocol for characterisation of L. lactis isolates and use this to explore the population structure and evolutionary relationships amongst isolates of this species.

Assignment of sequence types
Fifty L. lactis isolates were typed using the MLST protocol. Isolates could be divided into 20 sequence types (STs) using combined data from eight loci. ST14 was the most frequent (21 isolates), followed by ST11 (four isolates), ST3 (three isolates), ST4 (three isolates), ST1 (two isolates), ST8 (two isolates) and ST12 (two isolates); there was only one isolate in each of the remaining 13 STs.

MLST protocol and allelic variation
Eight genes were successfully sequenced and analysed by MLST for all isolates in this study. Polymorphic sites, guanine-cytosine content, rate of non-synonymous (d N ) and synonymous (d S ) substitutions and the d N /d S for each locus (groEL, carB, recA, pheS, murC, pyrG, rpoB and uvrC ) were determined ( Table 1). Fragment sizes of the eight selected loci ranged from 550 bp (recA) to 892 bp (groEL) ( Table 2). The number of polymorphic sites per locus ranged from 3 (recA) to 9 (murC) and a total of 47 SNPs were identified ( Table 1). The mean guaninecytosine content of the partial sequence of the eight gene fragments ranged from 43.12% (pyrG) to 48.31% (recA), while it was 37.7% in the whole L. mesenteroides subsp. mesenteroides ATCC 8293 genome previously described [28]. The value of the non-synonymous (d N ) and synonymous (d S ) substitutions ranged from 0.0000 (groEL) to 0.0077 (murC) and 0.0556 (groEL) to 0.2852 (carB) respectively. The lowest d N /d S ratio (<1) calculated for all eight loci is suggestive of weak purifying selection pressure.

Recombination in L. lactis
The level of linkage disequilibrium between all alleles of the isolates evaluated was high as the calculated I A S was 0.4264 (p = 0.000) and significantly different from the I A S value of 0 expected for a population with linkage equilibrium, indicating the genes investigated in this study were close to linkage disequilibrium.
Split decomposition analysis to examine evolutionary relationships amongst the isolates revealed different structures in the split graphs for all eight loci ( Figure 1A). In the split graphs for murC, pheS, pyrG and uvrC, the parallelogram-shaped structures detected indicated that intergenic recombination had occurred during the evolution of these four genes. The split graphs obtained for carB, groEL, recA and rpoB loci revealed tree-like structures, suggesting that the descent of these genes was clonal and not significantly affected by intergenic recombination. The split graphs of the recA and carB genes were a polygonal line and columnar respectively because only three (recA) or four (carB) alleles were analysed. The combined split graph of alleles for all eight MLST loci displayed a network-like structure ( Figure 1B). The 20 STs representing all isolates were divided into two main subpopulations and each subpopulation was completely disconnected. ST1 to ST10 were clustered together as one subpopulation, where some parallelogram-shaped groupings were detected. The split graphs for the remaining STs, clustered into a second subpopulation. This suggests that recombination had not occurred between isolates from the two subpopulations, but that intergenic recombination may occur between isolates from the same subpopulation during their evolution. ST19, which contained only isolate MAU80137 from non-traditional dairy production, was clearly disconnected from the others isolates, indicating no recombination had occurred between this isolate and other isolates from either of the two subpopulations.

Cluster analysis of the MLST data
Clustering by region amongst the isolates was evident in the minimum-spanning tree ( Figure 2). The 50 L. lactis isolates evaluated were assigned to 20 STs that resolved into eight clonal complexes (CCs). Among these CCs, 14 STs were clustered together to form two CCs and there were six singleton STs that could not be assigned to any group.
The largest CC was comprised of ST11, ST13, ST14, ST15, ST16, ST18 and ST20, which included 30 isolates, mainly from Sichuan province and Mongolia. Within this CC (colour-coded pink) ST14 was the predicted primary founder surrounded by single-locus (ST11, ST15, ST16, ST18, and ST20), or two-locus variants (ST13). These STs have been connected by solid black lines indicating they are closely related. The second CC included ST1 to ST6 and ST10, which included 16 isolates mainly from Sichuan and Gansu provinces. ST1 from Sichuan and Gansu province located in the centre of the second clonal complex. Single-locus variants were ST2, ST4 and ST5, which contained isolates from Gansu, Qinghai and Sichuan provinces. Two-locus variants were ST3, ST6 and ST10 and included isolates from Gansu province. ST7, ST8, ST9, ST12, ST17 and ST19 were singletons unlinked to the other CCs. However, they are connected to two primary founders, either ST1 or ST14, by grey or dotted lines, indicating they had a distant relationship with the two predicted ancestors. ST7 and ST8 were two and four-locus variants of ST1 and connected with grey lines. ST12 and ST17 were three-and four-locus variants of ST14 and also connected by grey lines. Isolate IMAU20185 belonging to ST9 was a six-locus variant of ST1 to which it was connected by a dotted line. Isolate IMAU80137 belonging to ST19 was a six-locus variant of ST14 to which it was also connected by a dotted line.

UPGMA tree based on MLST data
Genetic relatedness amongst the L. lactis isolates investigated in this study showed they were well clustered within two major groups, A and B. Group A was comprised of 34 isolates and group B of only 16 isolates. Group A was the better supported group and included two subgroups. Group B was a weakly supported group that included four subgroups (Figure 3). With the exception of ST19, isolates in group A were closely related only differing in two out of the eight loci from the primary founder, ST14. The isolate that belonged to ST19 was a six-locus variant of the primary founder. Isolates in Group B were distantly related and differed in between two and six of the eight loci from the primary founder ST1.

Discussion
MLST is considered to be the best method for studying molecular epidemiology and population structure of bacteria [29][30][31]. Although this approach has been developed for several LAB, such as Lb. plantarum, Lb. delbrueckii, Lb. casei, and O. oeni [25,26,32], until this study there had been no MLST protocol used for L. lactis. In this study, we used MLST with eight housekeeping genes on 50 L. lactis isolates from a relatively large geographic area including Mongolia, a number of Chinese Provinces and an Autonomous region. These representative isolates are unique in their diversity of sources and provide the relevant information required for a better understanding of genetic diversity, persistence and movement. The first step in development of a MLST typing method required analysis of the sequence diversity of eight housekeeping genes from the 50 L. lactis isolates under evaluation, to ensure that the MLST protocol had the discriminatory power to type isolates within a single species. The two loci that had low polymorphism, contained three and four polymorphic sites in the recA and carB loci respectively ( Table 1). The low level of biodiversity in recA and carB suggested they had similar sequences at the species level and would, therefore, have a lower discriminatory ability than the other housekeeping loci used in this study. The remaining six loci, groEL, pheS,  uvrC, rpoB, pyrG, murC had more polymorphic sites (between five and nine), suggesting that they would have a good discriminatory ability when used in MLST. A total of 47 polymorphic sites were detected in the eight loci giving a polymorphism rate of 0.88% of the 5,325 nucleotides present. The relatively low number of polymorphic sites suggests that the partial sequences of the eight housekeeping genes were somewhat conserved amongst the 50 L. lactis isolates. Although housekeeping genes evolve slowly, however, we still consider that the eight housekeeping genes selected provide sufficient discriminatory power for typing. In our genetic analysis, the eight housekeeping loci had a very low d N /d S ratio (<1), implying strong purifying selection. This was particularly the case in groEL where five substitutions were synonymous and the amino acid composition did not change. The d N /d S ratio was close to zero and similar to that seen by Madslien et al. [33], which supports our estimation that the housekeeping loci are all under stabilizing selection [33].
Leuconostoc lactis isolates are important industrially in the production of fermented foods. However, their population structure has not been investigated fully before. We used linkage disequilibrium expressed as the index of association (I A ) by the equation, observed variance; V E , expected variance) to investigate population structure. This index of association is a generalised measure of linkage disequilibrium; does not rely on the number of loci analysed; has an expected value of zero if there is no association between loci, i.e. alleles are in linkage equilibrium (free recombination) [34,35]; and was pioneered to describe population structure in Hordeum spontaneum [36]. In our study, the value of I A and I A S for eight loci were calculated as 1.8 and 0.4264 (p = 0.000), respectively. These high values are indicative of a strong clonal population and similar to reports for other bacteria. For example, in Lb. plantarum, where seven selected housekeeping genes were analysed and assigned to 17 different STs, the I A S value was 0.444 [37]. In Bacillus licheniformis, where six housekeeping loci were analysed from 53 diverse isolates, the value of I A S was 0.4365 [33]. These results are similar to our study on L. lactis and, therefore, support our hypothesis that these are clonal populations and that allelic selection is close to linkage disequilibrium.
In general, Leuconostoc species are used as starter cultures for dairy fermentations. All isolates initiate lactose fermentation and lactic acid production and here we have shown that some essential housekeeping genes are highly conserved. However, the value of I A S and the number of unique STs reflect the genetic diversity amongst isolates that have each adapted to specialised environments during their evolution. Similar results have been reported for other LAB isolated from dairy products; for example 197 isolates of Lactococcus lactis isolated from homemade yogurt were assigned to 72 different STs and their I A S value was 0.3038 [38]. Uniformly, a clonal structure was also found in Streptococcus thermophilus, where eight housekeeping loci were analysed from 26 isolates from different dairy products [39].
Split-decomposition analysis based on the allelic profiles of isolates evaluated provided evidence of intraspecies recombination that could play a role in generating genotypic diversity amongst isolates. Parallelogram-shaped structures were commonly found in the split graphs of the partial housekeeping genes (murC, pheS, pyrG, and uvrC) and the combined alleles, illustrating recombination had occurred in some MLST loci. Previous studies have provided evidence that recombination could occur frequently in Leuconostoc species because mobile elements, such as bacteriophages, genomic islands and transposable elements, were found in the genome sequence [40,41]. In addition, some plasmids from Leuconostoc species have been identified [42,43]. In O. oeni isolates, a similar recombination phenomenon has been found including the presence of plasmids, bacteriophages and insertion sequences [44][45][46]. Furthermore, the presence of parallelogram-shaped structures were also found in the ddl, pgm and recP split graphs of O. oeni isolates [26].
Although this study on the population structure of L. lactis has made important steps forward, e.g. the splitdecomposition analysis based on concatenated sequences of housekeeping genes (Figure 1), the UPGMA tree based on the MLST data ( Figure 3) and the I A S values, we could still not confirm any association between ST and the original source of each isolate. Similar results have been reported in Lactococcus lactis and Lactobacillus sanfranciscensis, where no significant associations between STs and the various sources of the isolates could be found [47,48]. The absence of such an association in L. lactis may be because of the genetic diversity of individual L. lactis isolates. At the gene level, MLST analysis indicated two CCs and six singletons. The majority of L. lactis isoaltes from dairy products were found in these two CCs; the remaining isolates from various sources including yogurt, kurut, yak's milk and pickle, were scattered into unique STs. This characterisation was also reflected in the UPGMA dendrogram, with isolates clustering as two groups that could be further divided into several subgroups (Figure 3). These unique STs (ST7, ST8,ST9, ST12, ST17 and ST19) illustrate the genetic diversity within the subspecies.

Conclusions
A MLST protocol for L. lactis isolates, based on eight housekeeping genes and 50 L. lactis isolates was developed. In this study, we demonstrated biodiversity, clonal population structure and genetic recombination in the isolates evaluated. All of these isolates could be separated into two distinct groups that had evolved independently from each other, except isolate MAU80137 from ST19. This isolate was the only one from a nontraditional dairy and was only distantly related to all the other isolates analysed. Future work will target other sources of L. lactis by examining environmental samples to obtain a better understanding of the evolution and population genetics of L. lactis.

Leuconostoc lactis isolates
In this study, a total of 50 L. lactis isolates, preserved in the Lactic Acid Bacteria Collection Center of the Inner Mongolia Agricultural University (LABCC), were examined and characterised (Additional file 1: Table S1). These isolates originated from various sources including yogurt, kurut, qula and other traditional foods from Mongolia, the P.R. of China Provinces Sichuan, Qinghai, Gansu and the P.R. China Inner Mongolia Autonomous Region. Leuconostoc lactis isolate MAU80137 was the only isolate from pickle (Sichuan province). All isolates were identified as L. lactis based on standard physiological and biochemical tests, and sequence analysis of the 16S rRNA gene [32,49]. Stock cultures were stored in 10% glycerol at −80°C. Working cultures were retrieved from storage and activated by two subcultures through de Man Rogosa Sharpe (MRS) broth (Becton, Dickinson Co., Sparks, Md., USA). Isolates were incubated at 30°C for 24 h under anaerobic conditions prior to evaluation.

DNA extraction
Genomic DNA was extracted from all isolates as described previously [50]. Briefly, after overnight incubation in MRS broth at 37°C, the bacterial cells were collected by centrifugation (8,000 × g, 3 min, 4°C) and subjected to freezethaw cycles for cell lysis. Next, 10% sodium dodecyl sulphate (SDS) and proteinase-K solution (20 mg/ml) were added, mixed well, and incubated in a shaking incubator at 200 rpm and 37°C overnight. This was following by addition of 0.7 M NaCl and 10% cetyltrimethyl ammonium bromide (CTAB) and further incubation at 65°C for 20 minutes. Protein contaminants were removed by the addition of phenol/chloroform/isoamyl alcohol (25/24/1). The DNA was precipitated as a pellet by the addition of an equal volume of ice-cold isopropanol, and then washed in 70% (v/v) ice-cold ethanol and dissolved in sterile ultrapure water. The purity of the extracted DNA was quantified by recording its optical density at 260 and 280 nm, respectively, using a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA).

Amplification and nucleotide sequencing
Primers for the polymerase chain reaction (PCR) were designed based on the internal fragments of the eight loci using Primers 5 software (Table 2). In this study, the MLST protocol was modified in two ways; firstly, the primers targeting internal fragments of each gene were extended from 450-500 to 500-700 bp and secondly, although MLST protocols generally only use five to seven genes, in this study, eight housekeeping genes were used to analyse the population structure of L. lactis. The eight housekeeping gene fragments (carB, groEL, murC, pheS, pyrG, recA, rpoB, uvrC) were amplified from chromosomal DNA from each isolate using amplification and sequencing primers ( Table 2). The PCR procedure for the pyrG, carB, murC and pheS genes was done under the following conditions: 94°C for 5 sec, 30 cycles of amplification which included 95°C for 60 sec, 50°C for 45 sec, 72°C for 60 sec and then annealing at 72°C for 10 min. PCR for the remaining genes followed the same experimental conditions except for the annealing temperature which was 54°C. PCR reactions were made in a 10 μl reaction mixture containing 0.08 μl Taq polymerase (5 U/μl, Takara, Tokyo), 1 μl 10 × PCR Buffer (Mg 2+ free), 0.8 μl dNTPs (2.5 mM each), 0.8 μl MgCl 2 (25 mM), 0.4 μl forward primer (10 μM), 0.4 μl reverse primer (10 μM), 1 μl genomic DNA (10-50 ng/μL), and 5.52 μl dH 2 O. The PCR products were separated by electrophoresis on a 1.2% agarose gel and then visualised using ethidium bromide staining. Sequencing of the PCR products was done by the Shanghai Sangni Biosciences Corporation (Shanghai, China) and the sequences deposited in the GenBank/EMBL databases under accession numbers KJ149820 to KJ150219.

Data analysis
The sequences obtained for the eight housekeeping genes in the MLST protocol from all isolates were imported into BioNumerics software (version 6.0, Applied-Maths, Sint Maartens-Latem, Belgium) and the number of unique alleles per locus obtained. In date analysis, all unique sequences were assigned an allele number and each unique combination of eight allele numbers per isolate was assigned a ST [27]. The guanine-cytosine content, d N /d S ratio (d S is the number of synonymous substitutions per synonymous site and d N is the number of non-synonymous substitutions per non-synonymous site) and the number of polymorphic sites and single nucleotide polymorphisms (SNPs) of the eight housekeeping genes for each isolate were calculated using LIAN-Linkage analysis [51]. The level of linkage disequilibrium between all alleles of the isolates was investigated by determining the standardised index of association (I A S ) [34]. Phylogenetic trees were constructed by the neighbour-joining (N-J) method in MEGA version 5.0 software (version 5.0, www.megasoftware.net). The relationships between MLST STs and analysis of CCs were revealed using eBURST (Based Upon Related Sequence Types) V 3.0 software (http://eburst.mlst.net). CCs are typically composed of a single predominant genotype with a number of much less common close relatives of that genotype [52]; the isolates of L. lactis that shared a minimum of five identical alleles with the central genotype in a given subpopulation were assigned to a different CC. An UPGMA dendrogram was constructed by START 2.0 software using the unweighted pair-group method and the arithmetic average method (UPGMA). The split decomposition was done with SplitsTree and START 2.0 software on the MLST website (http://eburst.mlst.net/). Minimum-spanning tree analysis of the STs from all isolates was done using Prims's algorithm in the BioNumerics software according to region and source separation (version 6.0, Applied-Maths, Sint Maartens-Latem, Belgium).