CRISPR-MVLST subtyping of Salmonella enterica subsp. enterica serovars Typhimurium and Heidelberg and application in identifying outbreak isolates

Background Salmonella enterica subsp. enterica serovars Typhimurium (S. Typhimurium) and Heidelberg (S. Heidelberg) are major causes of foodborne salmonellosis, accounting for a fifth of all annual salmonellosis cases in the United States. Rapid, efficient and accurate methods for identification are required for routine surveillance and to track specific strains during outbreaks. We used pulsed-field gel electrophoresis (PFGE) and a recently developed molecular subtyping approach termed CRISPR-MVLST, which exploits the hypervariable nature of virulence genes and Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs), to subtype clinical S. Typhimurium and S. Heidelberg isolates. Results We analyzed a broad set of 175 S. Heidelberg and S. Typhimurium isolates collected over a five-year period. We identified 21 Heidelberg Sequence Types (HSTs) and 37 Typhimurium STs (TSTs) that were represented by 27 and 45 PFGE pulsotypes, respectively, and determined the discriminatory power of each method. Conclusions For S. Heidelberg, our data show that combined typing by both CRISPR-MVLST and PFGE provided a discriminatory power of 0.9213. Importantly, CRISPR-MVLST was able to separate common PFGE patterns such as JF6X01.0022 into distinct STs, thus providing significantly greater discriminatory power. Conversely, we show that subtyping by either CRISPR-MVLST or PFGE independently provides a sufficient discriminatory power (0.9415 and 0.9486, respectively) for S. Typhimurium. Additionally, using isolates from two S. Typhimurium outbreaks, we demonstrate that CRISPR-MVLST provides excellent epidemiologic concordance.


Background
Non-typhoidal Salmonella are one of the leading causes of bacterial foodborne disease in the United States, accounting for over a million human cases each year [1]. Salmonellosis symptoms include diarrhea, fever and abdominal cramps that occur 12 to 72 hours after infection. Annually, Salmonella is responsible for an estimated 20,000 hospitalizations and nearly 400 deaths in the United States, with a financial burden of approximately $3.3-4.4 billion [2,3]. Most infections are transmitted via ingestion of contaminated food and, unlike trends with other bacterial foodborne pathogens, the annual incidence rate of salmonellosis has not significantly declined over the past decade. Since 2006, nearly a fifth of all salmonellosis cases in the United States were caused by Salmonella enterica subsp. enterica serovars Typhimurium (S. Typhimurium) and Heidelberg (S. Heidelberg) [4]. According to the Centers for Disease Control and Prevention, there have already been two outbreaks in 2013 where S. Typhimurium and S. Heidelberg were responsible [5,6].
To limit and reduce the scope of a Salmonella outbreak, an efficient and robust surveillance system is vital. During epidemiological investigations, Salmonella isolates are serotyped and concurrently subtyped to classify isolates to the strain level. An ideal subtyping method has a high discriminatory power (i.e. can separate all unrelated strains) but is not so discriminatory that it inadvertently separates isolates that are part of the same outbreak (i.e. possesses high epidemiologic concordance). Several molecular subtyping approaches have been developed, including pulsed-field gel electrophoresis (PFGE) [7], amplified fragment length polymorphism (AFLP) [8][9][10], multiple-locus variable-number tandem-repeat analysis (MLVA) [11][12][13][14][15][16][17], multiple amplification of prophage locus typing (MAPLT) [13,18] and, most recently, a multiplex DNA suspension array [19]. PFGE was adapted to Salmonella in the 1990s and generally provides a high discriminatory power for subtyping most Salmonella serovars, though it does not provide equal sensitivity across all serovars [20]. Despite being labor-intensive and time-consuming, conventional serotyping with concurrent PFGE fingerprinting is still considered the gold standard for Salmonella subtyping and is widely used by public health surveillance laboratories [21][22][23]. Although PFGE data are uploaded to PulseNet USA (http://www.cdc.gov/pulsenet), the national electronic network for foodborne disease surveillance that is coordinated by the CDC, inter-laboratory comparisons of PFGE fingerprints can be ambiguous.
There are several different PFGE patterns, or pulsotypes, though most often a limited number of common patterns are associated with the majority of isolates within a given serovar. Two recent S. Typhimurium and S. Heidelberg foodborne outbreaks in the United States involved contaminated cantaloupe melons (S. Typhimurium, 2012; 228 reported illnesses) [24] and broiled chicken livers (S. Heidelberg, 2011; 190 reported illnesses) [25]. In both cases, the individual XbaI PFGE patterns associated with each strain were fairly common: for S. Typhimurium, the associated PFGE pattern is typically seen in 10-15 cases per month [24] and for S. Heidelberg, the pattern occurs even more frequently, in 30-40 cases per month [25]. Consequently, identification of the outbreak strains was particularly difficult, and isolates from the S. Typhimurium cantaloupe outbreak were also analyzed by MLVA to define the outbreak strain more accurately. Additionally, another S. Heidelberg outbreak in 2011, linked to ground turkey, involved isolates with two similar but distinct PFGE patterns, thus showing reduced epidemiologic concordance by this subtyping method [26]. This last example may reflect evolutionary relatedness between the two sets of isolates, information that PFGE, unlike some other methods, cannot provide.
The recent outbreak cases described above highlight the need for additional subtyping approaches for Salmonella that can be used instead of, or as a complement to, PFGE for routine disease surveillance and outbreak tracking. Clustered Regularly Interspaced Short Palindromic Repeats (CRISPRs) are found in ~50% of all bacterial species, including Salmonella [27]. CRISPR elements comprise several unique short sequences, called spacers, which are interspaced by conserved direct repeats. In some bacteria, homology between a spacer and a complementary target nucleic acid results in degradation of the target by sequence-specific endonucleases, providing protection from exogenous bacteriophage or plasmid DNA [reviewed in 28]. Due to both acquisition and loss of these spacer elements, CRISPRs represent arguably the most rapidly evolving prokaryotic loci [29][30][31].
We recently developed a sequence-based subtyping assay (multi-virulence locus sequence typing; MVLST) for Salmonella that involves the sequencing of two virulence genes, fimH1 (fimH) and sseL, in addition to CRISPR sequencing [33]. Preliminary studies showed that this approach, termed CRISPR-MVLST, provided better discrimination than either CRISPR or MVLST alone and, importantly, exhibited strong epidemiologic concordance among eight out of nine of the most common illness-causing Salmonella enterica serovars [33], including both S. Heidelberg and S. Typhimurium outbreak strains. Subsequently, among a large number of clinical isolates of the highly clonal S. Enteritidis, a combination of CRISPR-MVLST and PFGE was required to provide a sufficient discriminatory power [34]. Among a large set of S. Newport clinical isolates, CRISPR-MVLST provided similar discrimination to PFGE [41].

Figure 1 Salmonella CRISPR loci. Salmonella have two CRISPR loci, CRISPR1 and CRISPR2, comprised of direct repeats of 29 nucleotides (black diamonds) separated by spacers (empty rectangles). There is an A-T rich leader sequence upstream of each locus (shaded rectangle) and the CRISPR-associated genes (cas) are upstream of the CRISPR1 locus (grey boxed arrow). Primers used for amplification are shown in blue and red for CRISPR1 and CRISPR2, respectively.
To further determine the functionality of this new subtyping approach, we investigated the discriminatory power of both CRISPR-MVLST and PFGE among a larger and unbiased collection of clinical S. Typhimurium and S. Heidelberg isolates that were collected over a five-year period. We show here that a combination of both CRISPR-MVLST and PFGE is required to achieve an appropriate discriminatory power for S. Heidelberg. For S. Typhimurium, both subtyping methods independently provide a discriminatory power >0.94. Importantly, as one of the first applications of CRISPR-MVLST to analyze isolates that were part of an outbreak, we were able to cluster two different S. Typhimurium outbreak strains.

Results of CRISPR-MVLST
To more accurately determine the discriminatory power of CRISPR-MVLST and PFGE for S. Heidelberg and S. Typhimurium, we subtyped 89 and 86 isolates, respectively, that were obtained from the Pennsylvania Department of Health (Table 1). Among the 175 total isolates analyzed, we identified 29 CRISPR1 alleles, 31 CRISPR2 alleles, 6 fimH alleles and 7 sseL alleles (Table 2). Of these, we found 27, 30, 2 and 4 alleles, respectively, that were novel and not seen in our previous data sets [33]. In total, these alleles defined 58 novel sequence types among the two serovars (Tables 3 and 4). The overwhelming sequence-type diversity among both of these prevalent serovars is provided by genetic variability in the CRISPR loci, rather than in either fimH or sseL (Figure 2). We found that 88/89 S. Heidelberg isolates had fimH allele 7 and in S. Typhimurium there were two predominant fimH alleles, allele 6 (52/86 isolates) and allele 8 (28/86 isolates). Similarly, in S. Heidelberg, 88/89 isolates bore sseL allele 19 and in S. Typhimurium, 73/86 isolates had sseL allele 15. The polymorphisms between different sseL or fimH alleles arise from the presence of SNPs, with the exception of allele 63, which has a single base insertion. No alleles for any of the four markers were shared between the two serovars, consistent with previously published studies [32][33][34].

S. Heidelberg analysis and sequence type distribution
CRISPR-MVLST analysis of 89 S. Heidelberg clinical isolates (representing 27 unique PFGE patterns) resulted in 21 unique S. Heidelberg Sequence Types (HSTs), HST7-HST27 (Table 3). In total, we identified 12 CRISPR1 alleles, 8 CRISPR2 alleles, 2 fimH alleles and 2 sseL alleles (Table 2). As shown in Figure 2b, most of the allelic diversity comes from the CRISPR1 and CRISPR2 loci. All 12 CRISPR1 alleles and seven of the eight CRISPR2 alleles were new, compared to our previous studies [33]. We did not find any new fimH alleles in our dataset and only one of the two sseL alleles was new. The most frequent ST was HST7, occurring in 49/89 isolates (55%).

Discriminatory power of CRISPR-MVLST and PFGE in S. Heidelberg isolates
The discriminatory power of CRISPR-MVLST among the S. Heidelberg isolates was calculated to be 0.6931 (Figure 3a). The discriminatory power provided by PFGE among the same isolates was 0.8149 (Figure 3b). Because both values fall well short of the ideal discriminatory power (>0.95) [42], we combined the two typing methods. This combination provided 44 unique groups with a more satisfactory discriminatory power of 0.9213 (Figure 3c), i.e. a 92% probability that two unrelated isolates would be separated.
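The calculation behind these values can be sketched as follows. This is a minimal illustration, assuming the standard Hunter and Gaston formulation of Simpson's index of diversity, D = 1 - [1/(N(N-1))] Σ n_j(n_j - 1), where N is the number of isolates and n_j the number of isolates of the j-th type. The function name and the six toy isolates are hypothetical and are not data from this study; combined typing is modelled simply by treating each (ST, pulsotype) pair as one type.

```python
from collections import Counter


def discriminatory_power(types):
    """Hunter-Gaston index: probability that two isolates drawn at
    random (without replacement) belong to different types."""
    n = len(types)
    if n < 2:
        raise ValueError("need at least two isolates")
    counts = Counter(types)
    # Number of ordered pairs of isolates that share a type.
    same_type_pairs = sum(c * (c - 1) for c in counts.values())
    return 1.0 - same_type_pairs / (n * (n - 1))


# Hypothetical toy data: one ST label and one PFGE label per isolate.
st = ["ST_A", "ST_A", "ST_A", "ST_B", "ST_B", "ST_C"]
pfge = ["P1", "P1", "P2", "P3", "P3", "P3"]

d_st = discriminatory_power(st)
d_pfge = discriminatory_power(pfge)
# Combined typing: isolates are the same type only if both labels match.
d_combined = discriminatory_power(list(zip(st, pfge)))
```

In this toy set each method alone leaves some isolates unresolved, while the paired labels split them further, which is the same effect reported above for S. Heidelberg.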

S. Typhimurium analysis and sequence type distribution
CRISPR-MVLST analysis of 86 S. Typhimurium clinical isolates (representing 45 unique PFGE patterns) resulted in the identification of 37 unique and novel S. Typhimurium Sequence Types (TSTs), TST9-TST41, and TST56-TST58 (Table 4). This included 17 CRISPR1, 23 CRISPR2, 4 fimH and 5 sseL alleles (Table 2). Of these, the majority of CRISPR1 alleles were new (15/17 alleles) and all CRISPR2 alleles were new (23/23), as compared to our previous studies [33]. As with S. Heidelberg, the majority of unique sequence types were defined by polymorphisms in either or both of the CRISPR loci (Figure 2c).

Discriminatory power of CRISPR-MVLST and PFGE in S. Typhimurium isolates
The discriminatory power of CRISPR-MVLST among the S. Typhimurium isolates was 0.9415 (Figure 4a). This means that there would be a 94% probability that two unrelated isolates could be separated using the CRISPR-MVLST scheme. Similarly, for PFGE, the discriminatory power among these isolates is 0.9486 (Figure 4b). These values suggest that either method can provide sufficient discrimination between outbreak and non-outbreak S. Typhimurium strains.

Correlation between different TSTs and PFGE patterns
We next wanted to investigate whether any correlation existed between TSTs and PFGE patterns. To accomplish this, we first determined the relationship among different TSTs. BURST analysis of all 37 TSTs generated four groups (Figure 5a). Of these, Groups 1-3 contain 6-15 TSTs. Group 4 consists of only two TSTs and BURST was unable to assign a core TST. There was also a collection of five singletons that BURST did not assign to a group. For Groups 1-3, each group comprises a core TST surrounded by TSTs that differ from the core by one allele. The ring in which a TST sits indicates the number of allele differences from the core. For example, in Group 1, TSTs 9, 37, 32, 20, and 14 each differ by one allele at one locus from the core TST, TST 13. For Group 3, TST 10 is the core TST and TSTs 15, 31, 36, 29, 23 and 16 each differ from TST 10 at one locus. TST 34, in the outer ring, differs from the TSTs in the middle ring at one locus and from the core at two loci.
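The ring structure described above rests on counting the loci at which two sequence types carry different alleles. The sketch below illustrates only that counting step; it is not the BURST implementation, whose group definition and core selection are more involved. The function names and the four allele profiles (ordered as CRISPR1, CRISPR2, fimH, sseL) are hypothetical.

```python
def allele_differences(profile_a, profile_b):
    """Number of loci at which two allele profiles differ."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    return sum(a != b for a, b in zip(profile_a, profile_b))


def rings_around_core(profiles, core):
    """Group sequence types by their number of allele differences
    from a chosen core type (1 = inner ring, 2 = next ring, ...)."""
    rings = {}
    for st, profile in profiles.items():
        if st == core:
            continue
        distance = allele_differences(profiles[core], profile)
        rings.setdefault(distance, []).append(st)
    return rings


# Hypothetical profiles: (CRISPR1, CRISPR2, fimH, sseL) allele numbers.
profiles = {
    "X1": (1, 1, 1, 1),  # assumed core
    "X2": (2, 1, 1, 1),  # differs at CRISPR1 only
    "X3": (1, 3, 1, 1),  # differs at CRISPR2 only
    "X4": (2, 4, 1, 1),  # differs at both CRISPR loci
}
rings = rings_around_core(profiles, core="X1")
```

Here X2 and X3 fall in the first ring around X1, and X4 in the second, mirroring the position of TST 34 relative to TST 10 in Group 3.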
To investigate whether there was any relationship between CRISPR-MVLST sequence type and PFGE patterns, we overlaid our PFGE data to identify isolates from different TSTs that have the same PFGE pattern. Figure 5a shows that there were seven PFGE pulsotypes that could be further separated into TSTs. In the majority of instances (5/7), identical PFGE patterns were found in isolates that had closely related TSTs such as JPXX01.0003 and JPXX01.0604 (TSTs 15, 31, 10 and TSTs 12 and 21, respectively). Following this, we generated a dendrogram using the Dice coefficient to determine the relationship between different PFGE pulsotypes. For clarity, we color-coded the PFGE patterns according to the BURST Group shown in Figure 5a. As can be seen in Figure 5b, closely related CRISPR-MVLST sequence types have similar PFGE patterns.

The data are shown in order of Sequence Type (HST or TST) and further sorted by PFGE pattern.

CRISPR-MVLST analysis of S. Typhimurium outbreak isolates
Since CRISPR-MVLST and PFGE exhibit a similarly high discriminatory ability in S. Typhimurium, we wanted to investigate the utility of the former for separating outbreak isolates. We obtained 30 S. Typhimurium isolates from the Pennsylvania Department of Health (Table 5). Ten of these were isolates associated with an outbreak in 2004 with the cluster designation 0411PAJPX-1c. All affected persons were on a bus trip together, though the outbreak source was never identified. The remaining 20 isolates comprised 10 isolates that were linked to a 2009 live poultry outbreak (cluster 0905PAJPX-1) and 10 control isolates that were collected in the same year but were not part of any classified outbreaks.
CRISPR-MVLST was able to separate the 2004 isolates from all others, with each of the ten isolates bearing the unique TST59 (Tables 4 and 5). These isolates were also analyzed by two-enzyme PFGE, using XbaI and BlnI. Though they had the same TST, two of the isolates, 04E02241 and 04E02239, had different PFGE patterns with BlnI or XbaI, respectively, and are indicated in bold in Table 5. This example shows that CRISPR-MVLST provides an epidemiologic concordance of 1 (E = 1.0), whereas for PFGE it is less than 1 (E < 1.0). Additionally, the XbaI PFGE pattern associated with this strain, JPXX01.0146, occurred fairly frequently in our initial data set; 14/86 isolates had this pulsotype and we were able to separate these into seven different TSTs.
For the 2009 outbreak isolates, CRISPR-MVLST correctly identified the 10 outbreak isolates (TST42) and these all have the same PFGE pattern, JPXX01.0302; thus, for both subtyping methods E = 1.0. Two of the sporadic case control isolates were also TST42 (shown in bold in Table 5) but these had different PFGE pulsotypes from the outbreak strain, suggesting a lack of discrimination by CRISPR-MVLST in this instance. TST42 was seen in two isolates in the initial study of 86 S. Typhimurium isolates. All isolates within each outbreak were identified using CRISPR-MVLST, thus obtaining perfect epidemiological concordance with this subtyping method.

The numbers represent the allelic identifier for the individual CRISPR-MVLST markers. The combination of four specific alleles defines a given HST. The frequency is the number of times a particular TST was observed among the 86 S. Typhimurium isolates analyzed in the first study and does not include the frequency of TSTs that were seen in the outbreak study. All TSTs identified here were new and not seen in previous studies. a Some CRISPR2 alleles required more than two sequencing primers to cover the whole length of the array. Alleles that required three primers are noted with * and the two isolates that required seven primers to sequence CRISPR2 are noted with **. The position of these primers is shown in Additional file 1.
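The text does not spell out how E is computed. As a rough sketch, assume here that epidemiologic concordance is the fraction of isolates from a defined outbreak that are assigned to the outbreak's predominant type; that definition, the function name and the placeholder label "other_XbaI_pattern" are assumptions made for illustration. The type lists below follow the counts reported for the 2004 outbreak (ten isolates, all TST59; one isolate with a different XbaI pattern).

```python
from collections import Counter


def epidemiologic_concordance(outbreak_types):
    """Assumed definition: share of outbreak isolates carrying the
    most common type observed within that outbreak."""
    if not outbreak_types:
        raise ValueError("no outbreak isolates supplied")
    most_common_count = Counter(outbreak_types).most_common(1)[0][1]
    return most_common_count / len(outbreak_types)


# 2004 outbreak: every isolate was TST59.
crispr_mvlst_2004 = ["TST59"] * 10
# XbaI PFGE: one isolate had a different pattern (placeholder label).
pfge_xbai_2004 = ["JPXX01.0146"] * 9 + ["other_XbaI_pattern"]

e_crispr = epidemiologic_concordance(crispr_mvlst_2004)
e_pfge = epidemiologic_concordance(pfge_xbai_2004)
```

Under this assumed definition CRISPR-MVLST gives E = 1.0 for the 2004 outbreak and XbaI PFGE gives a value below 1, consistent with the comparison drawn above.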

Discussion
Foodborne illness caused by Salmonella enterica serovars, particularly by S. Typhimurium and S. Heidelberg, accounts for 18.5% of salmonellosis annually in the United States [4]. For accurate outbreak tracking and routine disease surveillance, it is critical that we employ rapid, efficient and robust subtyping methodologies. PFGE is the current gold standard for molecular subtyping of Salmonella and other methods include AFLP, MLVA and CRISPR-MVLST.
CRISPR sequence analysis is one of the cheaper and faster methods for Salmonella subtyping [22]. For the majority of isolates analyzed, CRISPR-MVLST could be completed in less than 24 hours, including DNA isolation and analysis. Additionally, by virtue of their nature, sequencing data are more robust and tractable; this type of data is unequivocal and, with regard to inter-laboratory or database use, is highly consistent. Sequence data also provide increased downstream utility, such as phylogenetic studies. This approach is also in line with other high-throughput subtyping approaches, including real-time CRISPR analysis [32] and whole genome sequence analysis [43][44][45][46][47]. Conversely, although protocols exist that allow PFGE to be completed in 24 hours, it can often take 1-3 days and requires skilled personnel; inter-laboratory data analysis can be challenging and the data have no utility beyond subtyping. Given the advancement of whole-genome sequencing technologies, typing methods based on these are in development [48]. While highly discriminatory, this approach has limitations that are not issues with either CRISPR-MVLST or PFGE, including the time required for analysis and the space required for data storage.
CRISPR spacer analysis alone has been used to analyze several different Salmonella serovars [32]. Fabre and colleagues showed that among 50 isolates of S. Typhimurium and its I,4,[5],12:i:- variant, combined CRISPR1 and CRISPR2 sequence information is comparable to PFGE (D = 0.88 and 0.87, respectively). Both methods were more discriminatory than phage typing analysis of the same set of isolates. The same study also analyzed spacer content of S. Typhimurium and S. Enteritidis from 10 outbreaks and in all cases CRISPR sequences exhibited high epidemiologic concordance.
A preliminary investigation showed that addition of CRISPR spacer analysis to an MVLST scheme improves discrimination, beyond that provided by either approach independently, in eight out of nine of the most common illness-causing Salmonella serovars [33]. We wanted to extend our evaluation of CRISPR-MVLST utility among predominant and clinically relevant Salmonella serovars. To date we have tested and compared CRISPR-MVLST to PFGE on large numbers of S. Enteritidis [34], S. Newport [41], S. Heidelberg and S. Typhimurium isolates. Among the total 175 isolates analyzed here, we found far fewer alleles of fimH and sseL, compared to alleles of either CRISPR locus (Table 2; Figure 2). Given the reduced contribution of the virulence genes to defining STs, their addition may seem superfluous within this subtyping scheme. However, in this data set, fimH alleles define two STs, HST13 and TST20, and sseL alleles define five STs, TST16, TST19, TST23, TST29 and TST36. This further supports earlier findings showing that addition of MVLST to a CRISPR-based subtyping scheme increases discrimination in S. Enteritidis [34] as well as among a broad set of Salmonella enterica serovars [33]. Though the number of isolates for each serovar was similar, the number of STs within each serovar is surprisingly disparate: among 89 S. Heidelberg isolates we identified 21 HSTs and in 86 S. Typhimurium isolates, we identified 37 TSTs. This presumably reflects varied levels of clonality in different serovars. Independently of the number of STs defined for either serovar, the CRISPR loci are responsible for the vast majority of alleles (S. Heidelberg, 83.3%; S. Typhimurium, 80%) (Figure 2). In S. Heidelberg, 50% of the different alleles identified were CRISPR1 alleles. Given that CRISPRs are one of the more dynamic loci in bacteria [30,31], this finding is not unexpected.

Although PFGE was more discriminatory than CRISPR-MVLST among 89 S. Heidelberg isolates (D = 0.81 versus 0.69, respectively), a combination of both techniques provided an improved value of 0.92. This represents a 92% probability that two unrelated strains can be separated. JF6X01.0022 is the most common PFGE pattern in PulseNet for S. Heidelberg [49] and is seen 30-40 times a month by the CDC. In our data set, 42% of the isolates have the JF6X01.0022 pattern and, using CRISPR-MVLST, we were able to further separate these into seven distinct CRISPR-MVLST types (Figure 3b and d). Given the frequency at which this PFGE pattern occurs nationally, not all isolates that have this pattern may be associated with a specific outbreak, further enhancing the utility of CRISPR-MVLST as a complement to PFGE analysis. Collectively, these findings in S. Heidelberg show that the JF6X01.0022 pattern is analogous to the JEGX01.0004 pattern in S. Enteritidis, where the latter was observed in 51% of isolates analyzed and was separated into 12 distinct STs [34]. A proposed improvement for discrimination in S. Heidelberg and S. Enteritidis by PFGE is to increase the number of enzymes used for PFGE analysis [50,51], though the concurrent use of PFGE and CRISPR-MVLST would be much more efficient than this approach. Regarding S. Heidelberg, our data are similar to those observed in a broad set of S. Enteritidis isolates [34]: both serovars exhibit fewer STs and both require combining CRISPR-MVLST and PFGE to obtain a sufficient discriminatory power. This presumably reflects similar levels of clonality in S. Heidelberg and S. Enteritidis as compared to more heterogeneous serovars such as S. Typhimurium, where we observed many more STs within a similar number of isolates examined.

The 10 isolates without cluster information represent the sporadic, or non-outbreak related, isolates used as controls in the study.
Our data show that in S. Typhimurium, the discrimination provided by either PFGE or CRISPR-MVLST is similar (0.9486 versus 0.9415, respectively). When CRISPR-MVLST was applied to outbreak isolates, we were able to correctly identify the 20 isolates representing the two outbreaks, showing an extremely good epidemiologic concordance with this typing method. The epidemiologic concordance was better by CRISPR-MVLST than PFGE in identifying isolates from the 2004 bus trip outbreak and both methods had equal epidemiological concordance for the 2009 live poultry outbreak. Regarding the 2004 outbreak, the majority of isolates had the JPXX01.0146 pulsotype. In our initial study, this pulsotype was seen frequently, in 16% of all isolates analyzed, and the 14 isolates with this pattern could also be represented by 7 distinct TSTs. Conversely, all isolates from this outbreak have TST59, which is unique and not seen in our initial data set, showing that in this instance CRISPR-MVLST may be a better subtyping approach. In analyzing the 2009 live poultry outbreak, it appears that PFGE is more discriminatory than CRISPR-MVLST, as CRISPR-MVLST also identified two non-outbreak related isolates as TST42. Given the available epidemiological data, these two isolates do not appear to be associated with the outbreak. The fact that CRISPR-MVLST works better in some instances than others is not surprising and can also occur when other subtyping methods are used. 'Problematic' PFGE pulsotypes also exist and are one reason that second generation methods like MLVA and CRISPR-MVLST are being developed [33,52]. As a recent example, isolates associated with the 2012 S. Typhimurium cantaloupe outbreak had a common PFGE pattern, so additional subtyping by MLVA was performed to correctly define the outbreak strain [24]. That there is a strong association among closely related sequence types and closely related PFGE patterns for both S. Typhimurium (Figure 5) and S. Newport [41] provides further evidence that CRISPR-MVLST could serve as an appropriate alternative subtyping method.
Beyond the data shown here and in further evaluating the value of CRISPR-MVLST sequence typing, a recent study investigating S. Typhimurium isolates from a variety of animal sources showed an association of CRISPR-MVLST sequence types and resistance to antibiotics [40]. As part of that study, the most frequent TSTs were TST10 and TST42, both of which were found in this current study. TST10 was also the most frequent clinical sequence type seen in this study (16/86 isolates) but only two isolates were TST42.

Conclusion
CRISPR-MVLST is a relatively new subtyping approach with limited studies conducted in Salmonella that demonstrate its utility [33,34,39]. Our data here add to this body of work by demonstrating its functionality in two highly prevalent clinical serovars. Investigation of several more outbreak strains using CRISPR-MVLST will elucidate the true capability of this subtyping method. Our data here show that CRISPR-MVLST can be used in concert with PFGE, as in the case of S. Heidelberg, or potentially as an independent subtyping method, as in the case of S. Typhimurium.

Bacterial isolates and sample preparation
A summary of all isolates analyzed in this study is listed in Table 1. A total of 89 and 86 clinical isolates of S. Heidelberg and S. Typhimurium, respectively, were obtained from the Pennsylvania Department of Health. These isolates were selected systematically (isolates received closest to the 1st and 15th of each month from 2005-2011 were selected) to represent an unbiased collection of human clinical isolates. PFGE-XbaI analysis of these isolates was conducted using standard protocols [7,53]. All isolates were stored at −80°C in 20% glycerol. Isolates were grown overnight in 2 mL LB at 37°C in a shaking incubator. DNA was isolated using the Promega genomic DNA isolation kit, following the manufacturer's directions (Promega, Madison, WI). DNA samples were stored at −20°C prior to PCR analysis.

PCR amplification
Primers for amplification of all four genomic loci are listed in Table 6. PCR reactions were performed in a total volume of 25 μl: 1.5 μl template, 0.3 μl Taq (1.5 units; New England BioLabs, Ipswich, MA), 0.2 μl 10 mM dNTPs, 1 μl of each 10 μM primer, 2.5 μl of 10× Taq buffer and 18.5 μl water. PCR conditions were as follows, with the annealing temperatures (AT) listed in Table 6: an initial denaturation step of 10 minutes at 94°C followed by 35 cycles of 1 minute at 94°C, 1 minute at AT and extension for 1 minute (fimH and sseL) or 1.5 minutes (CRISPR1 and CRISPR2) at 72°C; a final extension step was done at 72°C for 8 minutes. A 5 μl aliquot of each PCR product was electrophoretically analyzed on a 1.2% agarose gel and the remaining reaction stored at −20°C.
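As a bookkeeping aid, the reaction recipe above can be written down and scaled for a batch. The sketch below assumes two primers per reaction (forward and reverse), which is what makes the listed volumes add up to 25 μl; the `master_mix` helper and its 10% pipetting overage are illustrative assumptions, not part of the published protocol.

```python
# Per-reaction volumes in microlitres, taken from the recipe above.
PER_REACTION_UL = {
    "template": 1.5,
    "Taq polymerase": 0.3,
    "dNTPs (10 mM)": 0.2,
    "forward primer (10 uM)": 1.0,
    "reverse primer (10 uM)": 1.0,
    "10x Taq buffer": 2.5,
    "water": 18.5,
}


def master_mix(n_reactions, overage=0.10):
    """Volumes (ul) of shared reagents for n reactions.

    Template is excluded because it differs per isolate. The overage
    fraction is an assumed allowance for pipetting loss.
    """
    if n_reactions < 1:
        raise ValueError("need at least one reaction")
    scale = n_reactions * (1 + overage)
    return {
        name: round(volume * scale, 2)
        for name, volume in PER_REACTION_UL.items()
        if name != "template"
    }


total_per_reaction = sum(PER_REACTION_UL.values())
mix_for_ten = master_mix(10, overage=0.0)
```

Since four loci are amplified per isolate, a batch of n isolates needs four such mixes, one per primer pair.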

DNA sequencing
PCR products were treated with 10 units of Exonuclease (New England BioLabs, Ipswich, MA) and 1 unit of Antarctic alkaline phosphatase (New England BioLabs, Ipswich, MA). The mixture was incubated for 40 minutes at 37°C to remove remaining primers and unincorporated dNTPs. The enzymes were inactivated by incubating the samples at 85°C for 15 minutes. Purified PCR products were sequenced at the Huck Institute's Nucleic Acid Facility at The Pennsylvania State University using 3' BigDye-labeled dideoxynucleotide triphosphates (v 3.1 dye terminators; Life Technologies, Carlsbad, CA) and run on an ABI 3730XL DNA Analyzer, using ABI Data Collection Program (v 2.0). Data were analyzed with ABI Sequencing Analysis software (Version 5.1.1). The primers used for sequencing are listed in Table 6. In total, four PCR reactions and eight sequencing reactions were conducted for each isolate being typed. Additionally, one internal sequencing reaction was required for 14/26 S. Typhimurium CRISPR2 alleles, due to the increased length of this locus. There were two alleles (only representing 2/86 S. Typhimurium isolates), 181 and 205, which required extra primers due to the presence of a duplicated region of the locus. The positions of these extra primers are shown in Additional file 1: Figure S1. CRISPR2 alleles that were sequenced using more than two primers are indicated in Table 4.

Sequence analysis and sequence type assignment
Sequences were assembled and aligned using SeqMan and MegAlign, respectively (Lasergene 10, DNA Star, Madison, WI) and unique alleles were assigned a unique numerical designation. All sequences from this study were submitted as a batch to NCBI and the accession numbers (KF465853-KF465929) are shown for each allele in Additional file 2. For each isolate the combination of allelic types at all four loci defines the serovar-designated sequence type (ST) (Tables 3 and 4), with each unique combination of alleles assigned a different ST number. The presence of a SNP in any marker was sufficient to define a new allele. Analysis of CRISPR1 and CRISPR2 was performed using CRISPR-finder (http://crispr.u-psud.fr/Server/). We did not identify any SNPs within either CRISPR locus that defined any allele. Allelic differences arose from deletion of one or more spacers, addition of a spacer or duplication/triplication of a spacer. Discriminatory power was calculated using the method described by Hunter and Gaston [54], with strains defined as either unique STs or unique PFGE patterns.
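The assignment rule described above, in which each distinct combination of alleles at the four loci receives its own ST number, can be sketched as follows. The function name, the "ST" prefix, the starting number and the isolate profiles are hypothetical and are used only to show the logic; the actual HST and TST numbering continues from earlier studies.

```python
def assign_sequence_types(isolate_profiles, prefix="ST", first_number=1):
    """Map each isolate to a sequence type label.

    isolate_profiles: dict of isolate id -> allele numbers ordered as
    (CRISPR1, CRISPR2, fimH, sseL). Identical profiles share one ST;
    a difference at any single locus yields a new ST.
    """
    st_of_profile = {}
    assignments = {}
    next_number = first_number
    for isolate, profile in isolate_profiles.items():
        key = tuple(profile)
        if key not in st_of_profile:
            st_of_profile[key] = f"{prefix}{next_number}"
            next_number += 1
        assignments[isolate] = st_of_profile[key]
    return assignments


# Hypothetical isolates and allele numbers.
isolates = {
    "iso_1": (10, 20, 7, 19),
    "iso_2": (10, 20, 7, 19),  # same profile as iso_1
    "iso_3": (11, 20, 7, 19),  # new CRISPR1 allele, so a new ST
    "iso_4": (10, 20, 7, 21),  # new sseL allele, so a new ST
}
sequence_types = assign_sequence_types(isolates)
```

The resulting labels can be passed directly to a diversity calculation, since each label stands for one strain type.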
Relationships between TSTs were calculated using BURST (www.pubmlst.org/analysis/), with a group definition of n-1. Unique PFGE patterns, or pulsotypes, were defined by PulseNet, using the Dice coefficient with an optimization of 1.5% and a position tolerance of 1.5%. The difference of one band is sufficient to call two PFGE patterns different. PFGE dendrograms were generated using BioNumerics v. 6.6.
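For readers unfamiliar with the Dice coefficient used for PFGE comparison, a minimal sketch is given below. It assumes that bands have already been matched to shared position classes, so the optimization and position tolerance settings applied in BioNumerics are not modelled here. The function names and band sizes (in kb) are hypothetical.

```python
def dice_coefficient(bands_a, bands_b):
    """Dice similarity: 2 * shared bands / total bands in both lanes."""
    a, b = set(bands_a), set(bands_b)
    if not a and not b:
        raise ValueError("both patterns are empty")
    return 2 * len(a & b) / (len(a) + len(b))


def same_pulsotype(bands_a, bands_b):
    """A single band difference is enough to call patterns different."""
    return set(bands_a) == set(bands_b)


# Hypothetical band positions (kb) for two XbaI patterns.
pattern_x = [668, 452, 336, 244, 138, 76]
pattern_y = [668, 452, 336, 216, 138, 76]  # one band shifted

similarity = dice_coefficient(pattern_x, pattern_y)
identical = same_pulsotype(pattern_x, pattern_y)
```

The two toy patterns share five of six bands, so they are highly similar by Dice yet still count as different pulsotypes under the one-band rule stated above.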

S. Typhimurium outbreak study
A summary of 30 S. Typhimurium outbreak isolates that were obtained from the Pennsylvania Department of Health is listed in Table 5. Ten of these isolates were associated with an outbreak in 2004 (cluster 0411PAJPX-1c) where affected patients had been on a bus trip together, though no vector was ever identified. Another 10 isolates were linked to an outbreak in 2009 (cluster 0905PAJPX-1), which was associated with live poultry. The remaining 10 isolates represent sporadic case isolates, also from 2009, that were not associated with the 0905PAJPX-1 outbreak and thus served as controls. The isolates were cultured as described above.
Consent and institutional review board (IRB) approval