Alignment of 106 fliC sequences generated in this study and 32 previously published phase 1 flagellin sequences (see Methods section), representing 35 phase 1 flagellar serotypes, revealed a clear division of the sequences into two groups. Representative sequences are aligned in Additional file 1. A tree of these sequences, constructed from the translated DNA sequences, supported this division with a bootstrap value of 100% (Figure 1). Sequences encoding phase 1 flagellar antigens with antigenic factors "g" or "m,t" are referred to as members of the g-complex, and the fliC sequences of this group clustered exclusively with those of the non-motile strains Gallinarum and Pullorum on the tree (Cluster I, Figure 1). Amino acid sequence homology within Cluster I was 90.05%. Sequences not encoding the antigenic factors "g" or "m,t" formed the second group (Cluster II), referred to here as the non-g complex. A lower level of amino acid similarity (80.3%) was observed within Cluster II. Sero-specific polymorphisms were identified within the central variable region, where the consensus sequences of Cluster I and Cluster II diverged, between amino acid positions 160 and 407 (numbered according to the sequenced S. Typhimurium strain (AE008787), represented here as sequence type Typhimurium_a).
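The within-cluster percentages above summarise an entire alignment in a single number. This section does not state exactly how they were computed, so the following is only a minimal Python sketch of one common definition. It assumes that percent identity is calculated pairwise over columns where neither sequence has a gap, then averaged over all pairs in a cluster. The function names and toy sequences are hypothetical.

```python
from itertools import combinations
from typing import Dict


def pairwise_identity(a: str, b: str) -> float:
    """Percent identity of two aligned sequences, ignoring columns where either has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    matches = sum(1 for x, y in compared if x == y)
    return 100.0 * matches / len(compared)


def mean_within_cluster_identity(cluster: Dict[str, str]) -> float:
    """Mean of all pairwise identities among the aligned sequences of one cluster."""
    if len(cluster) < 2:
        raise ValueError("need at least two sequences")
    scores = [pairwise_identity(a, b) for a, b in combinations(cluster.values(), 2)]
    return sum(scores) / len(scores)


# Toy aligned protein fragments (hypothetical, not real FliC data)
toy = {"s1": "MAQVINT", "s2": "MAQVIST", "s3": "MAQ-INT"}
print(round(pairwise_identity(toy["s1"], toy["s2"]), 2))  # 6 of 7 columns match: 85.71
print(round(mean_within_cluster_identity(toy), 2))
```

Excluding gap columns gives higher values than scoring gaps as mismatches, so a reported figure such as 90.05% is only comparable with other values calculated under the same convention.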
Salmonella fliC sequences were conserved at their termini and variable in the central region between serotypes [16, 18], and clustered according to allele. Amino acid and nucleotide positions described herein are numbered relative to the sequenced strain LT2. The alignment of the sequences generated in this study showed that two assays were required: one for Cluster I strains and one for Cluster II strains. Multiple alignments were created for each cluster, and regions of fliC containing sero-specific polymorphisms were identified at nucleotide positions 917–933 in Cluster I and 739–749 in Cluster II (Figures 3 and 4). PCR primers were designed to amplify the target region in each sequence (see below), and one multiplex PCR containing a mixture of specific primers was developed for each group. All primers designed for the short sequence assays in this study are shown in Additional file 3, and the testing algorithm is shown in Figure 5.
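All positions in this section are given relative to LT2. In a gapped multiple alignment, alignment columns and reference coordinates therefore diverge wherever the reference carries a gap. The following minimal sketch shows one way to convert between the two. It assumes that the LT2 row of the alignment is available as a string and that reference coordinates are 1-based. The function names are hypothetical.

```python
from typing import List, Optional


def column_to_reference(ref_row: str) -> List[Optional[int]]:
    """For each alignment column, the 1-based position in the ungapped reference,
    or None where the reference has a gap (an insertion relative to it)."""
    positions: List[Optional[int]] = []
    pos = 0
    for ch in ref_row:
        if ch == "-":
            positions.append(None)
        else:
            pos += 1
            positions.append(pos)
    return positions


def residue_at(query_row: str, ref_row: str, ref_pos: int) -> str:
    """Character in the query row at a given reference coordinate ('-' if deleted)."""
    mapping = column_to_reference(ref_row)
    col = mapping.index(ref_pos)  # ValueError if ref_pos lies outside the reference
    return query_row[col]


# Hypothetical aligned rows: the reference has a gap where the query has an insertion
ref = "MA-QVI"
qry = "MAKQ-I"
print(column_to_reference(ref))  # [1, 2, None, 3, 4, 5]
print(residue_at(qry, ref, 3))   # Q
```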
Summary of fliC sequence variation within the g-complex
All polymorphisms within the analysed g-complex sequences are displayed in Figure 2. The target region (highlighted) was selected because it encoded multiple sero-specific amino acid substitutions and was variable at the DNA level. Within the 17 bp nucleotide sequence assayed, 15 sequence types were identified (Figure 3). This region was assayed against the test panel of 17 Salmonella strains belonging to the g-complex and yielded sequence motifs specific to phase 1 flagellar serotypes. The serotypes not differentiated by this assay ([f],g,m,[p]; g,m; g,m,s; g,m,[p],s; and the non-motile S. Gallinarum) were known from full sequencing to be identical at the target region.
Amino acid differences between g-complex strains identified by full sequencing
The following polymorphisms in fliC of the g-complex are likely to be involved in specific epitope formation. Two amino acid sequence types were observed among 25 fliC-[f],g,m,[p] sequences obtained from Salmonella enterica serovar Enteritidis strains. Twenty-three S. Enteritidis strains had identical DNA sequences (B16, B18, JTCM02 and 20 phage type 4 strains; sequence type Enteritidis_b). The sequence of B17 was identical to the published S. Enteritidis sequence (M84980) (Enteritidis_a) and differed from sequence type Enteritidis_b by a single amino acid substitution (Ser>Gly at position 302). The published S. Othmarschen fliC-g,m,[t] sequence (U06455) encoded the same amino acid sequence as Enteritidis_a but differed from it by a silent mutation at the DNA level. As the inferred FliC sequences of these two serotypes were identical, it was apparent that the published sequence represented an S. Othmarschen strain in which the t factor was absent.
Published S. Gallinarum sequences were identical to Enteritidis_b at the DNA level, except for a SNP that introduces a stop codon in M84975. S. Pullorum and S. Gallinarum are non-motile because they do not express flagella. Antisera to the g factor antigen react strongly with S. Pullorum cultures in which motility has been induced, indicating that g epitopes are expressed in these cells [19]. This is consistent with our sequence data, in which S. Pullorum clusters with the g,m sequences (Figure 1). Biotype-specific polymorphisms for S. Pullorum were observed at amino acid positions 91 and 323. Molecular identification of S. Pullorum and S. Gallinarum would be of considerable benefit, as standard serotyping cannot differentiate these two biotypes.
FliC-g,q of S. Moscow was differentiated from all other g-complex sequences by a serotype-specific Asp>Gly substitution at position 284. A Thr>Ala substitution at residue 304, encoded by a single nucleotide polymorphism (SNP), distinguished the g,m and g,p sequences, consistent with a previous report [20], and forms the basis for differentiating these two serotypes. DNA polymorphisms, but no inferred amino acid substitutions, were observed between strains exhibiting g,m,s and g,m,[p],s. The p factor was not encoded by the fliC sequences of these strains.
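The distinctions drawn in these paragraphs between silent DNA differences, amino acid substitutions and a SNP that creates a stop codon all follow from translating the affected codon. The following minimal sketch classifies a single substitution in an in-frame coding sequence using the standard genetic code, which for sense codons is the same as the bacterial translation table. It numbers nucleotides and codons from 1, starting at the initiation codon. Because that convention may differ from the LT2-based residue numbering used here, the printed positions are illustrative only. The function name and toy sequence are hypothetical.

```python
from typing import Tuple

# Standard genetic code, indexed by first, second and third base in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snp(ref_cds: str, nt_pos: int, alt_base: str) -> Tuple[str, int, str]:
    """Classify a substitution at 1-based nt_pos of an in-frame coding sequence.

    Returns (kind, codon number, change), where kind is 'synonymous',
    'missense' or 'nonsense' (the new codon is a stop codon).
    """
    ref_cds = ref_cds.upper()
    codon_index = (nt_pos - 1) // 3          # 0-based codon containing the SNP
    offset = (nt_pos - 1) % 3                # position of the SNP within that codon
    ref_codon = ref_cds[codon_index * 3:codon_index * 3 + 3]
    if len(ref_codon) != 3:
        raise ValueError("position lies beyond the last complete codon")
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "nonsense"
    else:
        kind = "missense"
    return kind, codon_index + 1, f"{ref_aa}{codon_index + 1}{alt_aa}"


cds = "ATGACTGAAGCT"  # toy sequence encoding M-T-E-A
print(classify_snp(cds, 4, "G"))  # ('missense', 2, 'T2A'): ACT -> GCT
print(classify_snp(cds, 6, "C"))  # ('synonymous', 2, 'T2T'): ACT -> ACC
print(classify_snp(cds, 7, "T"))  # ('nonsense', 3, 'E3*'): GAA -> TAA
```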
S. Essen fliC-g,m was distinguished from other sequences encoding g and m by an Asp>Asn substitution at position 283. FliC-g,p,s could be differentiated from fliC-g,p by a Thr>Ala substitution at position 254. A motif of two amino acids at positions 302 and 307 was common to S. Derby, S. Agona, S. Adelaide and S. Berta, which exhibit the phase 1 flagellar antigenic factors "f" and "g", and was exclusive to these serotypes. DNA sequence variation at the corresponding positions allowed S. Derby and S. Agona to be distinguished from S. Adelaide and S. Berta. FliC-g,z51 forms a distinct cluster, as do fliC-m,t and fliC-g,m,t together (Figure 1).
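A residue shared by a group of serotypes and absent from all others, such as the motif at positions 302 and 307 described above, can be screened for directly in an alignment. The following minimal sketch reports every alignment column where all members of a chosen group carry the same residue and no sequence outside the group carries it. It is an illustrative screen only, not the procedure used to identify the motif reported here. The group labels and sequences are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def diagnostic_positions(aligned: Dict[str, str], group: Set[str]) -> List[Tuple[int, str]]:
    """1-based alignment columns where every group member has the same residue
    and that residue is absent from every sequence outside the group."""
    length = len(next(iter(aligned.values())))
    hits = []
    for col in range(length):
        inside = {aligned[name][col] for name in group}
        outside = {seq[col] for name, seq in aligned.items() if name not in group}
        if len(inside) == 1:
            residue = next(iter(inside))
            if residue != "-" and residue not in outside:
                hits.append((col + 1, residue))
    return hits


# Hypothetical aligned fragments; A and B form the group of interest
aln = {"A": "AKTSN", "B": "AKTSN", "C": "ADTGN", "D": "AETGN"}
print(diagnostic_positions(aln, {"A", "B"}))  # [(2, 'K'), (4, 'S')]
```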
Summary of fliC sequence variation within the non-g complex
Alleles that did not encode the g or m,t antigenic factors were conserved internally (97.8–99.1% homology within each allele), whereas homology across the complex as a whole was 80.35%. The high variability between alleles in this group did not allow specific amino acids to be associated with epitope formation, as had been possible for the g-complex sequences. Because of the number and distribution of polymorphic bases in this group (specified below), several regions could have been used for differentiation. After four candidate regions had been tested, the region encompassing amino acids 248–250 was selected for the final non-g assay. Each serotype had a unique motif at the target region except fliC-l,v and fliC-l,z13, which shared a sequence type (Figure 4).
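When several regions are candidates, a simple in silico pre-screen can rank fixed-width windows of the alignment by how many serotypes they resolve. The following minimal sketch scores each window by the number of sequences whose motif in that window is shared with no other sequence. This is an assumed criterion for illustration. The text reports that four regions were tested experimentally, and practical selection also depends on primer design and assay performance, which this sketch ignores. Names and sequences are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def score_windows(aligned: Dict[str, str], width: int) -> List[Tuple[int, int]]:
    """Rank windows of an alignment by the number of sequences with a unique motif.

    Returns (1-based start, number resolved) pairs, best first; ties are
    broken by the earlier start. Sequences are assumed to be equally long.
    """
    length = len(next(iter(aligned.values())))
    scores = []
    for start in range(length - width + 1):
        motifs = {name: seq[start:start + width] for name, seq in aligned.items()}
        counts = Counter(motifs.values())
        resolved = sum(1 for motif in motifs.values() if counts[motif] == 1)
        scores.append((start + 1, resolved))
    scores.sort(key=lambda s: (-s[1], s[0]))
    return scores


# Hypothetical aligned nucleotide fragments for three serotypes
aln = {"s1": "ACGTACGT", "s2": "ACGAACGT", "s3": "ACCTACGA"}
print(score_windows(aln, 3)[:2])  # [(2, 3), (3, 3)]: both windows resolve all three
```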
Amino acid sequences varied within some non-g alleles, including i, r, d, e,h, a and z4,z23 (Additional file 1). A previous study of fliC-i sequences reported no variation in a 260 bp region among seven S. Typhimurium strains [17]. Six complete S. Typhimurium fliC sequences were determined, together with a fragment spanning nucleotides 434–1090 (amino acids 159–400) from a further 20 S. Typhimurium strains. Within the serotype, three distinct DNA sequences encoding different amino acid sequences were observed. Sequence type "Typhimurium_a", identical to that of the sequenced strain LT2, was detected in 18 strains. Sequence type "Typhimurium_b" was detected in four strains and was distinguished by a SNP at nucleotide 768, conferring a Glu>Lys substitution at residue 256. Sequence type "Typhimurium_c" carried a Glu>Lys substitution and an Ala>Thr substitution at 263 and was found in two strains, 571896 and 571913. Both strains were phage type DT104; however, the other strains tested did not conform to recognised phage typing patterns, so no firm correlation could be made with phage type or any other phenotype. The S. Choleraesuis sequence (fliC-c) differed from the published sequence (AF159459) at one nucleotide, conferring a Thr>Ser substitution at codon 99.
FliC sequences of nine S. Heidelberg strains were identical, consistent with a previous report [18]. The published fliC-r sequence of S. Rubislaw (X04505) differed from that of S. Heidelberg at three amino acid positions. The S. Muenchen sequence determined in this study differed at twelve amino acid positions from the published S. Muenchen sequence (X03395) and at 25 positions from the S. Duisburg sequence determined in this study. S. Anatum, S. Newport and S. Saintpaul exhibit factors e,h in their phase 1 flagellar antigen. The amino acid sequence was conserved between two strains of S. Saintpaul but was distinct for each serotype owing to four amino acid substitutions, at codons 192, 213, 238 and 356. S. Brandenburg and S. Panama both exhibit l,v in the phase 1 antigen, and no inferred amino acid differences were detected between them. FliC-l,v sequences clustered with fliC-l,z13 (Figure 1).
FliC sequences were determined for three strains exhibiting the z4 antigenic factor in the phase 1 flagellar antigen. Cluster analysis grouped these sequences together in the non-g group, although they contain regions similar in sequence to g-complex strains (amino acid positions 96–164). FliC-z4,z24 is distinct from fliC-z4,z23, and z4,z23 sequences varied within the serotype at seven amino acid positions: 235, 237, 239, 242, 253, 351 and 369. The complex mosaic nature of fliC is evident from the amino acid alignment, in particular for strains from subgroups in the SARC collection (see Materials and Methods).
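A mosaic sequence of the kind described above, with some regions resembling the g-complex and others the non-g group, can be visualised by sliding a window along the alignment and asking which reference the query most resembles in each window. This is the idea behind similarity plots. The following minimal sketch is not the analysis performed here, and it is not a formal recombination test. The window width, step and reference labels are hypothetical parameters.

```python
from typing import Dict, List, Tuple


def window_identity(a: str, b: str) -> float:
    """Fraction of identical residues between two aligned windows, ignoring gap columns."""
    cols = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x == y for x, y in cols) / len(cols) if cols else 0.0


def closest_reference_by_window(query: str, refs: Dict[str, str],
                                width: int, step: int) -> List[Tuple[int, str]]:
    """For each window (1-based start), name the reference most identical to the query.
    Ties go to the reference listed first in refs."""
    result = []
    for start in range(0, len(query) - width + 1, step):
        q = query[start:start + width]
        best = max(refs, key=lambda name: window_identity(q, refs[name][start:start + width]))
        result.append((start + 1, best))
    return result


# Hypothetical aligned fragments: the query's first half resembles one reference,
# its second half the other
query = "AAAACCCC"
refs = {"g_like": "AAAATTTT", "non_g": "GGGGCCCC"}
print(closest_reference_by_window(query, refs, width=4, step=4))
# [(1, 'g_like'), (5, 'non_g')]
```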
Molecular serotyping assays
Sero-specific motifs were identified by comparing the amino acid sequences of the flagellar antigens of the different serotypes. Individual regions of fliC were selected for the g-complex and non-g groups to provide unique sequence for as many serotypes as possible, while keeping the assays simple to perform and analyse. Two multiplex PCRs were developed, one to amplify fliC from g-complex strains and one to amplify fliC from non-g strains. Sero-specific motifs in each amplicon were then identified by sequencing-by-synthesis.
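In pyrosequencing, a widely used sequencing-by-synthesis chemistry, nucleotides are dispensed one at a time. Each dispensation gives a signal proportional to the number of bases incorporated, so homopolymer runs appear as proportionally taller peaks. The text does not specify the platform or dispensation order, so the following minimal sketch assumes pyrosequencing and an arbitrary, hypothetical dispensation order. It predicts idealised peak heights for a short read and ignores signal noise, incomplete extension and carry-over.

```python
from typing import List, Tuple


def expected_pyrogram(read: str, dispensation: str) -> List[Tuple[str, int]]:
    """Idealised peak heights: for each dispensed nucleotide, the number of
    consecutive matching bases incorporated at the current read position."""
    read = read.upper()
    i = 0
    peaks = []
    for nucleotide in dispensation.upper():
        n = 0
        while i < len(read) and read[i] == nucleotide:
            n += 1
            i += 1
        peaks.append((nucleotide, n))
    if i < len(read):
        raise ValueError(f"dispensation order ends before base {i + 1} of the read")
    return peaks


# Hypothetical read and cyclic dispensation order
print(expected_pyrogram("GGAT", "ACGTACGT"))
# [('A', 0), ('C', 0), ('G', 2), ('T', 0), ('A', 1), ('C', 0), ('G', 0), ('T', 1)]
```

Predicting the pattern for each candidate motif under one dispensation order makes it easy to check, before running an assay, that two motifs would give distinguishable peak patterns.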
G-complex assay
Fifteen sequence types were identified in the 17 bp of nucleotide sequence assayed (Figure 3). Twenty-seven strains were tested, and each produced a recognised sequence motif that differentiated between serotypes. Full resolution of serotypes would require detection of further polymorphisms; for example, g,[s],t and g,t can be separated by additionally detecting an A>G change at nucleotide position 777, which confers a Ser>Gly substitution specific to g,t.
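The two-step logic above can be sketched as a lookup. The motif read from the target region gives one or more candidate serotypes, and a motif shared by several serotypes is narrowed only when a secondary SNP call, such as the base at position 777, is available. This is a minimal sketch, not the testing algorithm of Figure 5. The motif labels are hypothetical placeholders for the real sequences in Figure 3, and assigning A at 777 to g,[s],t is an assumption inferred from the A>G change being specific to g,t.

```python
from typing import Dict, List, Optional, Tuple

# (nucleotide position, {base observed: serotype}) used to split a shared motif
Resolver = Tuple[int, Dict[str, str]]


def call_serotype(motif: str,
                  motif_table: Dict[str, List[str]],
                  resolvers: Dict[str, Resolver],
                  secondary_calls: Optional[Dict[int, str]] = None) -> List[str]:
    """Return the candidate serotype(s) for a target-region motif.

    An unrecognised motif gives an empty list. A motif shared by several
    serotypes is narrowed to one only if a resolver exists for it and the
    base at the resolver's position has been called.
    """
    candidates = motif_table.get(motif, [])
    if len(candidates) > 1 and motif in resolvers and secondary_calls:
        position, base_to_serotype = resolvers[motif]
        base = secondary_calls.get(position)
        if base in base_to_serotype:
            return [base_to_serotype[base]]
    return list(candidates)


# Hypothetical tables: motif labels stand in for real target-region sequences
MOTIFS = {"motif_1": ["g,q"], "motif_2": ["g,[s],t", "g,t"]}
RESOLVERS = {"motif_2": (777, {"A": "g,[s],t", "G": "g,t"})}

print(call_serotype("motif_2", MOTIFS, RESOLVERS))              # ['g,[s],t', 'g,t']
print(call_serotype("motif_2", MOTIFS, RESOLVERS, {777: "G"}))  # ['g,t']
```

The same structure would accommodate the l,v / l,z13 pair of the non-g assay once the serotype carrying each base at position 783 is known.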
Non-g assay
Fourteen sequence types were identified in the 9 bp of nucleotide sequence assayed (Figure 4). Thirty strains were tested, and each produced a recognised sequence motif allowing separation of serotypes. Serotypes l,v and l,z13 gave the same motif at the target region but could be separated by an A>G substitution at nucleotide position 783, conferring a Thr>Ala change.
The stability of the targeted polymorphisms in Salmonella phase 1 flagellar antigens was demonstrated by testing a panel of 55 isolates. The SNP responsible for the antigenic difference between serotypes g,m and g,p lies within the target region, so these serotypes could be differentiated by the assay. The amino acid substitution that distinguishes fliC-g,p,u was also encoded within the sequence assayed. Antigens i; r; c; d; b; e,h; k; a; z41; z; z10; z4,z23; z4,z24; g,q; g,m,p; g,p,u; [f],g,t; g,z51; and biotype S. Pullorum each gave unique motifs, whereas l,v and l,z13 shared a motif. Some serotypes for which certain factors may be present or absent (denoted by square brackets in antigenic formulae) were not separated from similar serotypes ([f],g,m,[p] / g,m / g,m,[p],s; [f],g,m,[p] / g,m,[t]; g,[s],t / g,t), although these could be separated by other DNA polymorphisms, as discussed. Two motifs were observed for k, one specific to S. Thompson and the other to IIIb. Two motifs were observed for d, one specific to S. Duisburg and the other to S. Muenchen / S. Schwarzengrund. Published fliC-m,t sequence data from S. Banana, S. Oranienburg and S. Pensacola were included in the assay design. The polymorphic region targeted by the assay is predicted to differentiate m,t sequences from other g-complex antigens, and also to differentiate S. Pensacola from S. Banana and S. Oranienburg. Strains exhibiting factors m,t were not available for testing.