Shigella flexneri serotype 1c derived from serotype 1a by acquisition of gtrIC gene cluster via a bacteriophage

Shigella spp. are the primary causative agents of bacillary dysentery. Since its emergence in the late 1980s, S. flexneri serotype 1c has remained poorly understood, particularly with regard to its origin and genetic evolution. This article provides molecular insight into this novel serotype and into the gtrIC gene cluster that determines its unique immune recognition. PCR amplification of the gtrIC cluster showed that serotype 1c isolates from different geographical origins were genetically conserved. Analysis of the sequences flanking the gtrIC cluster revealed remnants of a prophage genome, in particular integrase and tRNAPro genes. Southern blot analyses of serotype 1c, 1a and 1b strains indicated that all the tested serotype 1c strains may have had a common origin that has since remained distinct from the closely related serotypes 1a and 1b. The identification of prophage genes upstream of the gtrIC cluster is consistent with bacteriophage-mediated integration of the gtrIC cluster into a pre-existing serotype. This is the first study to show that serotype 1c isolates from different geographical origins share an identical pattern of genetic arrangement, suggesting that serotype 1c strains may have originated from a single parental strain. Analysis of the sequence around the gtrIC cluster also revealed a new integration site for the serotype-converting phages of S. flexneri. Understanding the origin of new pathogenic serotypes and the molecular basis of serotype conversion in S. flexneri should inform the development of cross-reactive Shigella vaccines.


Background
The lipopolysaccharide (LPS) of shigellae exhibits a high degree of antigenic diversity, which arises primarily from differences in the structure and composition of the O-antigen. S. flexneri serotypes (with the exception of serotype 6) share the same basic O-antigen backbone: a repeating tetrasaccharide unit made up of one N-acetylglucosamine (GlcNAc) residue and three rhamnose residues (RhaI, RhaII and RhaIII). There are currently at least 15 established S. flexneri serotypes, including the newly designated 1c and 7b subtypes [1], all of which can cause shigellosis. A few further putative serotypes have yet to be considered for official classification [2,3]. Each serotype carries a specific LPS O-antigen that is responsible for its particular serotype characteristics.
Serotype 1c, also known as subtype 7a of S. flexneri, emerged in the 1990s. The presentation of O-antigens in serotype 1c is unique: it is the first example in which an α-D-Glcp-(1→2)-α-D-Glcp (kojibiosyl) group is added to the basic O-antigen repeating unit [4]. Serotype 1c carries this disaccharide linked to the N-acetylglucosamine of the basic tetrasaccharide repeating unit, whereas serotype 1a and 1b strains carry only a single glucosyl group at the same site (Fig. 1).
The genetic mechanism responsible for O-antigen modification in serotype 1c was first elucidated by Stagg et al. [5]. The addition of the first glucosyl group is mediated by the previously characterised gtrI cluster, found within a cryptic prophage at the proA locus of the bacterial chromosome. Transposon mutagenesis, performed to disrupt the gene responsible for adding the second glucosyl group, identified the gene encoding the serotype 1c-specific O-antigen modification, which was designated gtrIC. The gtrIC gene is part of a three-gene cluster arranged in a similar way to the gtr clusters present in other S. flexneri serotypes.
Adhikari et al. [6] earlier concluded that gtrI was integrated into S. flexneri by a bacteriophage via the thrW tRNA site at the proA locus. Our preliminary analysis of the sequence adjacent to the gtrIC cluster suggested the possibility of a different integration site for the serotype 1c prophage [5]. We hypothesized that serotype 1c strains arose when a second bacteriophage, carrying the gtrIC gene cluster, inserted into a separate location on the chromosome of an ancestral serotype 1a strain. In this study, we show that serotype 1c strains are genetically related through conserved gtrIC sequences, and that serotype 1c isolates share an identical pattern of genetic arrangement despite their different geographical origins. In addition, we report the identification of a new integration site for the serotype-converting phages of S. flexneri in a serotype 1c strain. The experiments and sequence analyses performed in this study provide further insights into the origin of this serotype.

Serotyping
The serological features of the S. flexneri strains were determined by slide agglutination. A sterile loop was used to mix bacteria from LB agar plates with a drop of antibody on a glass slide. The slide was gently agitated while observing for agglutination. Negative controls were performed using 0.9 % NaCl instead of antibody. Isolates were tested using both commercially available monovalent antisera (Denka Seiken, Tokyo, Japan) and the monoclonal antibody reagent MASF Ic (Reagensia AB, Sweden) directed against type-specific somatic and group O factor antigens of S. flexneri.

DNA techniques
Genomic DNA was isolated from overnight cultures using the Illustra™ bacteria Genomic Prep Mini Spin Kit (GE Healthcare) in accordance with the manufacturer's instructions. Oligonucleotide primers used for PCR were synthesized by Sigma-Aldrich (Australia) and are listed in Table 2. PCR was performed using PfuUltra II Fusion HS DNA Polymerase (Stratagene) in accordance with the manufacturer's instructions. PCR products were purified using the Wizard SV Gel and PCR Clean-Up System (Promega, Madison, Wisconsin, USA). DNA sequencing was performed using the BigDye Version 3.1 sequencing protocol, and sequences were analysed on an ABI 3730 capillary sequence analyser at the Biomolecular Resources Facility, John Curtin School of Medical Research, Australian National University. Restriction enzymes for DNA digestion were supplied by Fermentas.
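As a rough illustration of how an expected amplicon size (such as the 1769 nt gtrIC product reported in the Results) can be predicted from a template sequence before running a gel, the Python sketch below locates an exact forward-primer match and, downstream of it, the reverse complement of the reverse primer. The primers of Table 2 are not reproduced here: the template and primer sequences in the usage example are hypothetical. The sketch also assumes perfect, full-length primer matches; primers carrying 5′ restriction-site tails (as the SacI/BamHI primer names suggest) would need to be matched on their 3′ portion only.

```python
from typing import Optional

# Map each base to its Watson-Crick complement (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def predict_amplicon_length(template: str, fwd: str, rev: str) -> Optional[int]:
    """Predict PCR product length for exact-match primers on a linear template.

    The forward primer must match the top strand. The reverse primer anneals to
    the top strand, so its reverse complement is searched for downstream of the
    forward primer. Returns None if either binding site is missing.
    """
    template, fwd, rev = template.upper(), fwd.upper(), rev.upper()
    start = template.find(fwd)
    if start == -1:
        return None
    rev_site = reverse_complement(rev)
    end = template.find(rev_site, start + len(fwd))
    if end == -1:
        return None
    return end + len(rev_site) - start


if __name__ == "__main__":
    # Hypothetical primers and template, for illustration only.
    fwd = "ATGCGTACCTGA"
    rev = "GGCATTCAGTCA"
    template = "TTTT" + fwd + "G" * 50 + reverse_complement(rev) + "AAAA"
    print(predict_amplicon_length(template, fwd, rev))  # 74
```

Real in silico PCR tools also tolerate mismatches and check both strands; exact matching is used here only to keep the idea visible.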

Bioinformatics analysis
DNA sequences were analysed for the presence of ORFs and tRNA genes using the programmes myRAST (RASTserver.pm), CLC Main Workbench 6.7 (CLC bio) and NCBI ORF Finder, followed by manual inspection of the start codon and ribosome-binding sequence of each ORF. Gene identities were predicted from homology to known genes found by BLASTn and BLASTp searches, and from the presence of Shine-Dalgarno ribosome-binding sites. The corresponding proteins were compared with the non-redundant protein database using the BLASTp and BLASTx programmes available from the National Center for Biotechnology Information (http://www.ncbi.nlm.nih.gov). Protein-level alignments were performed using CLUSTAL W [7] and the BioEdit Sequence Alignment Editor [8].
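For readers unfamiliar with ORF calling, the core idea behind tools such as NCBI ORF Finder, combined with the manual check for a ribosome-binding site described above, can be sketched in a few lines of Python. This is a simplified illustration, not the algorithm implemented by myRAST, CLC Main Workbench or ORF Finder: it scans only the given strand, accepts only ATG starts, and treats the presence of the core Shine-Dalgarno motif GGAG between 4 and 20 nt upstream of the start codon as evidence of a ribosome-binding site. The minimum ORF length and the upstream window are assumed, illustrative parameters.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
SD_CORE = "GGAG"  # assumed core of the Shine-Dalgarno consensus AGGAGG


def has_rbs(seq: str, start: int, window: int = 20, min_gap: int = 4) -> bool:
    """Heuristic: look for the SD core between `window` and `min_gap` nt upstream of `start`."""
    upstream = seq[max(0, start - window):max(0, start - min_gap)]
    return SD_CORE in upstream


def find_orfs(seq: str, min_codons: int = 30) -> List[Tuple[int, int, bool]]:
    """Return (start, end, has_rbs) for ATG-initiated ORFs on the given strand.

    `end` is exclusive and includes the stop codon; `min_codons` counts codons
    from the ATG up to, but not including, the stop codon. Only the most
    upstream ATG before each stop is reported (internal ATGs are skipped).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            j = i
            # Walk codon by codon until an in-frame stop codon is found.
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # no in-frame stop before the end of the sequence
            if (j - i) // 3 >= min_codons:
                orfs.append((i, j + 3, has_rbs(seq, i)))
            i = j + 3  # resume scanning after the stop codon
    return sorted(orfs)


if __name__ == "__main__":
    # Hypothetical sequence: SD motif, spacer, then a 31-codon ORF ending in TAA.
    demo = "TTTT" + "AGGAGG" + "T" * 8 + "ATG" + "GCT" * 30 + "TAA" + "CCCC"
    print(find_orfs(demo))  # [(18, 114, True)]
```

A production annotation pipeline would also scan the reverse strand, consider alternative start codons such as GTG and TTG, and weigh homology evidence, which is why manual inspection of start codons remained necessary.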

Southern blotting
Genomic DNA (1000 ng) was digested overnight with an appropriate restriction enzyme in a total volume of 100 μl. Following agarose gel electrophoresis of the digested genomic DNA

Results and discussion
Serotype 1c strains have a conserved gtrIC sequence
Until now, very little has been known about the extent of gtrIC conservation among S. flexneri 1c strains from different regions of the world. To study gtrIC homology and the prevalence of putative gtrIC variants among 1c isolates from patients of different ethnic and geographic origins, PCR was used to detect the gtrIC gene, concurrently with conventional agglutination tests. All strains with positive serotype 1c agglutination results also produced a PCR amplicon of 1769 nt, corresponding to the gtrIC gene. As shown in Fig. 2, a PCR product of the same size was also obtained from a rough serotype 1c strain, which did not express the serotype 1c-specific O-antigen and therefore could not be typed with antisera. Sequencing of PCR amplicons spanning the whole gtrIC cluster, amplified with the primer pair DG_GtrA(Ic)F(SacI) and GtrIc-R2(BamHI), showed that all the representative serotype 1c strains shared 100 % nucleotide identity not only in gtrIC itself but across the whole gtrIC cluster (gtrAIc, gtrBIc and gtrIC). The only exception was SFL1501, whose gtrIC gene carried a 6-bp deletion (GAAATG). Interestingly, the deleted sequence corresponds to one of four GAAATG repeats at the 3′ end of the gtrIC gene (Fig. 3). Because 6 bp is a multiple of three, this deletion leaves the downstream reading frame intact and removes only one of the four tryptophan-lysine repeats at the C-terminus of GtrIc. This loss may not affect the overall function of GtrIc; the remaining repeated sequence may compensate for it.
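The frame-preservation argument can be made concrete with a toy translation. The Python sketch below uses a short hypothetical coding sequence (not the actual gtrIC sequence) in which four GAAATG repeats are read in the frame that yields alternating lysine (AAA) and tryptophan (TGG) codons before a TGA stop codon. Removing one 6-bp repeat shortens the protein by exactly two residues and leaves the rest of the reading frame unchanged.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate a coding sequence from its first base, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def delete_first(seq: str, unit: str) -> str:
    """Remove the first occurrence of `unit`; in a perfect tandem array any copy gives the same result."""
    idx = seq.find(unit)
    return seq if idx == -1 else seq[:idx] + seq[idx + len(unit):]


# Hypothetical 3' coding region: five leading nucleotides put the repeats in the AAA-TGG frame.
REPEAT = "GAAATG"
reference = "ATGAA" + REPEAT * 4 + "A"
variant = delete_first(reference, REPEAT)

print(translate(reference))  # MKKWKWKWK
print(translate(variant))    # MKKWKWK
print(len(reference) - len(variant), (len(reference) - len(variant)) % 3)  # 6 0
```

By contrast, a deletion whose length is not a multiple of three would shift every downstream codon and typically alter or truncate the C-terminus.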
Because the nucleotide sequence is conserved and not even silent mutations were detected in gtrIC or the rest of its cluster, we speculate that the Type 1c modification plays a vital role in S. flexneri and may, to some extent, assist the bacteria in invading the epithelial cells of the host.
Serotype 1c isolates share an identical pattern of genetic arrangement despite differing geographical origins
Southern blotting with the gtrIC probe was used to reveal the organization of the regions upstream and downstream of the gtrIC gene cluster, and the distribution of the cluster, in different strains of S. flexneri serotype 1c. If the upstream and downstream organization of the gtrIC gene cluster is the same in all strains, two fragments would be expected with Eco32I digestion and one fragment with BamHI digestion. If, on the other hand, the organization of the upstream or downstream regions differs, fragments of variable sizes should be produced. These data should not only shed light on the organization of the regions upstream and downstream of the gtrIC gene cluster in different strains, but also allow determination of the number of copies of the gtrIC locus present in the genomes of various 1c isolates.
Fig. 2 Detection of serotype 1c strains among a variety of S. flexneri strains by PCR amplification with the gtrIC-specific primer pair. The amplified gtrIC gene cluster product was visualised under UV light following agarose gel electrophoresis in the presence of ethidium bromide. Lanes: 1, SFL1416, serotype 1a; 2, SFL1253, serotype 4a; 3, SFL1613, serotype 1c strain isolated from Bangladesh; 4, SFL1501, serotype 1c strain isolated from Bangladesh; 5, SFL1569, serotype 1c strain isolated from Vietnam; 6, SFL1564, rough strain isolated from Vietnam; 7, SFL1683, serotype 1c strain isolated from Egypt; 8, SFL1504, serotype 1c strain isolated from Bangladesh; 9, H2O control; 10, DNA marker (SPP1). The expected size of the PCR product, estimated using the DNA marker, is indicated by a red arrow.
A total of sixty-nine serotype 1c isolates, obtained from Bangladesh, Egypt and Vietnam, were screened. When probed with gtrIC, Eco32I-digested genomic DNA from all the serotype 1c strains showed two bands, of 7784 bp and 2395 bp, the same pattern as the positive control SFL1613 (Fig. 4). No bands were present in the negative control.
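The band-count logic behind this Southern blot design can be illustrated with a small in silico digest. Eco32I, an isoschizomer of EcoRV, cuts GAT^ATC, and BamHI cuts G^GATCC; a probe detects every restriction fragment it overlaps. Two Eco32I bands from a single-copy locus are therefore most simply explained by one Eco32I site lying within the probed region, whereas a probe lying entirely within one fragment gives a single band. The sequence, site positions and probe coordinates in this Python sketch are hypothetical and chosen only to reproduce that pattern; they are not the SFL1613 sequence.

```python
import re
from typing import Dict, List, Tuple

# Recognition sequence and cut position (offset within the site, top strand).
ENZYMES: Dict[str, Tuple[str, int]] = {
    "Eco32I": ("GATATC", 3),  # GAT^ATC, blunt (EcoRV isoschizomer)
    "BamHI": ("GGATCC", 1),   # G^GATCC
}


def digest(seq: str, enzyme: str) -> List[Tuple[int, int]]:
    """Return (start, end) coordinates of the linear fragments after complete digestion."""
    site, offset = ENZYMES[enzyme]
    cuts = [m.start() + offset for m in re.finditer(f"(?={re.escape(site)})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]


def hybridising_bands(seq: str, enzyme: str, probe: Tuple[int, int]) -> List[int]:
    """Sizes (bp, largest first) of fragments overlapping the half-open probe interval."""
    p_start, p_end = probe
    return sorted(
        (b - a for a, b in digest(seq, enzyme) if a < p_end and b > p_start),
        reverse=True,
    )


# Hypothetical locus: the filler 'C' bases contain no recognition sites.
EV = "GATATC"
locus = "C" * 100 + EV + "C" * 194 + EV + "C" * 294 + EV + "C" * 100

print(hybridising_bands(locus, "Eco32I", (250, 450)))  # [300, 200]: probe spans an internal site
print(hybridising_bands(locus, "Eco32I", (350, 450)))  # [300]: probe inside one fragment
print(hybridising_bands(locus, "BamHI", (250, 450)))   # [706]: no BamHI site in this toy locus
```

Under the same model, a second genomic copy of the probed region with different flanking sites would add extra bands, which is why a constant two-band Eco32I pattern and a single BamHI band are compatible with one gtrIC copy in a conserved context.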
BamHI-digested genomic DNA was used to examine the genetic arrangement of the region downstream of gtrIC. In all the serotype 1c strains evaluated (one Bangladeshi, four Egyptian, four Vietnamese from Son Tay province and four Vietnamese from Nha Trang province), a single band of 12,500 bp was observed when probed with gtrIC (Additional file 1: Figure S1).
The findings from both sets of Southern blot analyses show that all the serotype 1c strains had the same genetic organization upstream and downstream of the gtrIC cluster, despite their different geographic origins, and that the cluster was flanked by the same insertion sequences and located next to the yejO locus (Additional file 2: Figure S2). As the serotype 1c strains used in this Southern hybridization study were obtained from several different geographic locations, it would have been reasonable to expect these S. flexneri isolates to differ in the structure of the gtrIC cluster. Moreover, some might well have contained an intact bacteriophage or prophage sequence, which would have produced different genetic arrangements of the sequence surrounding the gtrIC gene. Surprisingly, however, the organization of the gtrIC gene cluster was universal and conserved in all the S. flexneri serotype 1c strains examined. The Southern hybridization results also revealed that only one copy of gtrIC was present in each of the tested strains, which suggests that all the serotype 1c strains are likely to have originated from a single clone.
Similar findings have been reported in Streptococcus pneumoniae, in which type 37 clinical isolates from two different continents (Europe and America) carried an identical tts gene directing the formation of the type 37 capsular polysaccharide [9]. These isolates likewise constituted a highly related strain cluster (clonal complex), suggesting that every type 37 pneumococcus found globally had originated from a single parental clone.
In the same vein, a Southern blot study by Frosch et al. [10] revealed strong homology between the functional regions of the cps locus of different meningococcal serogroups. A further study by Frosch et al. [11] showed that the molecular organization of the capsule gene (cps) loci in different serogroups of Neisseria meningitidis is very similar to that of E. coli and Haemophilus influenzae. These authors concluded that the strongly homologous organization of the capsule gene loci in N. meningitidis, E. coli and H. influenzae points to a common evolutionary origin of capsule production in Gram-negative bacteria expressing group II capsular polysaccharides.
Origin of the gtrIC modification in S. flexneri serotype 1c strains
The gtrIC modification we observed may have originated either through a serotype 1a strain gaining gtrIC or through a serotype 1c strain losing gtrIC function. If serotype 1a strains were derived from an ancestral serotype 1c strain whose gtrIC cluster had been disrupted by insertion elements or by gene deletion, remnants of the gtrIC gene or gene cluster would be expected to persist in the genomes of serotype 1a strains (Additional file 3: Figure S3).
To investigate whether remnants of the gtrIC gene or gene cluster exist in serotype 1a strains, Southern blot analysis was performed using the gtrIC and gtrIC cluster probes. Serotype 1b strains were also included in this analysis because they have the same α1→4 linkage to GlcNAc as serotype 1a strains. Altogether, six serotype 1a and thirteen serotype 1b strains, isolated from Bangladesh, the UK and Japan, were analysed by Southern blotting. Genomic DNA from these strains was digested with Eco32I and probed with DIG-labelled gtrIC. None of the screened serotype 1a or 1b strains showed a detectable gtrIC gene remnant (Fig. 5a and b).
Fig. 3 Comparison of the 3′ end of the gtrIC sequence of SFL1501 with the published gtrIC sequence of SFL1613, showing the repeating GAAATG feature in both sequences. TGA denotes the stop codon.
Additional Southern blotting with the gtrIC cluster as a probe was then performed to confirm these results. We reasoned that the gtrIC cluster (containing the gtrAIc, gtrBIc and gtrIC genes as an operon) would act as a more comprehensive probe, able to detect remnants of gtrAIc, gtrBIc and gtrIC. Ten of the previously screened 1a and 1b strains were selected for this additional assay. Apart from the control strain SFL1613, which showed the expected two bands of 2395 and 7784 bp, none of the samples produced a significant band (Fig. 5c). This confirmed that no remnant of gtrAIc, gtrBIc or gtrIC existed in any of the screened serotype 1a and 1b strains.
The absence of the gtrIC gene specifically, and of the gtrIC gene cluster more broadly, from the genomic DNA of the serotype 1a and 1b strains indicates that the gtrIC cluster did not exist in an ancestor of these strains. This finding rules out the "loss of gtrIC function" hypothesis and indicates that serotype 1a/1b strains did not derive from a serotype 1c strain. The more likely explanation, therefore, is that the gtrIC cluster was inserted into an S. flexneri serotype 1a strain via a bacteriophage. This hypothesis is consistent with the analysis of the sequences surrounding the gtrIC and gtrI clusters in serotype 1c strains.
It is also noteworthy that a tRNAPro gene, previously identified between the yejM and yejO genes in S. flexneri serotype 2a (2457T) and serotype 5a (8401) strains [12,13], was found in this study in the region upstream of the gtrIC cluster, adjacent to yejM. These findings, together with the presence of a prophage integrase and other prophage-related genes beside the tRNAPro gene, strongly suggest that a bacteriophage integrated into the SFL1613 chromosome via the tRNAPro site. tRNA genes have previously been shown to be common integration sites for bacteriophages [14][15][16].
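The integration model invoked here, site-specific (Campbell-type) recombination between a phage attachment site (attP) and a bacterial attachment site (attB) located in a tRNA gene, can be sketched as a simple string operation. Recombination within a core sequence shared by attB and attP inserts the circular phage genome so that the prophage ends up flanked by two hybrid sites, attL and attR, each carrying a copy of the core. The core and flanking sequences below are hypothetical placeholders, not the SFL1613 tRNAPro sequence, and in vivo the reaction is catalysed by the phage integrase rather than by string matching.

```python
from typing import Tuple


def integrate(chromosome: str, phage_circle: str, core: str) -> Tuple[str, int, int]:
    """Insert a circular phage into a chromosome by recombination within `core`.

    attB = B-core-B' (chromosome) and attP = P-core-P' (phage) recombine to give
    B-core-P' (attL) ... P-core-B' (attR). Returns the integrated sequence and
    the start positions of the attL and attR core copies.
    """
    b = chromosome.find(core)
    p = phage_circle.find(core)
    if b == -1 or p == -1:
        raise ValueError("core sequence missing from attB or attP")
    left, right = chromosome[:b], chromosome[b + len(core):]
    p_before, p_after = phage_circle[:p], phage_circle[p + len(core):]
    # Opening the circle at the core places P' first and P last in the prophage.
    integrated = left + core + p_after + p_before + core + right
    att_l = len(left)
    att_r = att_l + len(core) + len(p_after) + len(p_before)
    return integrated, att_l, att_r


if __name__ == "__main__":
    chromosome = "AAAA" + "GGCC" + "TTTT"  # hypothetical attB: B-core-B'
    phage = "CCC" + "GGCC" + "GAGA"        # hypothetical circular phage: P-core-P'
    print(integrate(chromosome, phage, "GGCC"))
    # ('AAAAGGCCGAGACCCGGCCTTTT', 4, 15)
```

The key prediction of this model is a direct repeat of the core at both prophage boundaries, which is what makes the later loss of one boundary detectable.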
Four kb downstream of the prophage integrase and tRNAPro gene lies a stretch of sequence encoding the orf-8 to orf-15' proteins, whose functions are associated with the bacteriophage lifestyle. These include a truncated single-stranded DNA-binding prophage protein and several complete hypothetical prophage proteins annotated in the genome of E. coli 042 (accession number NC_017626.1) [17], such as a putative prophage regulatory protein, three putative prophage proteins, a bacteriophage DNA primase and a prophage integrase. This suggests that this stretch of sequence, upstream of the gtrIC cluster, was derived from a phage.
Immediately upstream of the previously published 7241 bp sequence is a region with significant homology to the insertion elements IS1400 (orfs 16-18), ISEhe3 (orf 20') and IS911 (orf 21'), together with a hypothetical protein (orf 19), all related to sequences from Shigella spp., Salmonella spp. and E. coli. A search of NCBI's Conserved Domain Database detected conserved domains in orf-16; the highest-scoring match was the HTH Hin-like domain, a family of DNA-binding domains unique to bacteria and represented by the Hin protein of Salmonella. Hin recombinase mediates the site-specific inversion of a chromosomal DNA segment containing a promoter; by reversibly switching the orientation of this segment, it controls the alternate expression of two genes. The rve_3 integrase core domain (pfam13683), which mediates integration of a DNA copy of a viral genome into the host chromosome, was detected in orf-17 [18].
Database searches and careful analysis of the 10,243 bp of nucleotide sequence and the corresponding proteins in this region revealed no further sequences in common with pB171 of EPEC.
The 19.1 kb published sequence plus the extended 10.2 kb sequence obtained in this study clearly indicate that the organization of the att sites, glucosyltransferase (gtrIC) genes and int in the SFL1613 chromosome is reminiscent of a prophage, although more than half of the phage genome appears to have been deleted. Our results also suggest that tRNAPro (upstream of the gtrIC gene cluster) and the yejO locus (downstream of the gtrIC gene cluster) define the boundaries of the phage DNA in this region of the SFL1613 chromosome. Homology analysis of the proteins encoded by orf 8 through orf 15' suggests that this region is prophage-related. Furthermore, a BLAST search against the genome of enteroaggregative E. coli (EAEC) strain 042 suggests that the 2 kb sequence downstream of the gtrIC cluster, located between the yejO locus and IS629, is also derived from a phage [5]. Together, these findings show that the regions both upstream and downstream of the gtrIC cluster consist of prophage sequences that have been disrupted by various mobile genetic elements.
Another interesting observation from this study was the presence of at least 8 different insertion sequences in the 19.1 kb and extended 10.2 kb fragments (see Fig. 6 and Additional file 2: Figure S2). Given the large number of insertion sequences in this region, it is reasonable to assume that, after the bacteriophage integrated via the tRNAPro site (attL), the prophage was disrupted by insertion elements, resulting in deletion of the attR site in SFL1613.
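The proposed sequence of events, integration at the tRNAPro site followed by a deletion associated with insertion elements that removed attR, predicts that only one copy of the att core (at attL) should remain at the prophage boundary. A minimal Python sketch of that check is shown below. The prophage sequence, the core and the deletion coordinates are all hypothetical; the sketch only illustrates the expected change from two core copies to one.

```python
import re
from typing import List


def core_positions(seq: str, core: str) -> List[int]:
    """Start positions of every (possibly overlapping) occurrence of the att core."""
    return [m.start() for m in re.finditer(f"(?={re.escape(core)})", seq)]


def delete_span(seq: str, start: int, end: int) -> str:
    """Model a deletion removing the half-open interval [start, end)."""
    return seq[:start] + seq[end:]


def att_status(seq: str, core: str) -> str:
    """Classify a prophage region by how many att core copies it retains."""
    n = len(core_positions(seq, core))
    if n >= 2:
        return "attL and attR present"
    if n == 1:
        return "single att site (one junction lost)"
    return "no att core found"


# Hypothetical integrated prophage: attL core at position 4, attR core at 15.
CORE = "GGCC"
prophage_region = "AAAAGGCCGAGACCCGGCCTTTT"
print(core_positions(prophage_region, CORE))  # [4, 15]
print(att_status(prophage_region, CORE))      # attL and attR present

# Hypothetical deletion spanning the attR junction.
deleted = delete_span(prophage_region, 12, 21)
print(deleted, core_positions(deleted, CORE))  # AAAAGGCCGAGATT [4]
print(att_status(deleted, CORE))               # single att site (one junction lost)
```

With real data, a short core would also match by chance elsewhere, so the search would need to be restricted to the boundary region and the core defined by comparison with an intact attB site.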

Conclusion
This study provides molecular insights into the novel S. flexneri serotype 1c, as well as into the gtrIC gene cluster that drives its unique immune recognition. It is the first study to show that serotype 1c isolates share an identical pattern of genetic arrangement despite their differing geographic origins, suggesting that serotype 1c strains may have originated from a single parental strain. The gene cluster responsible for the Type 1C modification appears to have been introduced into S. flexneri serotype 1a by a bacteriophage that integrated into the tRNAPro locus.
These findings expand our knowledge of the Type 1C modification of Shigella, and shed light on the genetic distribution of the gtrIC locus in serotype 1c strains. This new information will be useful for future Shigella research, and particularly for the design of safe and effective multivalent or cross-reactive vaccines against shigellosis.