Shigella flexneri serotype 1c derived from serotype 1a by acquisition of gtrIC gene cluster via a bacteriophage



Shigella spp. are the primary causative agents of bacillary dysentery. Since its emergence in the late 1980s, the S. flexneri serotype 1c remains poorly understood, particularly with regard to its origin and genetic evolution. This article provides a molecular insight into this novel serotype and the gtrIC gene cluster that determines its unique immune recognition.


A PCR of the gtrIC cluster showed that serotype 1c isolates from different geographical origins were genetically conserved. An analysis of sequences flanking the gtrIC cluster revealed remnants of a prophage genome, in particular integrase and tRNAPro genes. Meanwhile, Southern blot analyses on serotype 1c, 1a and 1b strains indicated that all the tested serotype 1c strains may have had a common origin that has since remained distinct from the closely related 1a and 1b serotypes. The identification of prophage genes upstream of the gtrIC cluster is consistent with the notion of bacteriophage-mediated integration of the gtrIC cluster into a pre-existing serotype.


This is the first study to show that serotype 1c isolates from different geographical origins share an identical pattern of genetic arrangement, suggesting that serotype 1c strains may have originated from a single parental strain. Analysis of the sequence around the gtrIC cluster revealed a new site for the integration of the serotype converting phages of S. flexneri. Understanding the origin of new pathogenic serotypes and the molecular basis of serotype conversion in S. flexneri would provide information for developing cross-reactive Shigella vaccines.


The lipopolysaccharide (LPS) of shigellae is known to exhibit a high degree of antigenic diversity. This diversity arises primarily from differences in the structure and composition of the O-antigen. S. flexneri serotypes (with the exception of serotype 6) contain the same basic O-antigen backbone, namely a repeating tetrasaccharide unit made up of one N-acetylglucosamine residue (GlcNAc) and three rhamnose residues (RhaI, RhaII and RhaIII). Currently, there are at least 15 established S. flexneri serotypes, including the newly designated 1c and 7b subtypes [1], all of which are capable of causing shigellosis. There are also a few more putative new serotypes which are yet to be considered for possible official classification [2, 3]. Each of these serotypes contains a specific LPS-O antigen that is responsible for its particular serotype characteristics.

Serotype 1c, also known as the 7a subtype of S. flexneri, emerged in the late 1980s. The presentation of O-antigens in serotype 1c is unique, as it is the first example in which an α-D-Glcp-(1➔2)-α-D-Glcp- (termed kojibiosyl) group is added to the basic repeating unit of the O-antigen [4]. Serotype 1c contains a disaccharide linked to the N-acetylglucosamine in the basic tetrasaccharide repeating units, whereas serotype 1a and 1b strains contain only a single glucosyl group at the same site (Fig. 1).

Fig. 1

The chemical structure of the repeating tetrasaccharide units in the O-antigen of S. flexneri serotypes 1a, 1b and 1c

The genetic mechanism responsible for O-antigen modification in serotype 1c was first elucidated by Stagg et al. [5]. The addition of the first glucosyl group is mediated by the previously characterised gtrI cluster found within a cryptic prophage at the proA locus in the bacterial chromosome. Transposon mutagenesis, performed to disrupt the gene responsible for the addition of the second glucosyl group, successfully identified the gene encoding the serotype 1c-specific O-antigen modification, which was designated gtrIC. The gtrIC gene was present as part of a three-gene cluster, arranged in a similar way to the gtr clusters present in other S. flexneri serotypes.

Adhikari et al. [6] earlier concluded that gtrI was integrated into S. flexneri by a bacteriophage via the tRNAThrW proA site. Our preliminary analysis of the sequence adjacent to the gtrIC cluster suggested the possibility of another integration site for the serotype 1c prophage [5]. We hypothesized that serotype 1c strains arose following the introduction of the gtrIC gene cluster via a second bacteriophage that inserted at a separate location on the chromosome of an ancestral serotype 1a strain. In this study, we show that serotype 1c strains are genetically related through conserved gtrIC sequences, and that serotype 1c isolates share an identical pattern of genetic arrangement despite their different geographical origins. In addition, we report the identification of a new site for the integration of the serotype-converting phages of the S. flexneri serotype 1c strain. The experiments and sequence analyses performed in this study provide further insights into the origin of this serotype.


Bacterial culturing conditions and media

The S. flexneri strains used in the study are listed in Table 1. Bacteria were grown aerobically (180-200 RPM) at 37 °C in Luria-Bertani (LB) broth or on LB agar supplemented with appropriate antibiotics. Unless stated otherwise, antibiotics (Sigma-Aldrich) were added at the following final concentrations: ampicillin (100 μg/mL); chloramphenicol (25 μg/mL); tetracycline (10 μg/mL); and kanamycin (50 μg/mL).

Table 1 Wild type S. flexneri strains used in this study


The serological features of the S. flexneri strains were determined by slide agglutination. A sterile loop was used to mix bacteria from LB agar plates with a drop of antibody on a glass slide. The slide was gently agitated while observing for agglutination. Negative controls were performed using 0.9 % NaCl instead of antibody. Isolates were tested using both commercially available monovalent antisera (Denka Seiken, Tokyo, Japan) and the monoclonal antibody reagent MASF Ic (Reagensia AB, Sweden) directed against type-specific somatic and group O factor antigens of S. flexneri.

DNA techniques

Genomic DNA was isolated from an overnight culture using the Illustra™ bacteria Genomic Prep Mini Spin Kit (GE Healthcare) in accordance with the manufacturer’s instructions. Oligonucleotide primers used for PCR were synthesized by Sigma-Aldrich (Australia), and are listed in Table 2. PCR was performed using PfuUltra II Fusion HS DNA Polymerase (Stratagene) in accordance with the manufacturer’s instructions. Purification of the PCR products was achieved using the Wizard SV Gel and PCR Clean Up system (Promega, Madison, Wisconsin, USA). DNA sequencing was performed using the Big Dye Version 3.1 sequencing protocol, and was analysed with the ABI 3730 capillary sequence analyser at the Biomolecular Resources Facility, John Curtin School of Medical Research, Australian National University. Digestion of the DNA was performed using enzymes supplied by Fermentas.

Table 2 Primers used in this study

Bioinformatics analysis

The DNA sequence was analysed for the presence of ORFs and tRNA genes using the open access software programmes myRAST, CLC Main Workbench 6.7 (CLCbio) and NCBI ORF Finder, followed by manual inspection of the start codons and ribosome binding sequences of each ORF. Genes within ORFs were predicted based on homologies to known genes found by BLASTn and BLASTp searches, as well as by the presence of Shine-Dalgarno ribosome binding sites. The corresponding proteins were compared with the non-redundant protein database using the BLASTp and BLASTx programmes available from the National Center for Biotechnology Information. The protein level alignments were performed using CLUSTAL W [7] and BioEdit Sequence Alignment Editor [8].
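The ORF-scanning step described above can be illustrated with a minimal Python sketch. It is not the algorithm used by myRAST, CLC Main Workbench or NCBI ORF Finder. The function names, the ATG-only start rule and the `min_codons` threshold are assumptions made purely for illustration, and the toy sequence is hypothetical.

```python
# Minimal six-frame ORF scan (illustrative only; real annotation tools also
# consider alternative start codons, ribosome binding sites and homology).
COMPLEMENT = str.maketrans("ACGT", "TGCA")
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=30):
    """Return (strand, start, end) for every ATG...stop ORF.

    Coordinates are 0-based, end-exclusive and relative to the scanned
    strand. The stop codon is included in the ORF and in the codon count.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame ATG opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # look for the next ORF in this frame
    return orfs


# Hypothetical toy sequence: ATG, three AAA codons, then a TGA stop.
toy = "CC" + "ATG" + "AAA" * 3 + "TGA" + "GG"
print(find_orfs(toy, min_codons=3))
```

ORFs that run off the end of the sequence without a stop codon are not reported by this sketch, whereas the incomplete ORFs in the study were identified with the tools named above and by manual inspection.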

Southern blotting

Genomic DNA (1000 ng) was digested overnight with an appropriate restriction enzyme in a total volume of 100 μl. Following agarose gel electrophoresis of the digested genomic DNA samples, the DNA was transferred to a Hybond N+ nucleic acid transfer membrane (Amersham Biosciences) by capillary action. A DIG High Prime DNA Labelling and Detection Kit (Roche) was used to generate the digoxigenin (DIG)-labelled DNA probe. Hybridization of the membrane and detection were performed according to the kit manufacturer’s instructions. The membrane was viewed under a Fusion Chemiluminescence Camera (Fisher Biotech).

Results and discussion

Serotype 1c strains have a conserved gtrIC sequence

Until now, very little has been known about the extent of gtrIC conservation among S. flexneri 1c strains from different regions of the world. Therefore, to study gtrIC homology and the prevalence of putative gtrIC variants in 1c isolates from patients of different ethnic and geographic origins, PCR was employed to detect the presence of the gtrIC gene. This was done concurrently with conventional agglutination tests. All strains which had positive serotype 1c agglutination results also produced a PCR amplicon of 1769 nt, corresponding to the presence of the gtrIC gene. As shown in Fig. 2, a PCR product of the same size was also produced in a rough serotype 1c strain which did not express serotype 1c specific O-antigen, and which therefore could not be typed by antisera. Furthermore, sequencing of the PCR amplicon in which the whole gtrIC cluster was amplified with the primer pair DG_GtrA(Ic)F(SacI) and GtrIc-R2(BamHI) revealed that the gtrA Ic and gtrB Ic genes from all the representative strains were identical to each other. The results revealed that the serotype 1c strains had 100 % identical gtrIC gene nucleotide sequences as well as 100 % nucleotide identity for the whole gtrIC cluster (gtrA Ic , gtrB Ic and gtrIC genes). This means that extremely conserved nucleotide sequences exist not only in the gtrIC locus, but also in the whole gtrIC cluster.
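For readers unfamiliar with how a nucleotide identity figure such as the 100 % reported above is computed, the following Python sketch shows one common convention: matching columns divided by alignment length, applied to sequences that have already been aligned. The function name and the toy sequences are hypothetical, and the study itself used the alignment tools listed in the Methods rather than this code.

```python
def percent_identity(a, b):
    """Percent identity of two pre-aligned, equal-length sequences.

    Gap characters ('-') never count as matches; the denominator is the
    full alignment length (one of several conventions in use).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1 for x, y in zip(a.upper(), b.upper()) if x == y and x != "-"
    )
    return 100.0 * matches / len(a)


# Hypothetical toy alignments, not data from the study.
print(percent_identity("ACGTACGT", "ACGTACGT"))          # identical sequences
print(percent_identity("GAAATGGAAATG", "GAAATG------"))  # one repeat unit missing
```

Under this convention a deletion lowers the identity in proportion to its length, so a 6-bp gap in an amplicon of roughly 1.8 kb would still leave the two sequences more than 99 % identical.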

Fig. 2

Detection of serotype 1c strains among a variety of S. flexneri strains using PCR amplification with the gtrIC specific primer pair. Amplification of the gtrIC gene cluster product was visualised under UV light following agarose gel electrophoresis in the presence of ethidium bromide. Lane: 1. SFL1416, serotype 1a; 2. SFL1253, serotype 4a; 3. SFL1613, serotype 1c strain isolated from Bangladesh; 4. SFL1501, serotype 1c strain isolated from Bangladesh; 5. SFL1569, serotype 1c strain isolated from Vietnam; 6. SFL1564, rough strain isolated from Vietnam; 7. SFL1683, serotype 1c strain isolated from Egypt; 8. SFL1504, serotype 1c strain isolated from Bangladesh; 9. H2O control. 10. Expected sizes of PCR products are indicated by a red arrow, which was estimated using the DNA marker, SPPI

The only exception to the above was SFL1501, which contained the gtrIC gene with a 6-bp deletion (GAAATG). Interestingly, this deletion was one of four GAAATG repeats present at the 3′ terminus of gtrIC gene (Fig. 3). Perhaps the absence of one of the four repeats of tryptophan-lysine residues at the C-terminus does not affect the overall function of the GtrIc. It is possible that sequence redundancy and the repeated sequences compensate for this loss.

Fig. 3

Comparison of the 3′ end of the gtrIC sequence of SFL1501 with the published gtrIC sequence of SFL1613. The GAAATG repeat is present in both sequences. TGA depicts a stop codon
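The repeat comparison described for SFL1501 and SFL1613 can be sketched in a few lines of Python. The sequences below are hypothetical toy strings, not the real gtrIC 3′ ends. The function name is illustrative, and the assumption that the repeats sit immediately before the stop codon is made only to keep the example simple.

```python
def count_terminal_repeats(seq, unit="GAAATG", stop="TGA"):
    """Count consecutive copies of `unit` directly preceding a terminal stop codon.

    If the sequence does not end with `stop`, the repeats are counted from
    the very end of the sequence instead.
    """
    seq = seq.upper()
    if seq.endswith(stop):
        seq = seq[:-len(stop)]
    count = 0
    while seq.endswith(unit):
        seq = seq[:-len(unit)]
        count += 1
    return count


# Hypothetical 3' ends: a reference with four repeats and a variant with three.
reference = "ACGT" + "GAAATG" * 4 + "TGA"
variant = "ACGT" + "GAAATG" * 3 + "TGA"

lost = count_terminal_repeats(reference) - count_terminal_repeats(variant)
print(lost, "repeat unit(s) missing;", lost * len("GAAATG"), "bp shorter")
```

Because the deleted unit is a multiple of three nucleotides, the downstream reading frame and the stop codon are preserved, which is consistent with the suggestion above that the loss of one repeat may not affect the overall function of GtrIc.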

Based on the fact that a conserved nucleotide sequence exists and no silent mutation was detected in gtrIC and its cluster sequence, we speculate that Type 1c modification plays a vital role within S. flexneri, and may assist the bacteria to a certain extent in the invasion of the epithelial cells of the host organism.

Serotype 1c isolates share an identical pattern of genetic arrangement despite differing geographical origins

Southern blotting with the gtrIC probe was used to reveal the upstream and downstream organization and distribution of the gtrIC gene cluster in different strains of S. flexneri serotype 1c. If the upstream and downstream organization of the gtrIC gene cluster are the same, two fragments should be expected with Eco32I digestion and one fragment for BamHI digestion. If, on the other hand, there are any differences between the organization of the upstream and downstream regions, fragments of variable sizes should be produced. These data should not only cast light on the organization of the upstream and downstream of gtrIC gene clusters in different strains, but also allow the determination of the number of copies of the gtrIC locus present in the genome of various 1c isolates.
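The fragment-count reasoning above can be made concrete with a small in silico digest. In this Python sketch, GATATC and GGATCC are the standard recognition sequences of Eco32I (an EcoRV isoschizomer) and BamHI. The function name, the toy sequence and the probe coordinates are hypothetical, and the cut position is approximated as the start of each recognition site rather than the exact cleavage point.

```python
import re

# Recognition sequences; both are palindromic, so one strand is sufficient.
SITES = {"Eco32I": "GATATC", "BamHI": "GGATCC"}


def probe_fragments(seq, enzyme, probe_start, probe_end):
    """Return (start, end) of linear-digest fragments that overlap the probe.

    Coordinates are 0-based and end-exclusive. Each returned fragment would
    be expected to give one band on a Southern blot with that probe.
    """
    cuts = [m.start() for m in re.finditer(SITES[enzyme], seq.upper())]
    bounds = [0] + cuts + [len(seq)]
    fragments = zip(bounds[:-1], bounds[1:])
    return [
        (s, e) for s, e in fragments
        if e > s and s < probe_end and e > probe_start
    ]


# Hypothetical 58-nt toy locus with three Eco32I sites and no BamHI site.
toy = "A" * 10 + "GATATC" + "A" * 10 + "GATATC" + "A" * 10 + "GATATC" + "A" * 10

# The probe region (20..30) spans the middle Eco32I site, so it hybridises
# to the fragments on both sides of that site.
print(probe_fragments(toy, "Eco32I", 20, 30))
print(probe_fragments(toy, "BamHI", 20, 30))
```

A single site inside the probed region yields two hybridising fragments, and no site yields one, which mirrors the two-band Eco32I and one-band BamHI expectation. A strain with extra copies of the locus, or with rearranged flanking sequence, would be expected to give additional bands or bands of different size.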

A total of sixty-nine different serotype 1c isolates, obtained from Bangladesh, Egypt and Vietnam, were screened. The Eco32I-digested genomic DNA of all the serotype 1c strains, when probed with gtrIC, showed two bands: a 7784 bp and a 2395 bp fragment. This was the same as the positive control SFL1613 (Fig. 4). No bands were present in the negative control.

Fig. 4

Southern Blot of Eco32I digested chromosomal DNA with a gtrIC probe. i. Agarose gel of digested genomic DNA. ii. Southern blot analysis of digested genomic DNA. a Egyptian serotype 1c strains. Lane 1. Marker SPP-I/EcoRI; 2. SFL1613 (control strain); 3. SFL1683; 4. SFL1684; 5. SFL1686; 6.SFL1687; 7. SFL1688; 8.SFL1689; 9. SFL1690; 10.SFL1691; 11.SFL1692; 12. SFL1685. b Bangladeshi serotype 1c strains. Lane 1. Marker SPP-I/EcoRI; 2. SFL1613 (control strain); 3. SFL1500; 4. SFL1502; 5. SFL1503. c Vietnamese (Son Tay Province) serotype 1c strains. Lane 1. SPP-I/EcoRI; 2. SFL1613 (control strain); 3. SFL1564; 4. SFL1568; 5. SFL1569; 6.SFL1571; 7. SFL1575; 8.SFL1576; 9. SFL1577; 10.SFL1578; 11.SFL1579; 12. SFL1570. d Vietnamese Serotype 1c strains - Son Tay Province. (Isenbarger et al. 2001). Lane 1. SFL1580; 2. SFL1581; 3. SFL1582; 4. SFL1583; 5. SFL1584; 6.SFL1585; 7. SFL1586; 8.SFL1587; 9. SFL1588; 10.SFL1589; 11.SFL1590; 12. SFL1594.; 13.SFL1596; 14. SFL1597. 15.SFL1598; 16. SFL1600; 17.SFL1602; 18. SFL1603;19. SFL1613 (positive control strain) 20. Marker SPP-I/EcoRI. e Nha Trang, Vietnam (5 from Isenbarger et al. 2001, 13 from Prof. Cam PD). Lane 1. Marker SPP-I/EcoRI; 2. SFL1604; 3. SFL1605; 4. SFL1606; 5. SFL1607; 6.SFL1610; 7. SFL1556; 8.SFL1557; 9. SFL1558; 10.SFL1561; 11.SFL1562; 12. SFL1565.; 13.SFL1566; 14. SFL1567. 15.SFL1572; 16. SFL1573; 17.SFL1599; 18. SFL1712;19. SFL1613 (positive control strain) 20. Marker SPP-I/XbaI

BamHI-digested genomic DNA was used to examine the genetic arrangement of the downstream region of gtrIC. In all the serotype 1c strains evaluated (one Bangladeshi, four Egyptian, four Vietnamese from Son Tay province, and four Vietnamese from Nha Trang province), one band corresponding to the 12,500 bp fragment was observed when probed with gtrIC (Additional file 1: Figure S1).

The findings from both sets of Southern blot analysis show that all the serotype 1c strains had the same genetic organization upstream and downstream of the gtrIC cluster, despite their different geographic origins; also, that they were flanked by the same insertion sequences and located next to the yejO locus (Additional file 2: Figure S2). As the serotype 1c strains used in this Southern hybridization study were obtained from several different geographic locations, it would have been reasonable to expect that these S. flexneri isolates would have different structures of the gtrIC cluster. Moreover, some might well have contained an intact bacteriophage or prophage sequence, which would have resulted in different genetic arrangements of the sequence surrounding the gtrIC gene. However, our findings surprisingly showed the organization of the gtrIC gene cluster to be universal and conserved in all the S. flexneri serotype 1c strains examined. The Southern hybridization results also revealed that only one copy of gtrIC was present in all of the tested strains – which suggests that all the serotype 1c strains are likely to have originated from a single clone.

Findings like ours are not unique. Similar findings were reported in Streptococcus pneumoniae, in which clinical type 37 isolates from two different continents (Europe and America) had an identical tts gene directing the formation of the type 37 capsular polysaccharide [9]. These isolates also constituted a highly related strain cluster (clonal complex), suggesting that every type 37 pneumococcus found globally had originated from a single parental clone.

In the same vein, a study conducted by Frosch et al. [10] using Southern blot analysis revealed a strong homology between the functional regions of the cps locus of different meningococcal serogroups. A further study by Frosch et al. [11] showed the molecular organization of the capsule gene (cps) loci in different serogroups of Neisseria meningitidis to be very similar to that of E. coli and Haemophilus influenzae. These authors concluded that the strongly homologous organization of the capsule gene loci in N. meningitidis, E. coli and H. influenzae point to a common evolutionary origin of capsule production in Gram-negative bacteria expressing group II capsular polysaccharides.

Origin of the gtrIC modification in S. flexneri serotype 1c strains

The gtrIC modification we observed may have originated either through a serotype 1a strain gaining the gtrIC or through a serotype 1c strain losing the gtrIC function. If a serotype 1a strain was derived from an ancestral serotype 1c strain, due to the gtrIC cluster in the serotype 1a strain having been disrupted by either insertion elements or through gene deletion, then remnant(s) of the gtrIC gene or the gene cluster would exist in the genome of serotype 1a strains (Additional file 3: Figure S3).

To investigate whether remnants of the gtrIC gene or gene cluster exist in serotype 1a strains, a Southern blot analysis was performed using the gtrIC and gtrIC cluster probes. Serotype 1b strains were also included in this analysis because they have the same α1➔4 linkage to N-acetylGlc as serotype 1a strains. Altogether, six serotype 1a and thirteen serotype 1b strains, isolated from Bangladesh, the UK and Japan, were analysed by Southern blotting. The genomic DNAs from these strains were digested with Eco32I and probed with DIG-labelled gtrIC. None of the screened serotype 1a or serotype 1b strains showed a detectable gtrIC gene remnant (Fig. 5a and b).

Fig. 5

Southern Blot of Eco32I-digested chromosomal DNA. i. Agarose gel of digested genomic DNA. ii. Southern blot analysis of digested genomic DNA. a Bangladeshi, and Japanese serotypes 1a and 1b strains, with a gtrIC probe. Lane 1. Marker SPP-I/EcoRI; 2. SFL1613 (control strain); 3. SFL1287 (1a-Jpn); 4. SFL1288 (1a-Jpn); 5. SFL1492 (1a-Bangladesh); 6.SFL1493(1a-Bangladesh); 7. SFL1494(1a-Bangladesh); 8.SFL1495(1a-Bangladesh); 9. SFL1496(1b-Bangladesh); 10.SFL1497(1b-Bangladesh); 11.SFL1498(1b-Bangladesh); 12. SFL1499(1b-Bangladesh). b UK and Japanese serotype 1b strains, with a gtrIC probe. Lane 1. Marker SPP-I/EcoRI; 2. SFL1613 (control strain); 3. B1118 (control negative strain); 4. SFL1417 (1b-NCTC-UK); 5. SFL276 (1b-Japan); 6.SFL1289(1b-Japan); 7. SFL1300(1b-Japan); 8.SFL1309(1b-Japan); 9. SFL1315(1b-Japan); 10.SFL1316(1b-Japan); 11.SFL1277(1b-rough-Japan); 12. SFL1278(1b-rough-Japan). c Ten selected UK, Bangladeshi and Japanese serotype 1b strains, with a gtrIC cluster probe. Lane 1. SPP-I/EcoRI; 2. SFL1613 (control strain); 3. SFL1287(1a-Japan); 4. SFL1300 (1b-NCTC-UK); 5. SFL1315 (1b-Japan); 6.SFL1316(1b-Japan); 7. SFL1499(1b-B); 8.SFL1498(1b-Rough); 9. SFL1497(1b-B); 10.SFL1496(1b-B); 11.SFL1417(1b-UK); 12. SFL1277(1b-rough-Japan)

Additional Southern blotting with gtrIC cluster as a probe was then performed to confirm the results obtained. We thought that the gtrIC cluster (containing the gtrA Ic , gtrB Ic and gtrIC genes as an operon) would be able to act as a more comprehensive probe to detect remnants of gtrA Ic , gtrB Ic and gtrIC. Ten of the previously screened 1a and 1b strains were selected for this additional assay. Other than the control SFL1613 strain, which showed two bands of 2395 and 7784 bp as expected, the rest of the samples did not produce any significant band (Fig. 5c). This clearly confirmed that no remnant of gtrA Ic , gtrB Ic or gtrIC existed in any of the screened serotype 1a and 1b strains.

The absence of the gtrIC gene specifically, and more broadly of the gtrIC gene cluster, from the genomic DNA of the serotype 1a and 1b strains indicates that the gtrIC cluster did not exist in an ancestor of the serotype 1a or 1b strains. This finding argues against the “loss of gtrIC function” hypothesis, and suggests that serotype 1a/1b strains did not derive from a serotype 1c strain. The more likely explanation, therefore, is that the gtrIC cluster was inserted into an S. flexneri serotype 1a strain via a bacteriophage. This hypothesis is consistent with the findings from the analysis of the sequence surrounding the gtrIC and gtrI clusters in serotype 1c strains.

Presence of EPEC gene in S. flexneri serotype 1c strains and sequences surrounding gtrIC cluster

We had previously studied a 7241 bp upstream and 11,906 bp downstream region surrounding the gtrIC cluster in SFL1613 and shown this to contain the IS629 isoform, ISEhe3 fragment, hypothetical ORF proteins and several housekeeping genes (yejO, narP, ccmH, dsbE, ccm) at the 3′ end, as well as insertion elements (IS911 interrupted three times by IS30, a group II intron and a putative transposase) at the 5′end [5]. A continuous 4700 bp sequence of the 7241 bp nucleotide upstream was 98 % identical to a region in the pB171 plasmid of the Enteropathogenic E. coli. Therefore, in order to identify whether SFL1613 contained any further sequence in common with pB171, additional sequencing (further upstream of the previously published 7.2 kb sequence) was performed. A stretch of the 10,243 bp nucleotide sequence further upstream of the gtrIC gene cluster was obtained by primer walking. A bioinformatics analysis of the 10,243 bp sequence revealed 21 putative open reading frames (ORFs); of which 17 were complete and 4 incomplete. Of the 17 complete ORFs, 16 were predicted to encode proteins which were significantly homologous with known proteins, while 1 ORF had no region of significant homology with proteins in the current database (Fig. 6 and Table 3).

Fig. 6

Linear representation of the 10,243 bp nucleotide sequence further upstream from the previously sequenced gtrIC cluster in SFL1613. The first line shows the nucleotide sequence scale in base pairs. The second and third lines show the distribution of all ORFs, with horizontal arrows denoting the direction of transcription. Dark blue represents ORF in the prophage integrase region, light blue represents ORFs in the insertion sequence region

Table 3 Sequence analysis of the 10,243 bp fragment further upstream of the gtrIC cluster

A 2521 bp stretch at the beginning of the 10,243 bp sequence corresponds to the S. flexneri house-keeping genes yejK (orf 1’), yejL (orf 4), yejM (orf 5), and to hypothetical proteins (orf 2, 3 & 6) (Fig. 6 and Table 3). Further sequence analysis found this stretch of the SFL1613 sequence to be >99 % identical to that found in the S. flexneri serotype 2a strain 2457T. A BlastP search revealed that the protein encoded by orf 6 has no significant homology to any existing protein in the database. Meanwhile, the protein encoded by orf-7 exhibited a high level of homology (E-value of 0.0) and 100 % (413/413 aa) identity with a prophage integrase of Shigella boydii CDC3083-94 (NC_010658.1) (Table 3). Interestingly, the tRNAPro gene, which was not identified in the previously published 19.1 kb fragment [5], was identified in this extended 10.2 kb fragment. It is located 210 bp upstream of the prophage integrase (Fig. 6).

It is also noteworthy that a tRNAPro gene, previously identified as being located between the yejM and yejO genes in S. flexneri serotype 2a (2457T) and serotype 5b (8401) strains [12, 13], was found by this study to be located in the region upstream of the gtrIC cluster and adjacent to yejM. These findings, together with the fact that prophage integrase and prophage related genes were located beside the tRNAPro gene, strongly suggest that the integration of a bacteriophage occurred in SFL1613 via the tRNAPro site. tRNA genes have previously been shown to be a common integration site for bacteriophages [14–16].

Four kb downstream of the prophage integrase and tRNAPro is a stretch of sequence coding for the orf-8 to orf-15’ proteins, whose functions are known to be associated with the bacteriophage lifestyle. These include a truncated single-stranded DNA-binding prophage protein, plus a few complete prophage hypothetical proteins such as a putative prophage regulatory protein, three putative prophage proteins, a bacteriophage DNA primase and prophage integrase, as annotated in the genome of E. coli 042 (GenBank accession number NC_017626.1) [17]. This suggests that this stretch of sequence (upstream of the gtrIC cluster) was in fact derived from a phage.

Immediately upstream of the previously published 7241 bp sequence is a stretch of a sequence region which has significant homology to a number of insertion elements such as IS1400 (orfs 16–18), a hypothetical protein (orf 19), ISEhe3 (orf 20’), and IS911(orf 21’), all related to Shigella spp, Salmonella spp and E. coli. Conserved domains were detected in orf-16 from the NCBI’s Conserved Domains Database. The highest scoring match was the HTH Hin-like domain, which is a family of DNA-binding domains unique to bacteria and represented by the Hin protein of Salmonella. The Hin recombinase induces the site-specific inversion of a chromosomal DNA segment containing a promoter, which controls the alternate expression of two genes by reversibly switching orientation. The rve_3 (pfam13683), integrase core domain, which mediates integration of a DNA copy of the viral genome into the host chromosome, was detected in orf-17 [18].

Database searches and careful analysis of the 10,243 bp of nucleotide sequence and corresponding proteins in this region revealed no further sequence common to pB171 of EPEC.

The sequencing results of the 19.1 kb published sequence plus the extended 10.2 kb sequence (obtained from this study) clearly indicate that the organization of the att sites, glucosyltransferase (gtrIC) genes and int in the SFL1613 chromosome is reminiscent of a prophage, although it appears that more than half of the phage genome has been deleted. Our results also suggest that tRNAPro (upstream of the gtrIC gene cluster) and the yejO locus (downstream of the gtrIC gene cluster) define the boundaries of the phage DNA in this area of the SFL1613 chromosome. A homology analysis of the proteins encoded by orf 8 through orf 15’ suggests that this region of the sequence is a prophage-related sequence. Furthermore, a Blast search against the enteroaggregative E. coli (EAEC) strain 042 genome suggests that the 2 kb sequence downstream of the gtrIC cluster, located between the yejO locus and the IS629, is in fact derived from a phage [5]. These two findings show that both the upstream and downstream regions of the gtrIC cluster are composed of prophage sequences which have been disrupted by various mobile genetic elements.

Another interesting observation to emerge from this study was the presence of at least 8 different insertion sequences in both the 19.1 kb and the extended 10.2 kb fragments (see Fig. 6 and Additional file 2: Figure S2). Given the large number of insertion sequences occurring in this region, it is reasonable to assume that the insertion of bacteriophage via the tRNAPro site (attL) was subsequently disrupted by insertion elements and consequently resulted in the deletion of the attR site of the tRNAPro in SFL1613.


This study provides molecular insights into the novel S. flexneri serotype 1c strain, as well as the gtrIC gene cluster that drives its unique immune recognition. This is the first study to show that serotype 1c isolates share an identical pattern of genetic arrangement despite their differing geographic origins, suggesting that serotype 1c strains may have originated from a single parental strain. The gene cluster responsible for Type 1C modification appears to have emerged in the S. flexneri serotype 1a via a bacteriophage integrated into the tRNAPro locus.

These findings expand our knowledge of the Type 1C modification of Shigella, and shed light on the genetic distribution of the gtrIC locus in serotype 1c strains. This new information will be useful for future Shigella research, and particularly for the design of safe and effective multivalent or cross-reactive vaccines against shigellosis.


DIG, digoxigenin; EAEC, enteroaggregative E. coli; GlcNAc, N-acetylglucosamine residue; LPS, lipopolysaccharide; ORF, open reading frame; Rha, rhamnose.


  1. Barry EM, Pasetti MF, Sztein MB, Fasano A, Kotloff KL, Levine MM. Progress and pitfalls in Shigella vaccine research. Nat Rev Gastroenterol Hepatol. 2013;10(4):245–55.

  2. Qiu S, Wang Y, Xu X, Li P, Hao R, Yang C, Liu N, Li Z, Wang Z, Wang J. Multidrug-resistant atypical variants of Shigella flexneri in China. Emerg Infect Dis. 2013;19(7):1147.

  3. Wang J, Knirel YA, Lan R, Sof’ya NS, Luo X, Perepelov AV, Wang Y, Shashkov AS, Xu J, Sun Q. Identification of an O-acyltransferase gene (oacB) that mediates 3-and 4-O-acetylation of rhamnose III in Shigella flexneri O antigens. J Bacteriol. 2014;196(8):1525–31.

  4. Wehler T, Carlin NI. Structural and immunochemical studies of the lipopolysaccharide from a new provisional serotype of Shigella flexneri. Eur J Biochem. 1988;176(2):471–6.

  5. Stagg RM, Tang SS, Carlin NIA, Talukder KA, Cam PD, Verma NK. A novel glucosyltransferase involved in O-antigen modification of Shigella flexneri serotype 1c. J Bacteriol. 2009;191(21):6612–7.

  6. Adhikari P, Allison G, Whittle B, Verma NK. Serotype 1a O-antigen modification: molecular characterization of the genes involved and their novel organization in the Shigella flexneri chromosome. J Bacteriol. 1999;181(15):4711–8.

  7. Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994;22(22):4673–80.

  8. Hall TA. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. In: Nucleic acids symposium series: 1999. 1999. p. 95–8.

  9. Llull D, López R, García E. Clonal origin of the type 37 Streptococcus pneumoniae. Microb Drug Resist. 2000;6(4):269–75.

  10. Frosch M, Weisgerber C, Meyer TF. Molecular characterization and expression in Escherichia coli of the gene complex encoding the polysaccharide capsule of Neisseria meningitidis group B. Proc Natl Acad Sci. 1989;86(5):1669–73.

  11. Frosch M, Edwards U, Bousset K, Krauße B, Weisgerber C. Evidence for a common molecular origin of the capsule gene loci in gram-negative bacteria expressing group II capsular polysaccharides. Mol Microbiol. 1991;5(5):1251–63.

  12. Wei J, Goldberg MB, Burland V, Venkatesan MM, Deng W, Fournier G, Mayhew GF, Plunkett G 3rd, Rose DJ, Darling A. Complete genome sequence and comparative genomics of Shigella flexneri serotype 2a strain 2457T. Infect Immun. 2003;71(5):2775–86.

  13. Nie H, Yang F, Zhang X, Yang J, Chen L, Wang J, Xiong Z, Peng J, Sun L, Dong J. Complete genome sequence of Shigella flexneri 5b and comparison with Shigella flexneri 2a. BMC Genomics. 2006;7:173.

  14. Campbell A. Prophage insertion sites. Res Microbiol. 2003;154(4):277–82.

  15. Campbell AM. Chromosomal insertion sites for phages and plasmids. J Bacteriol. 1992;174(23):7495–9.

  16. Canchaya C, Fournous G, Brüssow H. The impact of prophages on bacterial chromosomes. Mol Microbiol. 2004;53(1):9–18.

  17. Chaudhuri RR, Sebaihia M, Hobman JL, Webber MA, Leyton DL, Goldberg MD, Cunningham AF, Scott-Tucker A, Ferguson PR, Thomas CM. Complete genome sequence and comparative metabolic profiling of the prototypical enteroaggregative Escherichia coli strain 042. PLoS One. 2010;5(1):e8801.

  18. Marchler-Bauer A, Derbyshire MK, Gonzales NR, Lu S, Chitsaz F, Geer LY, Geer RC, He J, Gwadz M, Hurwitz DI. CDD: NCBI’s conserved domain database. Nucleic Acids Res. 2015;43(Database issue):20.

  19. Talukder KA, Islam Z, Islam MA, Dutta DK, Safa A, Ansaruzzaman M, Faruque AS, Shahed SN, Nair GB, Sack DA. Phenotypic and genotypic characterization of provisional serotype Shigella flexneri 1c and clonal relationships with 1a and 1b strains isolated in Bangladesh. J Clin Microbiol. 2003;41(1):110–7.

  20. El-Gendy A, El-Ghorab N, Lane EM, Elyazeed RA, Carlin NI, Mitry MM, Kay BA, Savarino SJ, Peruski LF Jr. Identification of Shigella flexneri subserotype 1c in rural Egypt. J Clin Microbiol. 1999;37(3):873–4.

  21. Isenbarger DW, Hien BT, Ha HT, Ha TT, Bodhidatta L, Pang LW, Cam PD. Prospective study of the incidence of diarrhoea and prevalence of bacterial pathogens in a cohort of Vietnamese children along the Red River. Epidemiol Infect. 2001;127(2):229–36.

  22. Stagg RM, Cam PD, Verma NK. Identification of newly recognized serotype 1c as the most prevalent Shigella flexneri serotype in Northern rural Vietnam. Epidemiol Infect. 2008;136(8):1134–40.


We wish to thank Wen Siang Tan and Kwai Lin Thong for revising the manuscript critically, and C. Sasakawa and A. El-Gendy for providing S. flexneri strains.


This work was supported by a grant from the National Health and Medical Research Council of Australia to NKV. SST is grateful for grants BKP031-14 and UMRG347-15AFR and for the SLAI fellowship from the University of Malaya and the Ministry of Education of Malaysia.

Availability of data and materials

The initial 19.1 kb sequence reported in this article was deposited in the GenBank database under accession number FJ905303, and the subsequent 10.2 kb sequence determined in this study was similarly deposited in the GenBank database under accession number KR920048.

Authors’ contributions

SST contributed to the experimental design, carried out all the experiments, analyzed the results and drafted the manuscript. NIC, KAT and PDC provided S. flexneri serotype 1c strains and critically revised the manuscript. NKV conceived and directed the study, participated in the experimental design and in the analysis of the results, and revised the manuscript critically. All the authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Consent for publication

Not applicable.

Ethical approval and consent to participate

Not applicable.

Author information

Corresponding author

Correspondence to Swee-Seong Tang.

Additional files

Additional file 1: Figure S1.

Southern blot of BamHI-digested genomic DNA probed with the gtrIC gene. (i) Agarose gel of digested genomic DNA. (ii) Southern blot analysis of digested genomic DNA. Lane 1. SPP-I/XbaI; 2. SFL1502; 3. SFL1684; 4. SFL1685; 5. SFL1686; 6. SFL1687; 7. SFL1575; 8. SFL1576; 9. SFL1578; 10. SFL1579; 11. SFL1556; 12. SFL1557; 13. SFL1558; 14. SFL1712; 15. SFL1613; 16. Marker SPP-I/EcoRI. (DOCX 240 kb)

Additional file 2: Figure S2.

The gtrIC cluster and surrounding 19,147 bp sequence in serotype 1c strain SFL1613 [5]. (DOCX 114 kb)

Additional file 3: Figure S3.

Schematic diagram illustrating two hypotheses that could potentially explain the evolution of the serotype 1c strain. (A) The gtrIC insertion hypothesis. (B) The gtrIC deletion hypothesis, in which deletion causes the loss of functional gtrIC modification. (DOCX 115 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Tang, SS., Carlin, N.I.A., Talukder, K.A. et al. Shigella flexneri serotype 1c derived from serotype 1a by acquisition of gtrIC gene cluster via a bacteriophage. BMC Microbiol 16, 127 (2016).
