Comparative genomics of VirR regulons in Clostridium perfringens strains
© Frandi et al; licensee BioMed Central Ltd. 2010
Received: 18 August 2009
Accepted: 25 February 2010
Published: 25 February 2010
Clostridium perfringens is a Gram-positive anaerobic bacterium causing severe diseases such as gas gangrene and pseudomembranous colitis, which are generally due to the secretion of powerful extracellular toxins. The expression of toxin genes is mainly regulated by VirR, the response regulator of a two-component system. To date, only a few targets of this regulator are known, and mainly in one strain (Strain 13). Because toxin production shows high genomic and phenotypic variability across strains, the development of effective strategies to counteract C. perfringens infections requires methodologies to reconstruct the VirR regulon from genome sequences.
We implemented a two-step computational strategy that uses the available information on VirR binding sites in a few strains to scan all genomes of the same species, assuming that VirR targets are at least partially conserved across these strains. The results agree with previous work in which the promoters were experimentally validated, and they show the presence of a core and an accessory VirR regulon in C. perfringens strains, with three target genes also located on plasmids. Moreover, the type E strain JGS1987 has the largest predicted regulon, with as many as 10 VirR targets not found in the other genomes.
In this work we exploited available experimental information on the targets of the VirR toxin regulator in one C. perfringens strain to obtain plausible predictions of target genes in the genomes and plasmids of closely related strains. Our predictions are available to wet-lab researchers working on less characterized C. perfringens strains, who can use them to design focused experiments, reducing the search space and increasing the probability of characterizing true targets with less effort. The main result is that the VirR regulon is variable among C. perfringens strains, with 4 genes controlled in all but one strain and most genes controlled in one or two strains only.
Clostridium perfringens is a Gram-positive anaerobic species able to form heat-resistant endospores and to live in many habitats, from marine sediments to the animal gut and soil. The genus Clostridium comprises species causing severe diseases such as botulism, tetanus, gas gangrene and pseudomembranous colitis, which are generally due to the secretion of powerful toxins. C. perfringens is the most prolific toxin producer within the genus; several of its extracellular toxins and enzymes have been identified, for instance α-toxin (plc, phospholipase C), β-toxin (hemolysin family toxin), ϵ-toxin, θ-toxin (pfoA), κ-toxin (colA, collagenase) and others. Toxins are thought to act synergistically in the development of pathogenesis, and C. perfringens strains show a high degree of phenotypic and pathogenic variability, so understanding how the expression of toxin genes is controlled is critical for fighting the diseases caused by this bacterium. Identifying similarities and differences in the set of pathogenic instruments (i.e. genes) of different strains will help to define effective strategies of infection control.
Pathogens usually have precise control mechanisms for toxin production, so that expression takes place only when required, e.g. when the density of the bacterial population exceeds a certain threshold, or when the bacterium reaches a certain cell type or organ.
Genomes and plasmids analyzed
Results and Discussion
Comparisons of C. perfringens strains
As a preliminary analysis we studied the variability of the selected genomes using both standard phylogenetic techniques and a comparison of all intergenic sequences. The alignment of rrnA operons, 4719 nt in total, was used to build a Neighbor-Joining tree, which revealed that these strains are closely related [Additional file 1: panel a]. Consistent with the low differentiation of ribosomal operon sequences, bootstrap support for the branching pattern was quite low; in fact, only 32 variable sites were found in the alignment, and they were evenly distributed between strains [Additional file 1: panel b]. However, the comparison of a large number of intergenic sequences extracted from the genomes revealed that some of them are quite variable between strains compared with the highly conserved rrnA operon (down to 82% with respect to C. perfringens Str. 13, [Additional file 1: panel c]).
Regulon prediction in sequenced C. perfringens strains
The conserved VirR regulon
Conserved VirR regulon
Strain specific VirR targets
hypothetical protein AC3_0622
hypothetical protein AC3_A0724
hypothetical protein AC3_A0725
conserved hypothetical protein
put. lipid A export ATP-binding/permease (MsbA)
hypothetical protein AC3_A0587
hypothetical protein AC3_0277
hypothetical protein AC3_A0194
hypothetical protein AC1_A0478
hypothetical protein AC5_A0236
put. metal-dependent hydrolase
hypothetical protein CJD_0545
hypothetical protein CJD_1387
Only one target appeared to be conserved in all tested strains, corresponding to the α-clostripain gene. Four genes were conserved in all strains but SM101. Interestingly, strain SM101 appeared to have the lowest degree of conservation of VirR targets. A search for the corresponding gene sequences in its genome confirmed that they are absent, in agreement with a previous comparative analysis that showed the absence of several virulence factors and toxins and the presence of a specific repertoire of genes encoding bacteriocins. Conversely, genes missing from draft genomes cannot be considered certainly absent. CPE0920 (virU) and CPF_1074, corresponding to a regulatory RNA-encoding gene and to a gene of unknown function respectively, have not been identified in some of the genomes, but using their sequences we were able to identify perfectly matching regions with blastn (data not shown) and to locate VirR motifs in their upstream regions (see Table 2). Myers et al. showed that purified VirR is able to bind the promoter of CPR_0761 and of CPF_0461. Our analysis indicated that CPF_0461 in str. ATCC 13124 is the ortholog of CPR_0762 in str. SM101, for which we also predicted a VirR binding motif upstream. This motif is the same one attributed to CPR_0761, whose ability to bind VirR was tested by Myers et al., 2006. Our comparative analysis thus suggests that the truly regulated gene could be CPR_0762, because the site is conserved upstream of its homologs in two other organisms (ATCC3626 and ATCC 13124), whereas we were not able to find sequences resembling CPR_0761 in any other C. perfringens strain by blasting both protein and nucleotide sequences against their genomes. Alternatively, the two genes may form an operon, with CPR_0761 performing an unknown function.
The accessory VirR regulon
We consider this dataset low confidence for two reasons: first, this group of genes comprises only one experimentally verified target, virT (CPE0845); second, all other genes were found in draft genomes only. The list of all putative targets of VirR is shown in Table 3.
Notably, JGS1987 is characterized by an expansion of the predicted VirR regulon, while the accessory regulon of strains ATCC3626, F4969 and SM101 is composed of a single gene. The case of virT, a regulatory RNA, is particularly interesting. This sRNA implements a negative feedback loop on some of the VirR targets, i.e. pfoA and ccp. Our analysis showed that virT is present in two strains only (strain 13 and strain ATCC3626). We can thus predict that the other strains lack this negative control and express pfoA and ccp at different levels, possibly by using additional regulatory mechanisms. Indeed, strains such as ATCC 13124 produce large quantities of gangrene-associated toxins, and JGS1987 is a type E strain which, though containing an enterotoxin gene (cpe), did not show enterotoxin production. The relatively large predicted regulon (10 genes) of JGS1987 may contain genes responsible for its peculiar pathogenicity profile. Within this regulon, seven genes code for proteins of unknown function. One of the remaining genes corresponds to a resolvase/recombinase (AC3_0180), suggesting a possible scenario in which host invasion is linked to gene mobilization. The other two genes with an assigned function in the putative regulon of strain JGS1987 encode a 2-keto-3-deoxygluconate kinase and a putative lipid A export permease. The former has been associated with resistance to oxidative stress in C. perfringens mutants obtained by transposon mutagenesis. Concerning the putative lipid A permease, lipid A is known to be one of the main mediators of bacterial pathogenesis and to strongly stimulate inflammation in host tissues, so our prediction is reasonable.
The 'mobile' VirR regulon
Our analysis identified three targets located on plasmids: one coding for ϵ-toxin (pCP8533etx_p28) in plasmid pCP8533etx from strain NCTC 8533B4D, together with two hypothetical proteins sharing 98% identity, in pCP8533etx (pCP8533etx_p40) and in pCPF5603 (pCPF5603_50) of strain F5603, respectively. Concerning plasmid pCP8533etx, we noticed that it is also present in the shotgun sequences from ATCC3626 (based on blastn comparisons; data not shown), and in that case too we were able to find a VirR motif upstream of the gene encoding ϵ-toxin.
In this work we exploited experimental information on a small number of promoters controlled by VirR to predict the corresponding regulons in all other available C. perfringens genomes and plasmids. Our results agree with previous analyses and suggest that the size of the VirR regulon is quite variable in the analyzed strains, as also evidenced by works showing that these strains encode different repertoires of toxin genes. Particularly interesting are the cases of vrr, virU and virT, because they encode regulatory RNAs that affect the expression of several other genes. Thus, even at the short phylogenetic distances spanned by these strains [Additional file 1], there could be significant changes in the regulatory cascade initiated by VirR. The gain or loss of a VirR target can affect only the gene itself, as when the event involves a gene coding for a toxin, or it can spread downstream of VirR when it involves a regulatory gene, so that the targets of that gene are also affected. As an example, consider the regulation exerted by VirR on virT in Str. 13 (figure 1a). This gene is present only in Str. 13 and in Str. ATCC3626, where it is regulated by VirR. Experiments have demonstrated that virT encodes a small RNA able to repress the expression of ccp and pfoA, and all these genes are positively controlled by VirR. The loss or gain of virT, or of the VirR binding sites in its promoter, will thus affect its own expression, and the effect will propagate downstream to ccp and pfoA.
The prediction of VirR targets in the genome of strain JGS1987 revealed the presence of 10 specific putative targets that could be important for the peculiar characteristics of this strain.
From an evolutionary perspective, we noticed that once a gene has been found to be regulated by VirR in one genome, it is either regulated by VirR in the other genomes or it is lost. This suggests that many of these genes are useful only when controlled by VirR and that their function is not essential for pathogenesis. We can then imagine that, after loss of the VirR binding site, these genes are rapidly deleted from the genome; alternatively, the deletion may involve both the gene and its promoter, as may happen when relatively large genomic regions are deleted. Indeed, genomes of C. perfringens strains have been shown to possess many different genomic islands, which may be subject to frequent rearrangement events.
Binding sites identification
where $F_{ij}$ is the frequency of the $i$-th base at the $j$-th position. $S_i$ is an information-based measure of potential binding sites. We retained only motifs having a score larger than or equal to the lowest score for an experimentally validated target, corresponding to a threshold of 0.88. Each motif found along the genome was then associated with a gene when it was located on the same strand as the gene and within the region going from 100 nucleotides downstream to 600 nucleotides upstream of the gene's first codon.
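The filtering and motif-to-gene association described above can be sketched as follows. This is only an illustration under stated assumptions: the `Gene` and `Motif` records, the use of a single reference coordinate per motif, and the inclusive window boundaries are hypothetical choices, not details given in the text. The scoring function itself is not reproduced here; scores are taken as precomputed.

```python
from dataclasses import dataclass

SCORE_THRESHOLD = 0.88  # lowest score of an experimentally validated target (from the text)
UPSTREAM = 600          # nucleotides upstream of the first codon
DOWNSTREAM = 100        # nucleotides downstream of the first codon


@dataclass
class Gene:  # hypothetical record, for illustration only
    name: str
    start: int   # coordinate of the first base of the first codon
    strand: str  # "+" or "-"


@dataclass
class Motif:  # hypothetical record, for illustration only
    position: int  # assumption: one reference coordinate per motif
    strand: str
    score: float   # precomputed information-based score


def offset_from_start(gene, motif):
    """Signed distance from the first codon; negative means upstream."""
    if gene.strand == "+":
        return motif.position - gene.start
    return gene.start - motif.position


def assign_targets(genes, motifs):
    """Pair each retained motif with same-strand genes whose window contains it."""
    hits = []
    for motif in motifs:
        if motif.score < SCORE_THRESHOLD:
            continue  # below the validated-target threshold
        for gene in genes:
            if gene.strand != motif.strand:
                continue
            if -UPSTREAM <= offset_from_start(gene, motif) <= DOWNSTREAM:
                hits.append((gene.name, motif.position, motif.score))
    return hits


genes = [Gene("geneA", 1000, "+"), Gene("geneB", 5000, "-")]
motifs = [
    Motif(600, "+", 0.93),   # 400 nt upstream of geneA
    Motif(5300, "-", 0.90),  # 300 nt upstream of geneB on the reverse strand
    Motif(1050, "+", 0.80),  # score too low
    Motif(700, "-", 0.95),   # wrong strand for geneA, too far from geneB
]
print(assign_targets(genes, motifs))
```

On the reverse strand, "upstream" corresponds to larger coordinates, which is why the offset is computed with the sign flipped.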
Clustering protein sequences
where $S_{ii}$ is the maximal score attainable using the $i$-th query, corresponding to the query aligned with itself. The adjacency matrix is normalized to make it stochastic, a prerequisite for the MCL algorithm used to define clusters of orthologous sequences. The MCL algorithm simulates flow by alternating two algebraic operations on matrices: expansion and inflation. Expansion of the input matrix ($M_{out} = M_{in} \cdot M_{in}$) models the spreading out of flow. Inflation raises each entry of the matrix to the power $r$ ($m_{ij} = m_{ij}^{r}$). Parameter $r$ controls the granularity of the clustering and is set to 2.
After these two steps we apply diagonal scaling to keep the matrix stochastic and ready for the next iteration. Inflation models the contraction of flow, which becomes thicker in regions of higher current and thinner in regions of lower current. The consequence is that the flow spreads out within clusters while evaporating between clusters, leaving at convergence an idempotent matrix that reveals the clusters hidden in the original adjacency matrix.
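The expansion, inflation and rescaling cycle can be illustrated with a minimal pure-Python sketch. It is not the MCL implementation used in the study: the function names, the convergence tolerance, the iteration cap and the toy similarity matrix (with 1.0 on the diagonal for each sequence compared with itself) are assumptions made for the example.

```python
def normalize_columns(m):
    """Rescale every column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m):
    """Expansion: matrix product M * M, spreading flow along paths."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation: raise every entry to the power r."""
    return [[x ** r for x in row] for row in m]


def mcl(similarity, r=2.0, max_iter=100, tol=1e-9):
    """Iterate expansion, inflation and rescaling until the matrix stops changing."""
    n = len(similarity)
    m = normalize_columns(similarity)
    for _ in range(max_iter):
        new = normalize_columns(inflate(expand(m), r))
        delta = max(abs(new[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = new
        if delta < tol:
            break
    return m


def clusters(m, eps=1e-6):
    """Read clusters from the converged matrix: columns sharing a non-zero row."""
    found = set()
    for row in m:
        members = frozenset(j for j, x in enumerate(row) if x > eps)
        if members:
            found.add(members)
    return sorted(sorted(c) for c in found)


# Toy similarity matrix: two groups of three sequences with no cross-similarity.
toy = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]
print(clusters(mcl(toy, r=2.0)))  # [[0, 1, 2], [3, 4, 5]]
```

Larger values of `r` strengthen the contraction step and tend to give smaller, more numerous clusters; the value 2 matches the setting reported in the text.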
Concerning the identification of VirR targets, we analysed plasmids with the same procedure used for genomes. Phylogenetic profiles and the hypergraph describing the similarity in gene content of different plasmid molecules were calculated using the software Blast2Network (B2N) and visualized with the software Visone. The phylogenetic profiling technique is described in detail in several papers, e.g. [18, 19], so we do not discuss it in detail here; it is enough to say that by comparing the distribution of different genes in different plasmids we can quantify the extent to which proteins tend to co-occur, which is an indication of the degree of functional overlap between different proteins. The hypergraph shown in figure 3 deserves a few words. Suppose we have an adjacency matrix describing homologies between proteins encoded by several different plasmids. In this matrix, element $m_{ij}$ corresponds to the similarity between sequences i and j. Such matrices can be quite large (their size is the total number of proteins in the study set), so it is useful to apply a dimensionality reduction approach to extract the information of interest. In our case, given the mobility of genes encoded on plasmids, we wanted to assess the degree of similarity between plasmids in terms of gene content, and to identify the most plausible routes for gene exchange in the strains under analysis. One way to do this is to calculate the similarity between the phylogenetic profiles of the plasmids and then reduce the original matrix to a new one whose size corresponds to the number of plasmids in the dataset. In this new matrix, the values correspond to the similarity in gene content between every pair of plasmids. Given the binary nature of the phylogenetic profiles calculated by B2N, the level of similarity between them can be quantified using the Jaccard similarity coefficient.
Plasmids with highly similar gene content will then form very tight clusters. Plasmids lying between different clusters (sharing some of their genes with plasmids in one cluster and other genes with an otherwise unrelated cluster) could be important because they share genes with different molecules, i.e. they could represent preferential routes for the passage of genes between plasmids that are not in direct contact.
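The reduction from a protein-level matrix to a plasmid-level matrix can be illustrated with the Jaccard coefficient, i.e. the number of shared gene families divided by the number of families present in at least one of the two plasmids. The sketch below is not Blast2Network code; the plasmid names and gene family identifiers are made up for the example.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def plasmid_similarity(gene_content):
    """Pairwise Jaccard similarity between plasmids.

    gene_content maps a plasmid name to the set of gene families it encodes,
    which is equivalent to a binary presence/absence profile.
    """
    names = sorted(gene_content)
    return {(p, q): jaccard(gene_content[p], gene_content[q])
            for p in names for q in names}


# Hypothetical toy data: pB shares genes with both pA and pC,
# while pA and pC have nothing in common.
content = {
    "pA": {"fam1", "fam2", "fam3"},
    "pB": {"fam2", "fam3", "fam4"},
    "pC": {"fam4", "fam5"},
}
sim = plasmid_similarity(content)
print(sim[("pA", "pB")], sim[("pB", "pC")], sim[("pA", "pC")])  # 0.5 0.25 0.0
```

In this toy case pB plays the role of a plasmid lying between otherwise unrelated molecules, the situation described above as a possible route for gene exchange.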
Alignments and Phylogenetic analysis
The alignment of rrnA operons was performed using the software MUSCLE with default parameters. The alignment has a total of 4719 nucleotides, 32 of which are variable, and was used as input to the software MEGA to build a phylogenetic tree. The tree was built with the Neighbor-Joining algorithm, using different rates for transitions and transversions and 100 bootstrap replicates.
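Counting the variable sites of an alignment, as reported above (32 of 4719 columns), amounts to finding the columns in which not all sequences carry the same character. The following sketch shows the idea on a toy alignment; the sequences are invented, and treating a gap as an ordinary character state is an assumption of this example rather than a documented choice of the study.

```python
def variable_sites(aligned):
    """Return the 0-based indices of alignment columns with more than one state."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    return [j for j in range(length)
            if len({seq[j] for seq in aligned}) > 1]


# Invented toy alignment of four sequences, six columns long.
toy_alignment = [
    "ACGTAC",
    "ACGTAC",
    "ACTTAC",
    "ACGTGC",
]
sites = variable_sites(toy_alignment)
print(len(sites), "variable of", len(toy_alignment[0]), "columns:", sites)
```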
Comparison of intergenic sequences
The comparison of intergenic sequences was performed as follows: all intergenic sequences were extracted from the genome of Str. 13 using gene annotations and were then filtered for a minimum length of 100 nucleotides, obtaining 1633 sequences. These sequences were then blasted against the other genomes. For each query we retained the first blast hit when the e-value of the alignment was less than 1E-06. The boxplots shown in [Additional file 1: panel c] were obtained from the totality of matches for each genome.
MB is funded by ANR Project MetaGenoReg (ANR-06-BYOS-0003).
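The two filters in this procedure (minimum length, then first hit below the e-value cutoff) can be sketched as follows. The example assumes that the BLAST results are available in the standard 12-column tabular format, with the query in column 1, the subject in column 2, percent identity in column 3 and the e-value in column 11; the text does not state which output format was used, and the sequence and hit names are invented.

```python
MIN_LENGTH = 100   # minimum intergenic length in nucleotides (from the text)
MAX_EVALUE = 1e-6  # e-value cutoff (from the text)


def filter_intergenic(sequences):
    """Keep intergenic sequences at least MIN_LENGTH nucleotides long."""
    return {name: seq for name, seq in sequences.items()
            if len(seq) >= MIN_LENGTH}


def first_hits(tabular_lines):
    """Keep the first hit of each query if its e-value is below the cutoff.

    Assumes 12-column tabular BLAST output with hits listed in BLAST's
    own order for each query. Returns {query: (subject, percent_identity)}.
    """
    seen = set()
    kept = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        if query in seen:
            continue  # only the first hit of each query is considered
        seen.add(query)
        if float(fields[10]) < MAX_EVALUE:
            kept[query] = (subject, float(fields[2]))
    return kept


toy_hits = [
    "ig1\tcontigA\t98.5\t120\t2\t0\t1\t120\t500\t619\t1e-50\t200",
    "ig1\tcontigB\t90.0\t100\t10\t0\t1\t100\t10\t109\t1e-20\t120",
    "ig2\tcontigA\t82.0\t110\t20\t0\t1\t110\t900\t1009\t0.5\t30",
    "ig2\tcontigC\t99.0\t110\t1\t0\t1\t110\t40\t149\t1e-30\t180",
]
print(first_hits(toy_hits))  # {'ig1': ('contigA', 98.5)}
```

Here `ig2` is discarded even though its second hit is significant, because only the first hit of each query is examined.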
- Whitworth D, Cock P: Evolution of prokaryotic two-component systems: insights from comparative genomics. Amino Acids. 2009, 37: 459-466. 10.1007/s00726-009-0259-2.
- Cheung J, Awad M, McGowan S, Rood J: Functional analysis of the VirSR phosphorelay from Clostridium perfringens. PLoS One. 2009, 4: e5849. 10.1371/journal.pone.0005849.
- Ba-Thein W, Lyristis M, Ohtani K, Nisbet I, Hayashi H, Rood J, Shimizu T: The virR/virS locus regulates the transcription of genes encoding extracellular toxin production in Clostridium perfringens. J Bacteriol. 1996, 178: 2514-20.
- Cheung J, Rood J: The VirR response regulator from Clostridium perfringens binds independently to two imperfect direct repeats located upstream of the pfoA promoter. J Bacteriol. 2000, 182: 57-66. 10.1128/JB.182.1.57-66.2000.
- Cheung J, Dupuy B, Deveson D, Rood J: The spatial organization of the VirR boxes is critical for VirR-mediated expression of the perfringolysin O gene, pfoA, from Clostridium perfringens. J Bacteriol. 2004, 186: 3321-30. 10.1128/JB.186.11.3321-3330.2004.
- Shimizu T, Yaguchi H, Ohtani K, Banu S, Hayashi H: Clostridial VirR/VirS regulon involves a regulatory RNA molecule for expression of toxins. Mol Microbiol. 2002, 43: 257-65. 10.1046/j.1365-2958.2002.02743.x.
- Okumura K, Ohtani K, Hayashi H, Shimizu T: Characterization of genes regulated directly by the VirR/VirS system in Clostridium perfringens. J Bacteriol. 2008, 190: 7719-27. 10.1128/JB.01573-07.
- Myers G, Rasko D, Cheung J, Ravel J, Seshadri R, DeBoy R, Ren Q, Varga J, Awad M, Brinkac L, Daugherty S, Haft D, Dodson R, Madupu R, Nelson W, Rosovitz M, Sullivan S, Khouri H, Dimitrov G, Watkins K, Mulligan S, Benton J, Radune D, Fisher D, Atkins H, Hiscox T, Jost B, Billington S, Songer J, McClane B, Titball R, Rood J, Melville S, Paulsen I: Skewed genomic variability in strains of the toxigenic bacterial pathogen, Clostridium perfringens. Genome Res. 2006, 16: 1031-40. 10.1101/gr.5238106.
- Mollby R, Holme T: Production of phospholipase C (α-toxin), haemolysins and lethal toxins by Clostridium perfringens types A to D. J Gen Microbiol. 1976, 96: 137-144.
- Sawires Y, Songer J: Clostridium perfringens: insight into virulence evolution and population structure. Anaerobe. 2006, 12: 23-43. 10.1016/j.anaerobe.2005.10.002.
- Briolat V, Reysset G: Identification of the Clostridium perfringens Genes Involved in the Adaptive Response to Oxidative Stress. J Bacteriol. 2002, 184: 2333-2343. 10.1128/JB.184.9.2333-2343.2002.
- Lee V, Schneewind O: Protein secretion and the pathogenesis of bacterial infections. Genes Dev. 2001, 15: 1725-1752. 10.1101/gad.896801.
- Brilli M, Mengoni A, Fondi M, Bazzicalupo M, Lió P, Fani R: Analysis of plasmid genes by phylogenetic profiling and visualization of homology relationships using Blast2Network. BMC Bioinformatics. 2008, 9: 551. 10.1186/1471-2105-9-551.
- Miyamoto K, Li J, Sayeed S, Akimoto S, McClane BA: Sequencing and diversity analyses reveal extensive similarities between some epsilon-toxin-encoding plasmids and the pCPF5603 Clostridium perfringens enterotoxin plasmid. J Bacteriol. 2008, 190: 7178-88. 10.1128/JB.00939-08.
- Miyamoto K, Fisher D, Li J, Sayeed S, Akimoto S, McClane B: Complete sequencing and diversity analysis of the enterotoxin-encoding plasmids in Clostridium perfringens type A non-food-borne human gastrointestinal disease isolates. J Bacteriol. 2006, 188: 1585-98. 10.1128/JB.188.4.1585-1598.2006.
- Schneider T, Stormo G, Gold L, Ehrenfeucht A: Information content of binding sites on nucleotide sequences. Journal of Molecular Biology. 1986, 188: 415-431. 10.1016/0022-2836(86)90165-8.
- Visone: analysis and visualization of social networks. http://visone.info/
- Pellegrini M, Marcotte E, Thompson M, Eisenberg D, Yeates T: Assigning protein functions by comparative genome analysis: protein phylogenetic profiles. Proc Natl Acad Sci USA. 1999, 96: 4285-8. 10.1073/pnas.96.8.4285.
- Date S, Peregrin-Alvarez J: Phylogenetic profiling. Methods Mol Biol. 2008, 453: 201-16.
- Edgar R: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32: 1792-7. 10.1093/nar/gkh340.
- Kumar S, Nei M, Dudley J, Tamura K: MEGA: a biologist-centric software for evolutionary analysis of DNA and protein sequences. Brief Bioinform. 2008, 9: 299-306. 10.1093/bib/bbn017.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.