Binding site of restriction-modification system controller protein in Mollicutes
© The Author(s). 2017
Received: 19 October 2016
Accepted: 17 January 2017
Published: 31 January 2017
Bacteria of the class Mollicutes underwent extreme reduction of genomes and gene expression control systems. Only a few regulators are known to date. In this work, we describe a novel group of transcriptional regulators that are distributed within different Mollicutes and control the expression of restriction-modification systems (RM-systems).
We performed a cross-species search for putative regulators of RM-systems (C-proteins) and their binding sites in Mollicutes. We identified a set of novel putative C-protein binding motifs distributed within Mollicutes. We studied the most frequent motif and the corresponding C-protein using Mycoplasma gallisepticum S6 as a model. We confirmed our prediction and identified key nucleotides important for C-protein binding. We further identified novel target promoters of the C-protein in M. gallisepticum.
We found that the C-protein of M. gallisepticum binds predicted conserved direct repeats of the (GTGTTAN5)2 motif. In addition to the promoter of its own operon, HsdC can bind to the promoters of the clpB chaperone gene and a tRNA cluster.
Mollicutes are wall-less bacteria with a significantly reduced genome. The repertoire of gene expression regulators in Mollicutes is reduced as well. The smallest number of transcription factors is observed within Mycoplasmatales. However, mycoplasmas are efficient parasites that colonize numerous vertebrate hosts. The ability of mycoplasmas to adapt to different conditions contrasts with the small number of regulators they encode. Mycoplasmas are also used as model objects in systems biology to study the core organization of living cells. Thus, the study of transcriptional regulation in mycoplasmas contributes to two topics: how an efficient parasite adapts using a minimal set of regulators, and how gene expression regulation is organized in the core cellular machinery.
Restriction-modification systems (RM-systems) are widespread in bacteria. They have two enzymatic activities: site-specific DNA methylase and site-specific endonuclease. Generally, RM-systems modify hemimethylated DNA and cleave unmethylated DNA. The majority of RM-systems belong to type I or type II. While all RM-systems consist of restriction (R) and modification (M) subunits, type I systems feature an additional specificity (S) subunit as a separate protein [2, 3]. Type I RM-systems work as multimeric complexes of 2M-1S-2R composition and recognize asymmetric sites. Enzymes of type II RM-systems work separately and recognize short palindromes. Particular members of the RM-system family may exhibit different functions, including defense against exogenous DNA such as bacteriophage DNA, control of DNA replication and acting as selfish elements. RM-systems may also have specific transcriptional regulators [7, 8]. In this work, we studied the transcriptional regulators of RM-systems across the Mollicutes, using Mycoplasma gallisepticum as a model organism to study the binding properties of the respective regulator.
A set of various RM-systems, predominantly of type II, feature specific transcriptional regulators termed controller proteins or C-proteins [7, 8], which may serve as transcriptional repressors or activators depending on the particular protein and conditions [8, 9]. The binding site of C-proteins, termed the C-box, is conserved across different bacteria [10, 11]. It consists of two inverted repeats with an AGTC consensus core element. The mode of C-protein binding to the C-box may determine whether it acts as a repressor or an activator [7, 10]. C-proteins have a very low molecular mass and appear to contain no domains other than a DNA-binding helix-turn-helix (HTH) domain. Their regulatory behavior depends on the rates of protein synthesis and degradation and on a feedback loop with their own promoter rather than on external stimuli [8, 9]. In the RM-systems studied to date, the C-protein forms an operon with the restriction subunit but not with the modification subunit [10, 11]. This configuration helps keep the concentration of the restriction subunit low, so that methylation of genomic DNA is favored over its cleavage. In the current work, we used Mycoplasma gallisepticum S6 as a model to study the DNA-binding properties of its C-protein homolog, hereafter termed HsdC (GCW_02350) because in this bacterium it resides within the hsd operon (GCW_02350-GCW_02365). As a result, we identified a novel C-protein binding site.
Cloning and purification of HsdC protein
Cloning and purification procedures were performed as described previously. The HsdC (GCW_02350) coding sequence was amplified from the genomic DNA of M. gallisepticum S6 (forward primer: ATTAGGATCCATGTTTGATTATGCAAAGAAAATTA, reverse primer: TATAGTCGACATCATCTAATTTCATGCCAATCT, sequences for cloning are underlined). The amplicon was cloned into the pETm plasmid with a C-terminal His-tag as described previously. HsdC protein was produced in E. coli BL21-Gold (DE3) cells. Cells were grown overnight, harvested by centrifugation, washed in PBS and lysed with a Branson 250 Sonifier (Branson) at 22 kHz for 10 min. The lysate was diluted with sample buffer (final concentrations of 20 mM Na2HPO4, 10 mM imidazole, 500 mM NaCl, pH 7.5). The protein was purified on a Tricorn 5/50 column (GE Healthcare) packed with Ni Sepharose High Performance resin (GE Healthcare) using the AKTA FPLC system (GE Healthcare). After application of the lysate, the column was washed with 25-ml aliquots of sample buffer, then with wash buffer (20 mM Na2HPO4, 25 mM imidazole, 500 mM NaCl, pH 7.5), and the protein was finally eluted with elution buffer (20 mM Na2HPO4, 500 mM imidazole, 500 mM NaCl, pH 7.5). After elution, the protein was diluted 60-fold with 20 mM Tris-HCl buffer, pH 7.5, to 20 pmol/μl and used directly for the electrophoretic mobility shift assay (EMSA).
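The primer sequences and the dilution step above can be checked with a short script. This is a minimal Python sketch; it assumes that the underlined cloning sequences are the BamHI (GGATCC) and SalI (GTCGAC) recognition sites, which is inferred from the primer sequences and is not stated in the text. The helper names `find_sites` and `diluted` are hypothetical.

```python
# ASSUMPTION: enzyme names and sites are inferred from the primer sequences;
# the text only states that the cloning sequences were underlined.
FORWARD = "ATTAGGATCCATGTTTGATTATGCAAAGAAAATTA"
REVERSE = "TATAGTCGACATCATCTAATTTCATGCCAATCT"

# Both recognition sequences are palindromic, so searching one strand is enough.
SITES = {"BamHI": "GGATCC", "SalI": "GTCGAC"}


def find_sites(primer, sites=SITES):
    """Return {enzyme: [0-based start positions]} for every site present in the primer."""
    hits = {}
    for enzyme, site in sites.items():
        positions = [i for i in range(len(primer) - len(site) + 1)
                     if primer[i:i + len(site)] == site]
        if positions:
            hits[enzyme] = positions
    return hits


def diluted(conc_mM, fold):
    """Concentration of a buffer component after an n-fold dilution."""
    return conc_mM / fold


if __name__ == "__main__":
    print(find_sites(FORWARD))          # {'BamHI': [4]}
    print(find_sites(REVERSE))          # {'SalI': [4]}
    # 20 pmol/ul after a 60-fold dilution implies 1200 pmol/ul (1.2 mM) in the eluate.
    print(20 * 60)                      # 1200
    # Imidazole and NaCl from the elution buffer drop from 500 mM to about 8.3 mM.
    print(round(diluted(500, 60), 1))   # 8.3
```

In both primers the site sits after a 4-nt 5′ overhang, and in the forward primer it is directly followed by the ATG start codon.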
Electrophoretic mobility shift assay
Oligonucleotides used for EMSA experiments (only the plus strand is shown)
PhsdC-WT
PhsdC-mut1
PhsdC-mut2
PhsdC-mut3
PhsdC-mut4
PhsdC-mut5
PhsdC-mut6
PhsdC-mut7
PhsdC-mut8
PhsdC-mut9
The HsdC binding constant was determined from the titration curve obtained with a series of protein dilutions, as described previously. Briefly, the equilibrium equation was solved for the concentration of the DNA-protein complex to obtain an expression for the fractional saturation of DNA, which was measured in EMSA experiments (equation 2 of the previously described method). The binding constant was then determined by nonlinear least-squares regression of the experimental data to the theoretical curve.
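The fitting procedure can be illustrated with a minimal sketch. It assumes simple 1:1 binding with depletion of free protein (the exact form of equation 2 in the cited method may differ), reports a dissociation constant Kd (the association constant is 1/Kd), and replaces a general nonlinear regression routine with a golden-section search on log10(Kd) that minimizes the sum of squared residuals. All function names and the synthetic titration values are hypothetical.

```python
import math


def fractional_saturation(p_total, d_total, kd):
    """Fraction of DNA bound (theta) for 1:1 binding, accounting for protein depletion.

    Solves Kd = (P0 - C) * (D0 - C) / C for the complex concentration C
    (the physically meaningful, smaller root) and returns C / D0.
    All concentrations must share one unit, e.g. nM.
    """
    b = p_total + d_total + kd
    disc = max(0.0, b * b - 4.0 * p_total * d_total)  # guard against rounding below zero
    complex_conc = (b - math.sqrt(disc)) / 2.0
    return complex_conc / d_total


def sum_sq(kd, p_values, theta_obs, d_total):
    """Sum of squared residuals between model and measured saturation."""
    return sum((fractional_saturation(p, d_total, kd) - t) ** 2
               for p, t in zip(p_values, theta_obs))


def fit_kd(p_values, theta_obs, d_total, lo=1e-3, hi=1e5, iters=100):
    """Least-squares Kd via golden-section search on log10(Kd).

    Assumes the sum of squares has a single minimum inside [lo, hi].
    """
    def f(log_kd):
        return sum_sq(10.0 ** log_kd, p_values, theta_obs, d_total)

    a, b = math.log10(lo), math.log10(hi)
    g = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = b - g * (b - a), a + g * (b - a)
    f1, f2 = f(x1), f(x2)
    for _ in range(iters):
        if f1 < f2:          # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - g * (b - a)
            f1 = f(x1)
        else:                # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + g * (b - a)
            f2 = f(x2)
    return 10.0 ** ((a + b) / 2.0)


if __name__ == "__main__":
    # Hypothetical titration: 1 nM DNA probe, protein from 5 to 800 nM, true Kd = 50 nM.
    d0 = 1.0
    protein = [5, 10, 25, 50, 100, 200, 400, 800]
    theta = [fractional_saturation(p, d0, 50.0) for p in protein]
    print(f"fitted Kd = {fit_kd(protein, theta, d0):.1f} nM")
```

With real EMSA data, `theta_obs` would be the bound fraction quantified from each lane; noise-free synthetic data are used here only so that the recovered Kd can be checked.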
Whole-genome mapping and quantification of transcription start sites
The data on promoter positions and activities were taken from our previous work. Briefly, total RNA was extracted from the cells with TRIzol LS reagent (Life Technologies), depleted of tRNAs using PureLink RNA Mini spin columns (Life Technologies) and fragmented by ZnSO4 treatment. Fragments were end-repaired with T4 polynucleotide kinase and treated with Terminator 5′-phosphate-dependent exonuclease (Epicentre). This procedure degraded all fragments except those carrying primary 5′ ends. The fragments were then treated with tobacco acid pyrophosphatase (Epicentre) and ligated to adapters for RNA-seq. After cDNA synthesis and amplification, the cDNA libraries were normalized (removal of rRNA-derived cDNA) with duplex-specific nuclease (DSN; Evrogen) as described previously. Libraries were sequenced on the SOLiD 4 platform. The coverage of the 5′-end-enriched libraries formed sharp peaks at transcription start sites. Peaks were picked using a previously described algorithm. Peak coverage corresponds to promoter activity, as demonstrated previously.
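The peak-picking step can be sketched as a local-maximum search over per-position 5′-end coverage. This is a hypothetical, simplified caller for illustration only and not the algorithm of the cited work; the threshold `min_height` and the half-window `window` are assumed parameters. Following the text, peak height is then treated as a proxy for promoter activity.

```python
def call_tss_peaks(coverage, min_height=10, window=5):
    """Return (position, height) pairs for local maxima of 5'-end coverage.

    A position is reported when its coverage is at least `min_height` and no
    position within `window` nt on either side is higher; when equal heights
    fall within the window, only the leftmost one is kept.
    """
    peaks = []
    for i, height in enumerate(coverage):
        if height < min_height:
            continue
        left = coverage[max(0, i - window):i]
        right = coverage[i + 1:i + 1 + window]
        if all(x < height for x in left) and all(x <= height for x in right):
            peaks.append((i, height))
    return peaks


def relative_activity(peaks):
    """Scale peak heights to the strongest peak (a crude activity comparison)."""
    top = max(h for _, h in peaks)
    return {pos: h / top for pos, h in peaks}


if __name__ == "__main__":
    cov = [0, 1, 2, 50, 3, 1, 0, 0, 0, 0, 0, 0, 0, 4, 20, 20, 2, 0]
    found = call_tss_peaks(cov)
    print(found)                     # [(3, 50), (14, 20)]
    print(relative_activity(found))  # {3: 1.0, 14: 0.4}
```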
Identification of HsdC homologs and RM-system components homologs in Mollicutes
The search for HsdC (C-protein) homologs was performed with the NCBI blastp algorithm. Homologs of RM-system proteins were identified using domain annotation from the NCBI CDD database. Putative C-protein binding sites were identified by alignment of the upstream regions of HsdC homologs in different species of Mollicutes.
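Collecting the upstream regions for alignment requires handling genes on both strands. A minimal sketch, assuming 0-based, end-exclusive gene coordinates on the plus strand and a hypothetical default length of 100 nt; the alignment tool itself is not specified in the text and is not shown.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream_region(genome, start, end, strand, length=100):
    """Return up to `length` nt immediately upstream of a gene, 5'->3' on the gene's strand.

    `start`/`end` are 0-based, end-exclusive plus-strand coordinates of the gene.
    """
    if strand == "+":
        return genome[max(0, start - length):start]
    if strand == "-":
        # Upstream of a minus-strand gene lies to the right of its end on the plus strand.
        return revcomp(genome[end:end + length])
    raise ValueError("strand must be '+' or '-'")


if __name__ == "__main__":
    genome = "TTTTGGGGATGCCCTAA"
    print(upstream_region(genome, 8, 17, "+", length=4))  # GGGG
    print(upstream_region(genome, 0, 5, "-", length=4))   # TCCC
```

Returning every region in the gene's own orientation keeps the promoter elements aligned in the same direction across species.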
Distribution and conservation of C-protein homologs in Mollicutes
Table 1. Distribution of controller protein (HsdC) homologs and their predicted binding sites in Mollicutes. HsdC binding sites are underlined. [Table body not recovered; four entries are marked "Type II fusion" with footnote a.]
Cross-species analysis of the respective promoters revealed putative HsdC binding motifs (Fig. 1). To improve promoter identification in Mollicutes, we used previous data on the promoter structure of M. gallisepticum. The binding site of well-studied C-proteins, the C-box, consists of at least two inverted repeats with a GACT core element, which may be extended in some cases [10, 11]. Cross-species conservation analysis revealed several types of repeats in the promoters of operons coding for RM-system components (Table 1). The identified motifs were not predicted in a previous large-scale cross-species analysis of putative C-protein binding sites. All studied Acholeplasmatales featured direct repeats of the (AACGAATN12)3 sequence, although the spacer length varied by 1 nt in some cases. However, at least two of the repeats featured the conserved 12-nt spacer. In mycoplasmas, we observed two major variants of the motif. One variant consists of a GTGTTA core sequence forming either direct repeats, (GTGTTAN5)2, or inverted repeats; the inverted-repeat form was found in M. conjunctivae and M. mobile. The other motif is completely different and consists of inverted repeats, GGACN5GTCC. This motif is similar to the well-characterized C-box of the AhdI RM-system, AGTCCN2GGACT, but with the reverse order of repeats.
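The three motif types can be searched for on both strands with regular expressions. A minimal sketch, assuming that only the repeat cores and the spacers between them need to match (trailing N runs are ignored) and that the Acholeplasmatales spacer may vary by 1 nt, as described above; the motif labels and function names are hypothetical, and in the text the motifs were found by alignment rather than by a regex scan.

```python
import re

# Regex translations of the motifs named in the text (N = any nucleotide).
# ASSUMPTION: trailing N runs are not scanned; the AACGAAT spacer may be 11-13 nt.
MOTIFS = {
    "GTGTTA_direct": r"GTGTTA[ACGT]{5}GTGTTA",                              # (GTGTTAN5)2
    "GGAC_inverted": r"GGAC[ACGT]{5}GTCC",                                  # GGACN5GTCC
    "AACGAAT_direct": r"AACGAAT[ACGT]{11,13}AACGAAT[ACGT]{11,13}AACGAAT",  # (AACGAATN12)3
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def scan(seq, motifs=MOTIFS):
    """Return sorted (plus-strand start, strand, motif name, matched text) tuples."""
    seq = seq.upper()
    rc = revcomp(seq)
    n = len(seq)
    hits = []
    for name, pattern in motifs.items():
        # The zero-width lookahead lets overlapping occurrences be reported.
        regex = re.compile("(?=(" + pattern + "))")
        for m in regex.finditer(seq):
            hits.append((m.start(), "+", name, m.group(1)))
        for m in regex.finditer(rc):
            # Convert the reverse-strand match back to plus-strand coordinates.
            start = n - (m.start() + len(m.group(1)))
            hits.append((start, "-", name, m.group(1)))
    return sorted(hits)


if __name__ == "__main__":
    print(scan("AAAAGTGTTACCCCCGTGTTAAAAAA"))
    # [(4, '+', 'GTGTTA_direct', 'GTGTTACCCCCGTGTTA')]
```

Because GGACN5GTCC is an inverted repeat, each of its occurrences is reported once on each strand at the same coordinate, whereas the direct repeats are reported only on the strand that carries them.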
HsdC recognizes direct repeats in the promoter of the hsd operon
To test HsdC activity in vivo, we used a previously described overexpression vector with the cloned hsdC ORF. However, all attempts to obtain an overexpressing strain resulted in a lethal phenotype. The quantitative data on hsdC promoter activity support its role as a transcriptional repressor. The corresponding promoter features strong determinants, including a consensus −10 box, its extension and an initiator nucleotide (consensus: TRTGNTAWAATN6R; hsdC promoter: TGTGTTAAAATN6A). The activity of the hsdC promoter (measured as 5′-end coverage, as described previously) was approximately 2 orders of magnitude lower than the average activity of promoters with this sequence (Additional file 4: Figure S3).
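Comparing the hsdC promoter with the promoter consensus amounts to matching IUPAC ambiguity codes position by position (R = A/G, W = A/T, N = any nucleotide). A minimal sketch; the six nucleotides filling N6 in the example are hypothetical placeholders, since the text gives that stretch only as N6.

```python
import re

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "W": "AT", "S": "CG", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(consensus):
    """Expand run-length notation such as 'N6' into 'NNNNNN'."""
    return re.sub(r"([A-Z])(\d+)", lambda m: m.group(1) * int(m.group(2)), consensus)


def mismatches(seq, consensus):
    """0-based positions where seq deviates from an equal-length IUPAC consensus."""
    pattern = expand(consensus)
    if len(seq) != len(pattern):
        raise ValueError("sequence and expanded consensus differ in length")
    return [i for i, (base, code) in enumerate(zip(seq.upper(), pattern))
            if base not in IUPAC[code]]


if __name__ == "__main__":
    consensus = "TRTGNTAWAATN6R"
    hsdc = "TGTGTTAAAAT" + "CCGGCC" + "A"   # hypothetical N6 filler
    print(mismatches(hsdc, consensus))      # []
```

An empty list means every position of the hsdC promoter satisfies the consensus, which is consistent with its description above as a promoter with strong determinants.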
Additional targets of HsdC in the M. gallisepticum genome
The HsdC binding site resembles the core binding site of the MraZ transcriptional regulator, (AAAGTGKN3)3, where K = G or T. However, the spacing between the GTG core repeats differs by 1 nt between the two motifs. We tested each protein for binding to the site of the other (Fig. 2b, lanes PmraZ HsdC – PhsdC MraZ). MraZ binds the HsdC motif with a strength comparable to that for its own site (as a single octamer), whereas HsdC does not bind the MraZ motif. The MraZ-overexpressing strains obtained in our previous work show no effect on the hsd operon in vivo (data not shown).
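The 1-nt difference in spacing follows from the repeat periods: in a tandem repeat, the gap between consecutive GTG cores equals the period minus the 3-nt core. A minimal sketch, assuming strictly periodic repeats as written in the two motifs; `core_gap` is a hypothetical helper.

```python
import re


def expand(unit):
    """Expand run-length notation such as 'N3' into 'NNN'."""
    return re.sub(r"([A-Z])(\d+)", lambda m: m.group(1) * int(m.group(2)), unit)


def core_gap(unit, core="GTG"):
    """Gap in nt between consecutive `core` copies in a tandem repeat of `unit`."""
    unit = expand(unit)
    if core not in unit:
        raise ValueError("core not found in repeat unit")
    return len(unit) - len(core)


if __name__ == "__main__":
    print(core_gap("AAAGTGKN3"))  # 7  (MraZ, period 10)
    print(core_gap("GTGTTAN5"))   # 8  (HsdC, period 11)
```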
We identified a novel subfamily of transcriptional regulators of RM-systems distributed within Mollicutes. Most of them recognize motifs that differ completely from the known C-box in sequence and structure. There are three types of C-protein binding motifs in Mollicutes, and the most frequent ones consist of direct rather than inverted repeats. The recognition of direct rather than inverted repeats also suggests an alternative mechanism of protein dimerization. The exact role of HsdC in the control of the hsd operon is unclear. The extensively studied C-proteins of the AhdI and PvuII RM-systems serve as activators when bound upstream of the −35 region. The C-protein of the AhdI RM-system may also serve as a repressor when it binds between the −35 and −10 regions, physically hampering RNA polymerase binding. At the same time, both of these RM-systems consist of two transcription units, and the C-protein controls only the R-subunit gene. This configuration causes the S and M subunits to be expressed first and to modify genomic DNA before the R subunit can cleave it. HsdC of M. gallisepticum seems to act as a repressor of the whole operon encoding the S, M and R subunits, and no other promoters were identified in the vicinity of the HsdC binding site. It remains an open question whether HsdC functions only as an autorepressor of the hsd operon or is also regulated at the post-translational level. In the former case, it may produce burst-like rather than steady expression.
Restriction-modification systems are widespread in Mollicutes, but only a few of them are controlled by a transcription factor (Additional file 1: Table S1). The M. gallisepticum S6 strain used in this work is currently the only strain of M. gallisepticum that features an RM-system with a transcriptional regulator. At the same time, all strains, including S6, have another type I RM-system that lacks a controller protein. This observation, together with the duplication of the genomic region around the controlled RM-system, indicates horizontal transfer of the whole operon (Additional file 5: Figure S4). RM-systems resemble toxin-antitoxin systems in terms of their selfish behavior. However, our experiments demonstrate that HsdC can bind to the promoters of important genes, including clpB and the tRNA gene cluster. This finding may explain the lethal phenotype of HsdC-overexpressing strains. While the functions of ClpB could be performed by other chaperones, including DnaK and the GroEL complex, only one copy each of tRNA-Asp and tRNA-Phe exists in the genome, and suppression of their expression cannot be compensated.
The binding site of HsdC in the clpB promoter overlaps with the binding site of its specific regulator HrcA, which likely leads to competition between the two proteins for the promoter. At minimum, this competition adds a new mode of regulation to clpB. However, even if HsdC binding is detrimental, eliminating its binding site by mutation would likely have even more negative consequences. In the clpB promoter, the core TG dinucleotides of the HsdC binding site are formed by the −10 box extension and by a core element of CIRCE (Controlling Inverted Repeat of Chaperone Expression), the binding site of the HrcA repressor (Fig. 3a). Thus, such a mutation would either decrease promoter strength or hamper HrcA-dependent regulation. Measurements of clpB promoter activity indicate a relatively high expression level compared with the hsd operon promoter (Additional file 6: Table S2). This is probably a result of HsdC displacement by HrcA, the designated regulator of this promoter. In the case of trnM, the HsdC binding site could be eliminated by a mutation in the first repeat, where substitution of G within the −10 box would not impair promoter activity (Fig. 3a).
HsdC provides an example of an interesting evolutionary event: the acquisition and domestication of a foreign regulator. The promoter sequences of the clpB and trnM genes are identical in the M. gallisepticum S6 and R(low) strains. This observation implies that these promoters acquired the ability to bind the novel regulator by chance, before the regulator itself was acquired. In the case of the clpB gene, the HsdC binding site was generated by the extended −10 box and a part of CIRCE, the binding site of its dedicated regulator. This phenomenon may be considered an event of exaptation, where adaptation to a certain type of promoter regulation leads to susceptibility to another one, which can potentially be acquired in the future.
Mollicutes feature homologs of the controller protein (C-protein) of RM-systems that regulate the transcription of both type I and type II RM-systems. In some cases, they are fused with the corresponding methylase. There are three types of C-protein binding sites: one characteristic of Acholeplasmas, (AACGAATN12)3, and two characteristic of mycoplasmas, GGACN5GTCC and (GTGTTAN5)2. The latter (and the most frequent) was confirmed experimentally. The central TG dinucleotide was the most conserved and the most important for binding. Since the binding site of this type is relatively simple, the C-protein may also bind other promoters, primarily those with a TRTG extension of the −10 box.
The work was funded by the Russian Science Foundation grant 14-24-00159, “Systems research of minimal cell on a Mycoplasma gallisepticum model”.
Availability of data and materials
The raw data from transcription start site (TSS) mapping were deposited in the NCBI SRA database under project ID PRJNA325091.
GF – designed the experiment, drafted the article. DE – performed EMSA studies. VM – obtained HsdC protein. VG – drafted the article. All authors have read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Mazin PV, Fisunov GY, Gorbachev AY, Kapitskaya KY, Altukhov IA, Semashko TA, et al. Transcriptome analysis reveals novel regulatory mechanisms in a genome-reduced bacterium. Nucleic Acids Res. 2014;42:13254–68.
- Roberts RJ, Belfort M, Bestor T, Bhagwat AS, Bickle TA, Bitinaite J, et al. A nomenclature for restriction enzymes, DNA methyltransferases, homing endonucleases and their genes. Nucleic Acids Res. 2003;31:1805–12.
- Murray NE. Type I restriction systems: sophisticated molecular machines (a legacy of Bertani and Weigle). Microbiol Mol Biol Rev. 2000;64:412–34.
- Labrie SJ, Samson JE, Moineau S. Bacteriophage resistance mechanisms. Nat Rev Microbiol. 2010;8:317–27.
- Ishikawa K, Handa N, Kobayashi I. Cleavage of a model DNA replication fork by a Type I restriction endonuclease. Nucleic Acids Res. 2009;37:3531–44.
- Mruk I, Kobayashi I. To be or not to be: regulation of restriction-modification systems and other toxin-antitoxin systems. Nucleic Acids Res. 2014;42:70–86.
- Tao T, Bourne JC, Blumenthal RM. A family of regulatory genes associated with type II restriction-modification systems. J Bacteriol. 1991;173:1367–75.
- Mruk I, Blumenthal RM. Tuning the relative affinities for activating and repressing operators of a temporally regulated restriction-modification system. Nucleic Acids Res. 2009;37:983–98.
- Bogdanova E, Djordjevic M, Papapanagiotou I, Heyduk T, Kneale G, Severinov K. Transcription regulation of the type II restriction-modification system AhdI. Nucleic Acids Res. 2008;36:1429–42.
- Knowle D, Lintner RE, Touma YM, Blumenthal RM. Nature of the promoter activated by C.PvuII, an unusual regulatory protein conserved among restriction-modification systems. J Bacteriol. 2005;187:488–97.
- Streeter SD, Papapanagiotou I, McGeehan JE, Kneale GG. DNA footprinting and biophysical characterization of the controller protein C.AhdI suggests the basis of a genetic switch. Nucleic Acids Res. 2004;32:6445–53.
- Fisunov GY, Evsyutina DV, Semashko TA, Arzamasov AA, Manuvera VA, Letarov AV, et al. Binding site of MraZ transcription factor in Mollicutes. Biochimie. 2016;125:59–65.
- Sorokin V, Severinov K, Gelfand MS. Large-scale identification and analysis of C-proteins. Methods Mol Biol. 2010;674:269–82.
- Blount ZD, Barrick JE, Davidson CJ, Lenski RE. Genomic analysis of a key innovation in an experimental Escherichia coli population. Nature. 2012;489:513–8.