Overview of the analysis flow
The analysis flow used to reconstruct the combined transcriptional-sRNA network from the expression compendium is depicted in Figure 1. We first inferred a module network from the expression compendium (Panel A and B). A module consists of a set of genes that is co-expressed, and the conditions under which these genes are co-expressed. Because genes in a module behave similarly, we assume they might be co-regulated either at the transcriptional or post-transcriptional level. Possible TFs or sRNAs that could explain their co-expression behavior were assigned to each of the obtained modules using expression-based network inference methods that assess whether there exists a similarity in the profile of the assigned TF/sRNA and that of the genes in the module to which the TF/sRNA is assigned. Because it has been shown that network inference approaches differing in their underlying principles often give complementary predictions[12], we used a combination of two different methods (LeMoNe[13] and CLR[11]) to make our final predictions.
Expression-based inference methods cannot distinguish whether the regulators affect the modules to which they are assigned in a direct versus an indirect way, i.e. whether the assigned regulators directly interact with the target genes in the modules to affect their regulation or whether they affect another regulator which on its turn physically interacts with the targets in the module. To infer for the assigned sRNAs direct from indirect modes of regulation, we complemented the expression-based inferences with sequence-based information (Panel C): direct interactions as summarized in the sRNA-target interaction network (Panel D) were inferred by identifying genes in the module that contained a region in their sequence that was complementary to a region present in an sRNA assigned to the module (results obtained from IntaRNA[14] and TargetRNA[15, 16]).
Module inference
To infer modules, we relied on a previously developed global biclustering algorithm (ISA[17, 18]). With ISA, we identified 78 modules in our dataset of which 57 were functionally enriched. All 78 modules contained at least one predicted sRNA target (based on IntaRNA and TargetRNA predictions (see Methods)) and 21 modules contained an experimentally validated sRNA target. For several modules which showed a clear functional overrepresentation, sRNA targets within the module had a function related to the functional category assigned to the module (see below for a more detailed description of those modules). An overview of the modules is given in Additional file1: Table S1: Characteristics of module network as reconstructed by CLR and LeMoNe. 37 out of the 108 experimentally verified sRNA targets ended up in a module, while the remaining sRNA targets remained unclustered. In some cases, e.g. for OmrA, OmrB, OxyS, DsrA, GcvB targets of the same sRNA, were clustered together. For the other cases it seems that targets, despite being regulated by the same sRNA exhibit a profoundly different expression pattern (Additional file2: Table S2: overview of the sRNAs in different modules). This indicates an intricate interaction between the sRNAs and the TF-mediated transcriptional network.
Assigning a regulatory program
To map the interaction between the transcriptional and the posttranscriptional network, we reconstructed a module network by assigning to each of the modules a regulatory program that consists of a combination of sRNAs and TFs. Because a module expression profile is more robust than a single gene profile, we assigned regulatory programs to modules rather than to single genes (here referred to as module networks). To this end, we used two previously described inference tools that exploit expression information to reconstruct networks, CLR and LeMoNe[12, 19]. Both tools assume that the expression profile of the regulator is a proxy for its activity. They thus assign a regulator to a module if the expression profiles of both entities show a relation. The first method, LeMoNe[13] is inherently module-based: it assigns a regulatory program to pregrouped gene sets (or module). LeMoNe first partitions for the selected gene set in a module, the conditions according to different levels of over (under) expression (multivariate distribution). Then it assigns to each module those regulators for which the expression profiles best fits all or part of the condition partitions in the module. CLR[11], on the other hand assigns a regulatory program based on the degree of mutual information between the expression profile of a regulator and that of each possible target gene. Although initially developed as a direct inference method that assigns a regulatory program on a gene by gene basis, we apply CLR here to assign a regulatory program at module level (see Methods). As input for both CLR and LeMoNe we used the gene selection of the 78 ISA modules, but instead of only using the conditions selected in the module, we used all conditions in the compendium to perform the regulatory assignments as both CLR and LeMoNe can weigh the conditions ‘according’ to their relevance for the assignment of the regulatory program. An overview of the assignment of a regulatory program to the different modules is shown in Additional file1: Table S1.
Complementarity between CLR and LeMoNe
Both CLR and LeMoNe assign a score to the TF/sRNA module assignments. To set the threshold for CLR, we used the FDR-based strategy described in the original paper[11]. For LeMoNe we relied on a previously optimized threshold[19]. These criteria resulted in assigning on average 2–3 regulators (with a regulator being defined as a TF or sRNA) per module for CLR (159 assignments comprising 75 regulators) and 1–2 for LeMoNe, (165 assignments comprising 89 regulators). To four modules no regulators were assigned, given the defined thresholds.
Figure 2 gives an overview of the TF and sRNA assignments by either LeMoNe or CLR to the different modules (Panel A). Figure 2 panel B and C summarizes quantitatively the complementarity between LeMoNe and CLR for respectively the TFs and sRNAs. Because in principle the same regulator (TF/sRNA) can be assigned to different modules, the number of assignments is larger than the number of assigned regulators. The complementarity between both methods is therefore displayed both from the perspective of the assignments and of the assigned number of TFs.
For TFs a total of 190 assignments, covering 84 TFs were made of which 127 assignments covering 71 TFs were made by LeMoNe and 109 assignments covering 49 TFs were made by CLR. Of the total number of assignments, 46 assignments covering 26 different TFs were consistent between CLR and LeMoNe. For about 39 assignments made by either CLR or LeMoNe covering 17 different TFs, the assignments were confirmed by target enrichment analysis: that is the module to which the TF was assigned indeed was overrepresented in known targets (according to RegulonDB) of the assigned TF (Figure 2 Panel A). 66.7% of these assignments that were confirmed by target enrichment analysis can be found in the intersection of targets predicted by both methods.
For the sRNAs, a total of 71 assignments for 30 different sRNAs were made. Of the 38 assignments for 18 sRNAs made by LeMoNe and the 50 assignments for 26 sRNAs made by CLR, 17 assignments for 10 sRNAs were consistently predicted by both algorithms. For 3 cases (one assignment for MicF (predicted by CLR), one assignment for Spf (predicted by CLR), and one assignment for RyhB (predicted by LeMoNe and CLR)), the module to which the sRNA was assigned also contained at least one experimentally verified target of this assigned sRNA. 21 out of the 43 modules to which an sRNA was assigned, contained either a predicted or an experimentally verified target for the assigned sRNA, indicating that probably in the other cases the assigned sRNAs are involved in the indirect rather than the direct regulation of the genes in the module e.g. by regulating the TFs that were assigned to the module rather than by regulating the genes in the module itself.
Regarding the complementarity between LeMoNe and CLR, results show that both methods tend to make more predictions for TFs than for sRNAs which indicates that the signal in the dataset is more pronounced for TFs than for sRNAs. This is to be expected as the probes in the used array platforms were designed to measure protein coding genes (and thus TFs), but not sRNAs. sRNAs are in general represented by very few probes and not necessarily on all platforms (see Methods). Other differences between the results obtained by either method can be explained by the method’s specificities. In general, CLR performs well if the profile of the assigned regulator matches the profile of the module for a minimal number of conditions of any type (it tests a global similarity rather than a condition specific profile). LeMoNe on the other hand assigns more importance in finding a fit between the expression profile of the regulator and that of the module for those condition partitions that are homogenous. The latter can contain a large part of the conditions in the dataset (in which case the interaction would also be recovered by CLR) or it can be restricted to only a subset of the conditions (in which case the interaction would only be recovered by LeMoNe). However, LeMoNe’s ability to assign regulators that only match part of the conditions in the dataset comes at the expense of also penalizing more small mismatches between the profiles of the regulators and those of the modules for the condition partitions of importance. So even if the global profile seems to match quite well (high CLR score), such mismatches can result in low LeMoNe scores. Both properties can explain why for the same thresholds used on both TFs and sRNAs, LeMoNe tends to find more unique assignments for TFs than for sRNAs compared to CLR (for which the opposite is true): because for TFs the signal is rather robustly measured, LeMoNe can also assign regulators for which the expression profile only matches the module’s profile for a subset of the conditions, whereas CLR can not assign those regulators, explaining why LeMoNe assigns relatively more TFs than CLR. However, in the case of sRNAs, the expression signal is much less robustly measured and small mismatches in the expression profile of the regulator and that of the module become a major limitation for LeMoNe. This results in CLR being able to assign more sRNAs than LeMoNe under those conditions.
The most reliable assignments of regulators (TF or sRNA) to modules evidently consist of the predictions that were made by both LeMoNe and CLR (indicated in bold in the Additional file1: Table S1). A selection of interesting modules is described in more detail below.
sRNA-target interaction network
To derive from the sRNA-module assignments the underlying sRNA-target network, we defined as potential targets of a certain sRNA those module genes that also contained in their upstream regions a putative recognition site of the sRNA assigned to the module (based on the union of IntaRNA and TargetRNA). This resulted in 30 different sRNA-target interactions (corresponding to 33 sRNA-target assignments) comprising 14 sRNAs (out of the 72 for which we tried to make an assignment) regulating 30 genes (Figure 3).
Recently, Modi et al.[10] have also applied CLR to predict novel sRNA-target interactions for 24 sRNAs (which are all contained in the set we used in our analyses). Our approach is intrinsically quite different from the one of Modi et al.[10] in its experimental set up: whereas Modi et al.[10] assigns sRNAs to single genes, we assign sRNAs to modules (module-base inference). In addition both approaches, ours and the one of Modi et al.[10] use slightly different parameter settings for running CLR and the sequence-based sRNA-target predictions. We compared to what extent the results of our approach corresponded to those of Modi et al.[10] (Additional file3: Table S3).
Whereas we integrate sequence-based predictions with the module-based sRNA assignments to infer the direct sRNA-target interaction network (Figure 3 and Additional file3: Table S3 column f), Modi et al.[10] in their study used expression profile-based sRNA assignments to construct an sRNA-target interaction network which is thus composed of both direct and indirect targets. The original number of predictions made by Modi et al.[10] based on expression data only is shown column c of the table in the Additional file3: Table S3. To make the results of both networks more comparable, we removed from the predictions of Modi et al.[10] the indirect interactions using the sequence-based prediction approach adopted in their original paper (column d).
Results on a benchmark dataset show that both methods ours and the one of Modi et al.[10] perform poorly in recovering true benchmark interactions and have a very low sensitivity. When considering the targets of Modi et al.[10], predicted based on high scoring CLR assignments, benchmark interactions could be recovered for 5 sRNAs (Additional file3: Table S3 column e) (MicF (1 target)), GadY (1 target), GcvB (2 target), RyhB (1 targets)). Of those, three benchmark interactions (MicF, RyhB, GadY) were retained if the target was also required to contain a recognition site for the respective sRNA (column e between brackets). We had a similarly low sensitivity (column h between brackets) and could also only recover the known MicF target from the benchmark. Comparing columns d and g of Additional file3: Table S3, containing respectively the number of direct sRNA targets predicted by Modi et al.[10] and our study by combining expression and sequence data shows that both methods predict novel targets for a very different set of sRNAs. Besides for MicF and RyhB for which both our method and Modi et al.[10] can predict targets, Modi et al.[10] predicts targets for GcvB whereas we do not. In contrast, with our approach we were able to predict targets for OxyS, GadY, RyeA, PsrD, SroD, Tpke70, SroA, IsrB and C0343 that were not detected by Modi et al.[10] (the last 7 sRNAs were not analyzed by Modi et al.[10]). This indicates the complementarity between single gene and module-based approaches[12].
Description of interesting modules
An overview of the modules can be found in Additional file1: Table S1. The more detailed content of the modules, together with their regulatory program can be found at Additional file4: Module Overview.
Module 6
Module 6 (Figure 4 panel A) is a rather large module being overrepresented for genes involved in iron transport and iron-sulfur cluster assembly. Two regulators, one TF (IscR) and one sRNA (RyhB) were assigned to this module with a high reliability (as their assignment was confirmed by both LeMoNe and CLR): IscR, a sulfur-cluster containing TF, known to regulate the expression of operons that encode components of a pathway of iron-sulfur cluster assembly, iron-sulfur proteins, anaerobic respiration enzymes and biofilm formation[20, 21]. Module 6 contains two known targets being regulated by IscR (the operons nrdHIEF and sufABCDES, involved in iron-sulfur cluster assembly[22]). Interestingly, the regulator IscR belongs to a polycistronic mRNA iscRSUA known to be regulated by the sRNA RyhB which was also assigned to this module.
This assignment of RyhB was further confirmed by that fact that the module contained one known target of RyhB (shiA)[23]. In addition to this known target we also have one predicted RyhB target in the module, sufB (Figure 4 panel B), which is a component of the SufBC2D Fe-S cluster assembly scaffold complex[24] that is responsible for the synthesis of Fe-S clusters[24–26].
Down-regulation of proteins involved in assembly of Fe-S clusters by RyhB would make sense given the known function of RyhB during Fe homeostasis: RyhB is known to reduce iron consumption under low-iron conditions by downregulating expression of iron-containing proteins[27–29] and the Fur regulon, of which also several genes are present in the module and which is known to have a central role in iron metabolism. In addition the direct binding of RyhB to sufB seems likely as the interacting region identified in RyhB is located in an unstructured region and overlaps with the binding region of previously detected targets (see Additional file1: Table S1).
Interestingly, both the ISC assembly system (iscRSUA operon), which is responsible for Fe-S cluster production under normal conditions and the SUF assembly system (sufABCDES operon) responsible for Fe-S cluster production under oxidative stress conditions are encoded by polycistronic operons[30]. The polycistronic iscRSUA mRNA to which also IscR, the regulator assigned to the module belongs, is known to be processed by RyhB by inducing a cleavage of the operonic transcript between iscR and iscSUA[31]. Our predicted interaction between RyhB and sufB, the second gene of the sufABCDES operon suggests that RyhB would also interact with the sufABCDES operon through an intraoperonic regulation mechanism[31].
Module 17
Module 17 contains 31 genes most of which relate to membrane encoded transport systems. The module was predicted to be regulated by the TFs MalT, of which also the targets were found to be enriched in this module, CueR and YgiV. For the latter TFs no known targets were found in the module. For the sRNA MicF assigned to this module, one of its known targets, ompF, belongs to the module[9]. Although the module does not contain any other predicted targets of MicF, we found evidence of a MicF recognition site upstream the coding region of the ygiV, a TF also assigned to the module. The recognition size is located in the region from -102 to -52 bp upstream of ygiV, a region that comprises the short intergenic region between ygiW and ygiV, and the C terminal end of the coding region of ygiW. The region on MicF, predicted to bind with ygiV, although being quite large partially overlaps with the unstructured 5′end of MicF to which also the known MicF target OmpF binds (see Additional file4: Modules overview). YgiV is known as a repressor of McbR (also known as YncC), a regulator of biofilm formation[32].
Also interesting is the assignment of the sRNA IsrB with unknown function to the same module. IsrB in E. coli has no documented targets yet, but its genomic location overlaps with the coding regions of azu-genes, a set of inner membrane encoding genes with unknown function. A link between IsrB and membrane encoded functions is plausible viewing the large subset of membrane related functionalities in this module. However, relying on our sequence-based sRNA target prediction, no direct target of IsrB was found to be present in this module so IsrB could be involved in the indirect regulation of this module (e.g. by regulating other regulators that on their turn regulate the genes in the module).
Module 58
Module 58, mainly expressed under stationary growth contains genes involved in acid response, amino acid starvation (purine salvage, amino acid uptake) and induction of microaerobiosis (represented by Ecocyc enrichment analysis). Three regulators were assigned to Module 58, GadE, CueR and the sRNA GadY, all of which are known targets of RpoS[33]. Module 58 was also found to be enriched in direct targets of GadE (see Additional file1: Table S1), indicating that the assignment of GadE as a regulator to module 58 is true. GadE, the central activator of the acid response system controls genes involved in the maintenance of pH homeostasis through its direct targets involved in the glutamate-dependent acid resistance system (here represented by gadA and gadBC genes) and is involved in multidrug efflux (mdtE, mdtF) through controlling the expression of the TFs, GadW and GadX both related to acid resistance (of which only GadW was found in the module)[34]).
The sRNA GadY which is highly expressed during entry into stationary phase and regulated by low pH[35] is related to the GadE dependent acid response through an intricate network of interactions with GadW (also in module 58) and GadX (according to Regulon DB).
A last regulator assigned to module 58 was CueR “Cu efflux regulator”, which was also predicted to be a target of GadY using our sequence-based predictions. CueR, regulates genes related to the primary copper homeostasis system in response to the presence of copper, silver, or gold ions[36]. None of the known CueR targets related to its function in Cu2+ homeostasis were found in module 58. However, CueR being a target of GadY and also being assigned as a regulator to module 58 points towards a connection between Cu2+ and pH homeostasis, a link that has been suggested before. Yamamoto et al., for instance, showed that pH changes affect the genome-wide transcription pattern of copper-balance genes in the presence of CuSO4[37].
Besides CueR, module 58 contained three additional predicted targets of GadY (assuming that genes belonging to a module with an assigned sRNA as regulator that also contain a recognition sequence of that sRNA in their upstream region are direct targets of the sRNA). A first one, CbpA has a functionality related to the one of DnaJ and functions as a co-chaperone with DnaK. A second one, PoxB is pyruvate oxidase and the last one XdhA-XdhB-XdhC is a putative heterotrimericxanthine dehydrogenase[38]. How their functionalities link to the role of GadY is less clear. GadY is known to be an antisense binding sRNA that acts on its cis encoded target GadX. So far GadX is the only characterized target of GadY. However, it cannot yet be excluded that GadY would have additional targets encoded elsewhere on the genome[39], the more because GadY has been shown to share the Hfq binding property of transacting sRNAs[33]. The fact that the regions to which GadY would bind in its predicted targets CueR, DnaJ and PoxB are located quite far upstream of their respective annotated TSSs could explain why such non-conventional targets have largely been overlooked by computational predictions.
Module 8
Module 8 contains pathways which relate to oxidative membrane stress (osmotic stress response, efflux pumps, membrane remodeling). Three regulators have been assigned to this module by LeMoNe: MarA, Fur and OxyS. MarA, is a “multiple antibiotic resistance” regulator of which indeed part of its known regulon was found in the module. MarA is an outer membrane porin involved in the efflux of several hydrophobic and amphipathic molecules and is known to be involved in resistance to antibiotics and oxidative stress[40]. The module indeed contains MarA targets such as TolC, an outer membrane porin. Although the Fur regulon members are not well represented in this module, the autoregulated TF Fur has not only been assigned to the module, but also belongs to the module itself, further supporting its assignment. Besides its well documented role in iron homeostasis, Fur is also known to be involved in oxidative stress responses by downregulating iron uptake systems[41].
Next to these TFs also the sRNA OxyS known to play a regulatory role in the oxidative stress response[42] was assigned to this module. Three targets regulated by OxyS were predicted with our approach and were found in module 8, implying that OxyS regulates together with MarA the genes in module 8: RimK, a ribosomal protein S6 modification protein belonging to the ybjC-nfsA-rimK-ybjN operon, an operon which indeed is known to be regulated by (Rob/MarA/SoxS) and OxyR. So, the additional regulation of rimK (intraoperonic promotor site) by OxyS is plausible. InaA, a second predicted target of OxyS present in module 8 is pH-inducible protein involved in stress response[43, 44]. A third target of OxyS which we could predict is mltC, a membrane-bound lytic murein transglycosylase C, known to be induced by oxidative stress via SoxS[45].
Module 20
Module 20 contains genes belonging to pathways involved in transport, oxidative stress response (Mar and Sox operons) and gluconate, ascorbate utilization. SoxR which was assigned to the module is also part of the module. The regulator IclR also assigned to the module is known to regulate the glyoxylate bypass operon[46, 47].
According to our predictions, the sRNA assigned to this module RyfA, which has no assigned function yet would have one predicted target in the module, i.e. ZnuC, the ATP-binding component of an ABC transporter involved in high-affinity zinc uptake (ZnuABC). znuC transcripts were shown to disappear or markedly decreased at 5 min after zinc addition[37]. Such quick induction or repression of the zinc-responsive genes upon increasing environmental zinc levels suggests a regulation mechanism mediated by sRNAs. Some of the enzymes being expressed in the module are indeed known to depend upon a Zn2+ containing active site (e.g. UlaE[48]). In literature we found an indirect relation between SoxS and RyfA through the regulation of the predicted target ZnuC. SoxS is known to increase the expression of the zinc uptake system ZnuACB in E. coli, although no direct binding of SoxS to the promoter of znuACB has been observed[49].
Module 61
Module 61 (44 genes) contains genes related to oxidation-reduction, electron transport and energy generation. Three TFs, AdiY and YahA were assigned to this module by LeMoNe and one CdaR by CLR. To our knowledge the role of YahA, a c-di-GMP-specific phosphodiesterase is yet unknown (YahA contains an EAL domain close to an N-terminal putative DNA-binding domain). AdiY was previously shown to be strongly upregulated after a rapid decrease in external pH[50]. Its known target, the arginine decarboxylase system (adi) is known to be induced in rich medium, under anaerobic conditions, and at low pH[50, 51] conditions under which genes present in the module are also known to be expressed. CdaR, regulates genes involved in the uptake and metabolism of galactarate and glucarate and is also found to be one of the regulators for which the targets are enriched in the module. Besides these TFs also Tpke70 a sRNA of approximately 40 nt in length with yet unknown function was assigned to this module[52]. Module 61 also contains two predicted targets of Tpke70 that is NapG and NapD (predicted using sequence-based methods) both parts of the periplasmic nitrate reductase system in E. coli[53–55].