
Diversity and prevalence of ANTAR RNAs across actinobacteria

Abstract

Background

Computational approaches are often used to predict regulatory RNAs in bacteria, but their success is limited to RNAs that are highly conserved across phyla, in sequence and structure. The ANTAR regulatory system consists of a family of RNAs (the ANTAR-target RNAs) that selectively recruit ANTAR proteins. Together, the protein-RNA complex regulates genes at the level of translation or transcriptional elongation. Despite the widespread distribution of ANTAR proteins in bacteria, their target RNAs have not been identified in certain bacterial phyla such as actinobacteria.

Results

Here, by using a computational search model that is tuned to actinobacterial genomes, we comprehensively identify ANTAR-target RNAs in actinobacteria. These RNA motifs lie in select transcripts, often overlapping with the ribosome binding site or start codon, to regulate translation. Transcripts harboring ANTAR-target RNAs mostly encode proteins involved in the transport and metabolism of cellular metabolites such as sugars, amino acids and ions, or encode transcription factors that in turn regulate diverse genes.

Conclusion

In this report, we substantially diversify and expand the family of ANTAR RNAs across bacteria. These findings now provide a starting point to investigate the actinobacterial processes that are regulated by ANTAR.

Background

Actinobacteria is a ubiquitous bacterial phylum, widely distributed across terrestrial and aquatic ecosystems [1]. The phylum consists of very diverse bacteria, ranging from defensive mutualists dwelling in varied habitats to gastrointestinal commensals that provide beneficial properties to their host. They are also the largest source of novel natural antibiotics, enzymes and secondary metabolites. In addition to their immense environmental and industrial impact, this phylum also consists of pathogens such as species from Corynebacterium, Nocardia, Mycobacterium and Rhodococcus, which cause disease in humans, animals and plants [2].

The diversity of environmental niches seen within the actinobacteria phylum argues for diverse mechanisms of gene regulation that would allow an efficient response to environmental changes. While a body of literature now places non-coding RNAs and RNA-protein based mechanisms as a major mode of gene-regulation in several model bacteria (reviewed in [3,4,5,6,7,8,9]), our knowledge of RNA-based regulatory mechanisms in actinobacteria remains limited ([10,11,12,13,14], and reviewed in [15, 16]).

One approach to identifying regulatory RNAs in actinobacteria has been to use deep sequencing of the transcriptome coupled with 5′-RACE mapping to identify potential RNAs that map to the untranslated regions (UTRs). These RNAs are then subjected to structure prediction tools [17,18,19] and compared against known RNA families to confirm the presence of regulatory RNAs. This approach in Corynebacterium and Streptomyces under exponential growth conditions has led to the identification of new regulatory RNAs such as the 6C RNA family, 6S RNA family, T-box leader element, novel sRNAs and trans-encoded RNAs [11, 20]. In addition, this approach has helped to identify several known metabolite-responsive RNAs such as Mn2+ sensing riboswitches (yybP-ykoY), thiamine pyrophosphate (TPP) riboswitches, flavin mononucleotide (FMN) riboswitches, S-adenosyl methionine (SAM)-dependent riboswitches and cobalamin riboswitches (which bind adenosylcobalamin) [11, 20]. A similar approach led to the discovery of 75 novel small RNAs in Rhodococcus sp. when grown in glucose and pyrene as sole carbon sources, a small fraction of which have now been assigned functions [12]. Such an approach requires cells to be grown under specific conditions of interest, and does not identify the full repertoire of RNAs that the cell can produce in response to unknown signals and cues.

Computational methods have also been successfully employed to identify regulatory RNAs in actinobacteria. In one study, homologs of genes were first identified and their upstream intergenic regions were aligned and searched for patterns/motifs using RNA secondary structure prediction tools such as RNA-pattern [18] and PAT (A.V.Seliverstov, unpublished). This led to the identification of LEU element [10], T-box [10] and B12 [21] riboswitches in several actinobacteria. More generally, the RNA family database (Rfam) employs covariance analysis, wherein bacterial genome sequences are scanned for conserved base-pairing patterns, to identify structurally conserved RNA families in the genome. Based on this, the Rfam database suggests the presence of ~ 90 cis-regulatory RNA families in one or more actinobacteria (Rfam v14.2). While these approaches have identified RNAs in actinobacteria, they are mostly limited to RNA families that are highly conserved in sequence and structure, where homologs from different bacterial phyla closely resemble each other.

For some RNA families, the highly GC-rich actinobacterial genomes may result in RNA sequences that have diverged from their firmicute or proteobacterial homologs, and hence are not easily identified through routine sequence-based or structure-based searches. One such example is the 6S RNA family, which could only be identified in actinobacteria using a clustering method wherein the sub-optimal RNA structures were used to find functionally relevant motifs [13]. Known 6S RNAs from related bacterial species of proteobacteria, firmicutes and cyanobacteria were analyzed for similarity based on sequence and minimum free energy (MFE) structures. Despite a common function, these RNAs lack sequence and structure similarity. When sub-optimal structures were analyzed instead of the MFE structure, these RNAs fell into different clusters, 3 of which represented most of the 6S RNAs. Information from these 3 clusters was used to identify 6S RNAs across genomes. Through this clustering method, several 6S RNAs were obtained in Mycobacteria and Streptomyces species, representative of actinobacteria.

We observed a similar discrepancy in an important family of RNAs known to be targets of the ANTAR RNA-binding protein. RNAs bound by ANTAR proteins are conserved in structure and are widespread among firmicutes and proteobacteria [22,23,24,25,26,27]. In actinobacteria, however, despite the widespread presence of ANTAR protein domains (Pfam: PF03861), their target RNAs remained unidentified. Only recently, in a study focusing on Mycobacteria, these RNAs were identified using a genome-wide covariance search approach combined with clustering [28]. A search model (structure based sequence alignment) enriched in firmicute and proteobacterial RNAs showed very high sequence and structure similarity and as a consequence failed to predict RNAs in actinobacteria. When diverse RNAs from different firmicutes and proteobacteria were added to the search model, they separated into several clusters based on sequence and structure similarity. This clustering resulted in a search model that successfully identified RNAs in Mycobacteria by removing the bias imposed by highly similar or highly dissimilar RNAs. Notably, neither the firmicute [27] nor the mycobacterial search models [28] were effective in finding ANTAR RNAs across the actinobacterial phylum.

Here, we identify the repertoire of ANTAR-target RNAs across actinobacteria. To identify these RNAs, we first developed an actinobacteria-centric search model which, when used to search against all actinobacterial genomes, successfully identified ANTAR-target RNAs. We find that the family of ANTAR-target RNAs is present across all actinobacteria and co-occurs with ANTAR proteins. There are only a few examples of bacteria where, despite the presence of ANTAR proteins, we are unable to identify RNA targets. These RNAs resemble ‘cis’ regulatory RNAs in their genomic locations, typically residing in the untranslated region (UTR) or near the start of a coding region. The COG (Cluster of Orthologous Genes) database is a resource for functionally annotating protein sequences based on homology to known protein sequences. COG analysis of the genes distal to ANTAR-target RNAs reveals that these RNAs are associated with transport and metabolism of small molecule metabolites, ranging from amino acids to metal ions to diverse sugar substrates. Additionally, ANTAR-target RNAs appear linked to genes encoding transcription factors that are known to modulate the expression of several transporters. Our study underlines the presence of the ANTAR protein-RNA regulatory system in actinobacteria, and its importance in governing the uptake and metabolism of a variety of nutrients. This approach of scanning an existing RNA family for sequence diversity and using that diversity to find homologs in distant phyla may be broadly applicable to other RNA families.

Results

Identifying ANTAR-target RNAs across phylum actinobacteria

Analysis of the previously reported ANTAR RNAs revealed that ~ 400 ANTAR-target RNAs are known in firmicutes and proteobacteria [27], and they are conserved in secondary structure with dual stem loops separated by a linker (Fig. 1A). Each stem possesses a hexanucleotide loop where the first and fourth positions are conserved in sequence as an adenine (A1) and guanine (G4) respectively (Fig. 1A). More recently, in a study focusing on ANTAR RNAs in Mycobacteria, a covariance-based computational approach was used to search for ANTAR RNAs. Here it was shown that a focused search model (a set of RNAs aligned based on similar secondary structure and sequence) consisting of highly similar firmicute/proteobacterial RNAs was unable to predict RNAs in Mycobacteria. This is likely due to a divergence of mycobacterial ANTAR RNAs from their firmicute/proteobacterial homologs. Only when the search model was modified to include more diversity that expands the sequence space (partially focused search model), was the search capable of finding RNAs in Mycobacteria. This resulted in ~ 90 ANTAR-target RNAs identified across all mycobacterial species [28].

Fig. 1

Improved search model to predict ANTAR-target RNAs in actinobacteria. A Cartoon showing the ANTAR protein-RNA regulatory system. Specific signals activate the ANTAR protein (grey), which upon activation binds the dual stem loop ANTAR-target RNA (blue). This results in regulation of the downstream gene (gene linked to ANTAR-target RNA, shown in purple). B Schematic shows the steps performed to identify ANTAR-target RNAs using a covariance-based computational search. Previously reported search models with too little diversity (focused) did not yield any results in actinobacteria, while a search model with only moderate diversity (partially focused) identified ~ 243 RNAs in actinobacteria, with a bit score threshold ≥ 14. 30 actinobacterial representative RNAs from this set were used to enrich the search model further and this actinobacteria-centric search model (diffused search model) resulted in a comprehensive list of ANTAR-target RNAs in actinobacteria. The probability of finding RNAs in actinobacteria is represented as a bar (red indicates high probability). C Bar plot (left) shows the total number of actinobacterial genomes where RNAs are predicted using three different search models (purple, gray and green). Bar plot (right) shows the total number of RNAs predicted using three different search models. The diffused search model is able to predict RNAs in more than 60% of actinobacterial genomes as compared to the focused and partially focused search model. D RNA sets from firmicutes/proteobacteria and actinobacteria were clustered using cmbuild. Bar plot shows the number of clusters obtained with varying sequence identity cut-offs imposed using cmbuild. Clusters obtained using 51–54% sequence identity cut-off are shown as an inset. E Consensus structure obtained for the actinobacterial ANTAR-target RNA sequences from the largest cluster with 55% sequence identity is visualized using Forna (left). Stems (green), internal loops (blue) and unpaired nucleotides (pink) are shown. Parameters obtained from RNAz for the largest cluster with 55% sequence identity are shown (right)

To further identify ANTAR-target RNAs in actinobacteria, we used the partially focused search model developed in the mycobacterial study (Fig. 1B) and performed a covariance-based RNA search against all sequenced (~ 720) actinobacterial genomes. This search identified ~ 243 ANTAR-target RNAs with high confidence. However, these newly found RNAs were restricted to less than 30% (197 of 720) of the sequenced actinobacterial genomes (Fig. 1B-C).

In order to improve the search and predict RNAs more comprehensively across actinobacteria, we picked 30 representative RNAs from the initial 243 hits and created a new and fully actinobacterial search model (Additional file 1: Table S1). This model was then used as input in a covariance search against the 720 genomes. The RNA hits obtained from this search were filtered using a bit score threshold of ≥ 15. The bit score reports on the similarity of each RNA hit to the consensus derived from the search model as compared to a null model of non-homologous sequences. We also manually examined each individual RNA hit to ensure that the reported RNAs possess the known ANTAR-target RNA features: the dual stem-loop structured motif with hexanucleotide loops, and a linker region of 2 to 25 nucleotides. After manual curation, ~ 1228 RNAs were predicted with high confidence (Additional file 1: Table S2, S3), and importantly, RNAs were found in nearly 74.5% of sequenced actinobacteria (Fig. 1B-C). Additionally, we took the 30 actinobacterial RNA sequences from the diffused search model and shuffled each sequence 500 times, resulting in 15,000 new sequences. Shuffling was performed using fasta-shuffle-letters from the MEME-suite, either with all positions shuffled or with dinucleotide frequencies maintained (https://meme-suite.org/meme/doc/fasta-shuffle-letters.html). These shuffled sequences serve as negative control data sets (Additional file 1: Fig. S1A). Notably, our search identified only two RNA hits with a bit score ≥ 15 from these sets, suggesting that our false positive rate is extremely low. Comparing the performance statistics of the searches from the 3 search models (focused/partially focused/diffused) reveals a significant advantage gained by the diffused model over the other two models (Fig. 1C, Additional file 1: Fig. S1B). Hence the RNAs identified through the diffused search model were considered for further analyses.
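The construction of the shuffled negative-control set can be illustrated with a minimal Python sketch. This is not the fasta-shuffle-letters program used in the study: it only reproduces the simpler "all positions shuffled" mode with the standard library, and does not preserve dinucleotide frequencies. The function name `shuffled_controls` and the toy seed sequences are hypothetical.

```python
import random

def shuffled_controls(sequences, per_sequence=500, seed=0):
    """Return mononucleotide-shuffled copies of each input sequence.

    Assumption: a plain permutation of all positions, which keeps base
    composition but not dinucleotide frequencies.
    """
    rng = random.Random(seed)  # fixed seed so the control set is reproducible
    controls = {}
    for name, seq in sequences.items():
        copies = []
        for _ in range(per_sequence):
            letters = list(seq)
            rng.shuffle(letters)  # in-place permutation of all positions
            copies.append("".join(letters))
        controls[name] = copies
    return controls

# Hypothetical toy input standing in for the 30 seed RNAs of the search model.
seeds = {f"rna_{i}": "GGCGCAUUGCCGCGCCUUUUCCGGAACAGUUUCCGG" for i in range(30)}
controls = shuffled_controls(seeds)
total = sum(len(copies) for copies in controls.values())
print(total)  # 30 RNAs x 500 shuffles = 15000
```

Each shuffled sequence would then be searched with the same covariance model and bit score threshold as the real genomes, so that any hit above threshold counts as a false positive.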

Removal of identical RNAs from different strains of a species resulted in ~ 611 unique ANTAR-target RNAs. Moreover, the 243 RNAs predicted initially were also recovered in this search. These include ANTAR-target RNAs predicted in mycobacterial species, which have been experimentally validated as binders of ANTAR proteins [28].
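As an illustration of this de-duplication step, the sketch below keeps one copy of each RNA sequence per species. It assumes that "identical" means an exact sequence match within a species; the record fields (`species`, `strain`, `sequence`) and the example records are hypothetical, not taken from the supplementary tables.

```python
def unique_hits(hits):
    """Keep one hit per (species, sequence) pair, preserving input order."""
    seen = set()
    kept = []
    for hit in hits:
        # Normalise case and DNA/RNA alphabet before comparing sequences.
        key = (hit["species"], hit["sequence"].upper().replace("T", "U"))
        if key not in seen:
            seen.add(key)
            kept.append(hit)
    return kept

# Hypothetical records: two strains of one species share an identical RNA.
hits = [
    {"species": "species A", "strain": "s1", "sequence": "GGCGCAUUGCC"},
    {"species": "species A", "strain": "s2", "sequence": "ggcgcattgcc"},
    {"species": "species B", "strain": "s1", "sequence": "GGCGCAUUGCC"},
]
deduplicated = unique_hits(hits)
print(len(deduplicated))  # 2
```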

We additionally analyzed this search model using the cmbuild program [29], which creates a statistical profile of alignments and thus reports on the extent of sequence conservation and base-pairing potential (co-variation) within the aligned RNAs. Based on sequence (42% sequence identity) and structure (Covariance Model, CM score = 0.48), the actinobacterial seed alignment shows significantly higher variation than the partially focused firmicute/proteobacteria seed (51% sequence identity and a CM score of 0.61). These results indicate that an actinobacteria-enriched search model that allows higher sequence/structural diversity while maintaining the core defining features of the RNA family is ideal for identifying new RNAs in actinobacteria.

In order to understand the characteristics of ANTAR-target RNAs in actinobacteria, we compared the 611 predicted actinobacterial RNAs with the previously reported 306 ANTAR-target RNAs from firmicutes and proteobacteria [27]. Using cmbuild the RNAs from each set (actinobacterial versus firmicute-proteobacteria) were clustered at increasing sequence identity thresholds (Fig. 1D). We find a stark difference between the two sets of RNAs. The actinobacterial RNAs start to separate out as clusters at a much lower sequence identity threshold (50%) when compared to firmicutes and proteobacteria (55%). This shows inherent diversity within the actinobacterial RNAs, possessing less than 50% sequence identity. We further analyzed the largest cluster of RNAs from each set for the extent of structural conservation. Even here, RNAs that are similar in sequence and hence clustered together showed a low CM (Covariance Model) score of ~ 0.44 when compared to the firmicutes and proteobacterial set (CM score: ~ 0.60). This confirms that actinobacterial RNAs allow for significantly higher sequence and structure variations (Additional file 1: Fig. S1C).

Next we subjected all the RNA hits to analysis using RNAz [17, 30] which computes a consensus secondary structure. We find that these RNAs, as expected fold into a dual stem-loop motif maintaining the core ANTAR-target RNA structural features. The consensus secondary structure for these RNAs shows more than 50% conservation of adenine and guanine in loop positions 1 and 4 respectively, and ~ 50% conservation within the stems (Additional file 1: Fig. S2A). The largest cluster of RNAs (~ 75% of all the predicted RNAs) shows a minimum free energy of − 7.90 and a structure conservation index (SCI) of 0.48 (Fig. 1E). The mean z-score of − 1.09 obtained for these RNAs indicates that the structure motif observed is a stable true motif and does not occur by chance. The test for functionality based on SCI and z-score indicates that these RNAs belong to ‘functional RNA’ class (P > 0.5). A similar RNAz analysis for all actinobacterial RNA hits is summarized in Additional file 1: Fig. S2B.
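The core structural features described here (two stem-loops, hexanucleotide loops with A at position 1 and G at position 4, and a linker of 2 to 25 nucleotides) can be expressed as a simple check on a sequence and its dot-bracket structure. The following Python sketch is only an illustration of those criteria under stated assumptions: it is not the curation procedure used in the study, it expects a balanced dot-bracket string without pseudoknots, and the function names and toy sequence are hypothetical.

```python
def pair_table(structure):
    """Map each paired position to its partner for a balanced dot-bracket string."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner

def looks_like_antar_target(seq, structure, linker_range=(2, 25)):
    """Check for a dual stem-loop with A1/G4 hexaloops and an allowed linker."""
    partner = pair_table(structure)
    # Outermost helices, read 5' to 3'.
    stems, i = [], 0
    while i < len(structure):
        if structure[i] == "(":
            stems.append((i, partner[i]))
            i = partner[i] + 1
        else:
            i += 1
    if len(stems) != 2:
        return False
    # Hairpin loops: closing pairs that enclose only unpaired bases.
    loops = [(a, b) for a, b in partner.items()
             if a < b and set(structure[a + 1:b]) == {"."}]
    if len(loops) != 2:
        return False
    for a, b in loops:
        loop = seq[a + 1:b]
        if len(loop) != 6 or loop[0] != "A" or loop[3] != "G":
            return False
    linker = stems[1][0] - stems[0][1] - 1  # unpaired bases between the two stems
    return linker_range[0] <= linker <= linker_range[1]

# Hypothetical toy RNA: two 5-bp stems, hexaloops AUUGCC and ACAGUU, 4-nt linker.
seq = "GGCGCAUUGCCGCGCC" + "UUUU" + "CCGGAACAGUUUCCGG"
structure = "(((((......)))))" + "...." + "(((((......)))))"
print(looks_like_antar_target(seq, structure))  # True
```

A check of this kind only tests the topology and loop sequence; it does not evaluate thermodynamic stability or covariation, which is what RNAz and the covariance model scores report on.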

Distribution of ANTAR proteins and target-RNAs in actinobacteria

With a comprehensive list of ~ 611 ANTAR-target RNAs identified, we looked at their distribution in the 128 known genera of actinobacteria and found that RNAs were predicted in genomes representing 87 genera, which include 219 species (Fig. 2A, inset). The majority of actinobacterial species possess 1 to 3 RNAs per genome (Fig. 2A), while some species of Actinomyces, Microbacterium, Bifidobacterium, Trueperella and Arthrobacter appear to possess nearly 10 or even up to 36 different RNAs in the same genome (Fig. 2A-B).

Fig. 2

Distribution of ANTAR-target RNAs identified in Actinobacteria. a Distribution of ANTAR-target RNAs (yellow) or ANTAR proteins (blue) in actinobacterial species. Inset pie-chart shows the number of actinobacterial genera where ANTAR-target RNAs are predicted. b Distribution of ANTAR-target RNAs (left) and ANTAR proteins (right) in actinobacterial genera are shown as box-whisker plots. Median (vertical line), interquartile range (box) and 1.5 times the inter-quartile range (whiskers) are shown

The ANTAR domain is an RNA-binding domain and proteins containing this domain are known to selectively recognize and bind RNAs of this family. Hence we asked if the distribution of RNAs reflected the distribution of the ANTAR proteins. To this end, we performed an HMMsearch using the ANTAR domain HMM model from the protein family database (Pfam: PF03861). With an e-value threshold set to 1e-4, we identified ~ 1459 ANTAR-domain containing proteins in 245 species of actinobacteria. As seen for the RNAs, the distribution of ANTAR proteins too shows high variation, ranging from 1 to greater than 10 ANTAR proteins in a genome (Fig. 2B, Additional file 1: Fig. S3). Interestingly, within the same genome, we do not always see a one to one correlation between the number of RNAs predicted and the number of ANTAR proteins present (Fig. 2B). For example, in Xylanimonas there are 3 distinct ANTAR domain proteins with unique domain architectures. However, we predict only one ANTAR target RNA here, suggesting that the same RNA may act as a hub through which many different ANTAR proteins may act, towards different cellular outcomes. In contrast, Trueperella appears to possess a single ANTAR domain protein but 12 predicted RNAs, suggesting that many convergent processes may be controlled by ANTAR in Trueperella.
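The e-value filtering of the domain search can be sketched as a small parser for HMMER's tabular (`--tblout`) output, in which comment lines start with `#` and the fifth whitespace-separated column holds the full-sequence E-value. This is an illustrative sketch, not the pipeline used in the study: whether the threshold was applied to the full-sequence or the per-domain E-value, and whether it was inclusive, is not stated, so both choices below are assumptions. The two input lines are made-up examples.

```python
def significant_targets(tblout_lines, max_evalue=1e-4):
    """Return (target, E-value) pairs passing the threshold from hmmsearch --tblout lines."""
    hits = []
    for line in tblout_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/footer comments
        fields = line.split()
        target, evalue = fields[0], float(fields[4])  # column 5: full-sequence E-value
        if evalue <= max_evalue:  # assumption: inclusive threshold
            hits.append((target, evalue))
    return hits

# Hypothetical tblout lines (not real search output).
example = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "prot_A - ANTAR PF03861 2.1e-12 45.3 0.1 3.0e-12 44.8 0.1 1.0 1 0 0 1 1 1 1 hypothetical protein",
    "prot_B - ANTAR PF03861 0.03 12.1 0.0 0.05 11.5 0.0 1.0 1 0 0 1 1 1 0 hypothetical protein",
]
passing = significant_targets(example)
print(passing)  # [('prot_A', 2.1e-12)]
```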

We found examples (< 30% of species) where no RNAs were predicted despite the presence of one or more ANTAR proteins in the genome. Similarly, in a few examples no ANTAR proteins are present in a genome even though ANTAR-target RNAs are predicted with high confidence. Whether or not ANTAR proteins and RNAs have an active role in these organisms, or if alternate approaches are required to find RNAs and proteins in these organisms remains to be seen (Fig. 2B, Additional file 1: Fig. S3). Regardless, these analyses imply that within phylum actinobacteria there is diversity of ANTAR function and mechanism. While the presence of RNAs is not sufficient to indicate active association with the ANTAR protein, we note examples from previous studies where an RNA-binding activity for actinobacterial ANTAR proteins has been reported [28, 31].

ANTAR-target RNAs are located in untranslated and coding regions of mRNAs

Previous studies have shown that ANTAR proteins, upon activation (through phosphorylation) bind to their target-RNAs and regulate downstream gene expression in cis [22, 24, 27, 28, 31,32,33]. Hence we analyzed the genomic locations and contexts of the predicted RNAs.

Based on genomic location, RNAs were categorized as: 1) intergenic (RNA lies 15 nt–500 nt upstream of an ORF), 2) sequester RBS or AUG (RNA harbors the ribosome-binding site (RBS) or the start codon) or 3) inside ORF (RNA resides after the ORF start-site and lies within 100 nt of the ORF start-site) (Fig. 3A-B).
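These three categories can be written as a small decision rule. The Python sketch below is one possible reading of the criteria, not the authors' implementation: it assumes forward-strand, 1-based inclusive coordinates, measures distance as the number of nucleotides between the 3′ end of the RNA and the start codon, and treats an RNA ending fewer than 15 nt upstream of the start codon as a proxy for RBS overlap, since the RBS position itself is not modelled. The function name `classify_location` is hypothetical.

```python
def classify_location(rna_start, rna_end, orf_start):
    """Classify an RNA relative to an ORF start (forward strand, 1-based inclusive).

    orf_start is the first nucleotide of the start codon. Returns one of
    'intergenic', 'sequester RBS or AUG', 'inside ORF', or None.
    """
    if rna_start > orf_start + 2:            # RNA begins after the start codon
        offset = rna_start - orf_start
        return "inside ORF" if offset <= 100 else None
    if rna_end >= orf_start:                 # RNA overlaps the start codon
        return "sequester RBS or AUG"
    gap = orf_start - rna_end - 1            # nt between RNA 3' end and start codon
    if gap < 15:
        return "sequester RBS or AUG"        # assumed proxy for RBS overlap
    if gap <= 500:
        return "intergenic"
    return None                              # too far upstream to assign

print(classify_location(100, 140, 200))   # intergenic
print(classify_location(150, 195, 200))   # sequester RBS or AUG
print(classify_location(230, 270, 200))   # inside ORF
```

For reverse-strand genes the same rule would apply after mirroring the coordinates, and in practice the 10-nt flanking regions mentioned in the figure legend would be added to the RNA coordinates before classification.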

Fig. 3

Locations of ANTAR-target RNAs within their genomic context. a Schematic shows the location of ANTAR-target RNAs. RNAs are grouped in three categories: ‘intergenic’ for RNAs that lie at a distance > 15 nt from the start of the ORF, ‘sequester RBS or AUG’ for RNAs which overlap with the ribosome binding site or start codon, and ‘inside ORF’ for RNAs which lie after the start codon. 10-nucleotide flanking regions on either side of the dual stem loop structure are included in the distance calculations. b Histogram shows distribution of RNAs versus their distance from the respective ORF. Several RNAs are found near the ORF start site, sequestering either RBS or AUG (yellow). c Plot shows total number of predicted RNAs in three categories as described in panel a. 47 RNAs (dashed brown box) in the ‘sequester RBS or AUG’ category and 15 RNAs (dashed red box) in the ‘inside ORF’ category were assigned based on alternate ORF predictions. d Representative RNAs from the ‘sequester RBS or AUG’ category are shown with the ANTAR-target RNA structure marked. Potential RBS (red) and start codon (yellow) are shown. Genomic context of these RNAs (blue) is shown with ORFs (purple) and their NCBI gene annotations. e Representative RNAs from the ‘inside ORF’ category are shown. The dual stems of the ANTAR-target RNA are highlighted in pink and blue. Start codon is marked in yellow

We find that from a total of 611 RNAs analyzed, ~ 39% are intergenic, with a majority lying immediately upstream of an ORF, possibly in the 5’UTR of the corresponding mRNA (Fig. 3C, Additional file 2: Table S4). These RNAs were subjected to rho-independent terminator prediction using TransTermHP v2.08 [34], but only a few of the RNAs appear to reside upstream of a terminator, with the second stem loop showing alternate base-pairing with the terminator (Additional file 1: Fig. S4). These few examples are reminiscent of ANTAR-target RNAs in firmicutes and proteobacteria, where binding by the ANTAR protein stabilizes the two-stem loop anti-terminator structure, allowing transcription of the downstream gene. With high GC genomes, it is possible that terminator predictions are inaccurate for these bacteria and hence other approaches may be required to ascertain the mode of transcriptional regulation.

Nearly 37% of the actinobacterial target-RNAs overlap directly with the RBS or start codon (Fig. 3C, Additional file 2: Table S4). A recent report showed that in M. tuberculosis and M. smegmatis, binding of activated ANTAR protein to such target RNAs represses translation of the downstream mRNA, possibly by occluding the ribosome from binding the RBS [28]. We see similar features in these RNAs. For example, in Arthrobacter alpinus and Bifidobacterium longum, the RBS is sequestered within the second stem-loop, whereas in Propionimicrobium species both the RBS and the ORF start site lie within the ANTAR-target RNA (Fig. 3D).

The ‘inside ORF’ category consists of ~ 24% ANTAR-target RNAs (Fig. 3C, representatives shown in Fig. 3E). Several studies on non-coding RNAs [35,36,37,38,39,40,41,42] have shown that structured motifs within the mRNA transcript may influence mRNA stability or regulate translation. It is possible that these ANTAR-target RNAs also control gene expression, though the detailed mechanism needs to be uncovered.

Cellular pathways and genes associated with actinobacterial ANTAR-target RNAs

We next asked what cellular processes are linked to ANTAR in actinobacteria. Studies in Enterococcus, Pseudomonas, Klebsiella, Acinetobacter and Geobacter reveal that ANTAR-target RNAs are linked to nitrogen utilization [22, 23, 27, 33]. Only a few studies in actinobacteria have investigated the role of ANTAR. In Mycobacteria, ANTAR-mediated gene regulation might influence lipid and related redox processes [28], while a recent study in Streptomyces shows that the deletion of an ANTAR protein (SSDG_04087) impairs the developmental process and antibiotic production [43].

For this analysis, we considered all 3 categories of RNAs (UTR, sequestering RBS or AUG, inside ORF). Specifically, where the RNA hit resides in the UTR or overlaps with the RBS/AUG, we consider the gene immediately distal to the dual stem loop as the gene linked to ANTAR-target RNAs. For inside ORF category, the gene inside which the RNA lies is considered to be linked to ANTAR. Taking these genes as input, we performed COG analyses using the eggNOGmapper server. eggNOGmapper is a tool that performs a protein sequence homology search against precomputed eggNOG protein database to identify orthologs using a BLAST-like approach, and assigns the COG functional categories, KEGG pathways and gene ontology terms from orthologs to the query [44, 45].

Our analysis showed that ~ 85% of genes linked to ANTAR-target RNAs belong to 17 different COG categories, while 15% are genes of yet unknown function (Fig. 4A-B, Additional file 1: Fig. S5A-B, Additional file 2: Table S4). The majority of genes encode proteins involved in transport and metabolism of compounds, with a smaller subset restricted to enzymes involved in energy production. Core cellular processes including transcription, translation, replication and DNA repair also appear to be linked to ANTAR-target RNAs, and make up the next largest categories of COGs (Fig. 4A, Additional file 1: Fig. S5A). Additionally, we find a diversity of metabolites whose transport and metabolism would be linked to ANTAR (Fig. 4B, Additional file 1: Fig. S5B), with carbohydrate, amino-acid and lipids standing out as preferred metabolites.

Fig. 4

COG analysis of genes linked to ANTAR-target RNAs in actinobacteria. a Genes linked to ANTAR-target RNAs, analysed using EggNOG-mapper, get assigned to 11 COG categories. Bar plot shows distribution of genes linked to ANTAR-target RNAs, in each COG category. b Bar plot shows distribution of genes linked to ANTAR-target RNAs, within the ‘transport and metabolism’ COG category. Carbohydrate and amino-acid transport and metabolism are the major processes represented by the targets. c ABC transporters with the substrate binding protein, membrane bound permease and ATP-binding components (boxes) are shown. Components of the transporter whose functions are not known are marked (?). Transporter components whose transcripts harbor an ANTAR-target RNA are marked in orange. Genes linked to ANTAR-target RNAs, encoding MFS transporters (purple) and other transporters (blue) are shown

We next asked if a cellular process or function linked to ANTAR was restricted to any particular branch within the actinobacterial phylogenetic tree (Additional file 1: Fig. S6). Some processes such as replication, recombination and repair and transcription, are ubiquitously seen linked to ANTAR, in most genera. In contrast, intracellular trafficking, secretion and vesicular transport process appear restricted to Gordonia species while translation related processes and lipid transport and metabolism are largely restricted to non-pathogenic Mycobacterium and Nocardia species respectively. Energy production and conversion is found to be conserved in species of Pseudarthrobacter, Renibacterium, Sinomonas, Rhodococcus, Gordonia and Mycobacterium.

We checked if closely related genera have co-opted ANTAR for similar processes. Indeed, several species from Bifidobacterium and Gardnerella have processes such as carbohydrate transport and metabolism, transcription, translation-related processes and cell-membrane biogenesis linked to ANTAR. Similarly, six processes (energy production and conversion, transcription-related and translation-related processes, signal transduction mechanisms, and amino-acid and carbohydrate transport and metabolism) are linked to ANTAR in closely related Arthrobacter and Pseudarthrobacter species (Additional file 1: Fig. S6).

KEGG pathway and KEGG BRITE analysis of transporters whose genes are linked to ANTAR, show that they belong to the ABC transporter, MFS sugar transporter and Aquaporin families (Additional file 3: Table S5). The ABC transporter complex consists of multiple components: a periplasmic substrate-binding protein, one or more trans-membrane permeases, an ATP-binding protein and occasionally a substrate-specific enzyme [46]. Interestingly, we find that different components of the transporters, especially the substrate recognizing proteins harbor the ANTAR-target RNA in their mRNA (Fig. 4C). This makes intuitive sense since transporters are often under tight regulation and the different components are made only upon sensing the presence of the cognate sugar/metabolite.

The second highest COG category is that of transcription, with over 19 different transcription factor families linked to ANTAR-target RNAs (Additional file 1: Fig. S7A). Remarkably, the majority of these transcription factors are known to regulate the expression of transporter proteins, once again tying ANTAR-target RNAs back to the transport of small molecule metabolites. ~ 29% of transcription factor encoding genes linked to an ANTAR-target RNA belong to the LacI family of transcription factors, which are major regulators of sugar catabolic genes (Additional file 1: Fig. S7A). For example, in Streptomyces lydicus, a LacI transcription factor (TF) carrying an ANTAR-target RNA is present upstream of transporter components involved in ribose uptake (Additional file 1: Fig. S7B). In Corynebacterium glutamicum, the homologous TF (cg1410) is reported to regulate the downstream rbsDACBK operon in response to ribose availability [47]. Similarly, in Bifidobacterium dentium, Gardnerella vaginalis, Microbacterium sp. and Cryobacterium arcticum, ANTAR-target RNAs are linked to LacI transcription factors that regulate other sugar transporters and sugar-related genes (Additional file 1: Table S2, Additional file 2: Table S4). These results suggest that in several actinobacteria, sugar transporters as well as the proteins regulating sugar transport and metabolism are under the influence of ANTAR regulation.

TetR family TFs are also linked to ANTAR-target RNAs in several actinobacterial species (Additional file 1: Fig. S7A). Transcription factors belonging to this family typically regulate the expression of enzymes from different catabolic pathways or proteins involved in multi-drug resistance (Additional file 1: Table S2 and Additional file 2: Table S4) [48, 49]. An ANTAR-target RNA in Mycobacterium marinum is found upstream to the MMAR_RS11360 gene encoding a TetR family transcription factor (Additional file 1: Fig. S6B). Its M. tuberculosis homolog, Rv1474c is found to cotranscribe with the upstream aconitase gene and regulates aconitase expression in response to iron [50]. A conserved operon in Streptomyces species is predicted with an ANTAR-target RNA upstream to a SufR encoding gene, SACTE_RS06635 (Additional file 1: Fig. S7A-B). SufR is an ArsR family transcription factor and a repressor of the downstream sufBCDS operon, the primary Fe-S assembly cluster system, that responds to the availability of Fe-S cluster required as protein cofactors in many cellular processes [51]. Both these examples underline that an additional layer of post-transcription gene regulation is likely imposed by virtue of ANTAR-target RNAs.

Discussion

In this study, we identify the repertoire of ANTAR-target RNAs in the phylum actinobacteria. Key to our findings was the development of a novel computational search model that was effective in identifying these structured dual stem loop RNA motifs. Covariance search programs rely on both the sequence and the base-pairing information within a search model to find similar RNA motifs in a genome. Previously reported ANTAR RNA search models [27, 28] either failed or were only partially successful in predicting RNAs in actinobacterial genomes due to a lack of diversity in sequence and base-pairing potential. By removing the bias from highly similar or dissimilar sequences, the new search model developed in this study captures more sequence and structure diversity than the previous models, and this was key in identifying ANTAR-target RNA motifs in actinobacteria.

Analysis of the genomic locations of ANTAR-target RNAs from actinobacteria reveals many examples where the RNA lies next to the ORF start site, sequestering either the RBS or the start codon within the dual stem motif of the ANTAR RNA. A similar genomic arrangement of ANTAR RNAs was seen previously in Mycobacteria [28], where it was shown to function via translational repression. There, RNAs bound by activated ANTAR protein were shown to repress translation, possibly by preventing ribosomes from accessing the RBS. Our analysis indicates that translational control via ANTAR-target RNAs may be a prominent mode of regulation in actinobacteria.

Analysis of cellular processes likely to be controlled by ANTAR-target RNAs revealed a link between these RNAs and the transport and metabolism of small molecule compounds, especially carbohydrates, amino acids and lipids. Certain species of Bifidobacterium, Gardnerella and Scardovia show conservation of ANTAR-target RNAs in transcripts encoding carbohydrate transport and metabolism proteins. For these genera, sugar utilization is intricately linked to physiology. For example, Bifidobacteria are saccharolytic intestinal bacteria detected in humans and animals [52], while Scardovia is detected in human dental caries and adeptly uses carbohydrate fermentation pathways to lower the pH of the oral biofilm and likely induce caries progression in the host [53, 54]. Pathogenic Gardnerella vaginalis can degrade glycans in the host mucosal epithelial layers to invade and colonize the host [55]. In species belonging to the genus Nocardia, ANTAR-target RNAs might regulate lipid transport and metabolism, similar to that seen in Mycobacteria. Our results link ANTAR-target RNAs to metabolite transport and utilization in these organisms, possibly indicating that ANTAR regulation may contribute to their growth and survival within their host.

An important finding from our study is the association of ANTAR-target RNAs with mRNAs encoding transcription factors. Transcription factors are themselves regulators of gene expression, often regulating multiple target genes. By controlling the expression of a transcription factor, even a single ANTAR-target RNA in the genome could indirectly control the expression of multiple genes. We also observed that many of the transcription factors whose mRNAs harbor ANTAR-target RNAs in fact regulate the transport of sugars and other metabolites. This implies that the scope of ANTAR-based control of metabolite transport is much broader than the transporter transcripts that directly carry an ANTAR-target RNA.

In a recent study in Streptomyces pristinaespiralis, deletion of the ANTAR protein SSDG_04087 led to a bald phenotype (loss of hyphae formation) and reduced production of the antibiotic pristinamycin [43]. In our study, we identify four ANTAR-target RNAs in S. pristinaespiralis, one of which lies in the transcript of a sugar (fructose) transporter protein (SPRI_RS32325). The uptake of complex sugars by Streptomyces favors development (sporulation) and production of antibiotics [56,57,58]. In fact, perturbation of glycolysis/gluconeogenesis pathways is a standard method of increasing antibiotic production by Streptomyces for industrial applications [59,60,61,62]. Another ANTAR-target RNA is found in the mRNA for the enzyme agmatinase (SPRI_RS23705), which catalyzes the final step in the conversion of arginine to putrescine (via agmatine). Putrescine is a precursor of succinate [63, 64], which can feed into the TCA cycle and the synthesis of various amino acids that are directly involved in the production of the antibiotic pristinamycin [65, 66]. The discovery of these ANTAR-target RNAs in Streptomyces thus implicates the genes SPRI_RS32325 and SPRI_RS23705 as possible candidates that might be investigated to understand the observed phenotype. Our comprehensive description of ANTAR-target RNAs and ANTAR proteins in actinobacteria now provides a resource for microbiologists to mine.

Conclusion

Our work shows that sequence and structural diversity, when introduced in search models, aids in predicting high confidence dual stem loop motifs across the phylum actinobacteria. This expands the RNA family that can bind to ANTAR proteins. Actinobacterial ANTAR-target RNAs are distant from the firmicute and proteobacterial RNAs, yet the core features of ANTAR-target RNAs are conserved across bacteria, highlighting the diversity that can exist within the RNA family. Extensive analyses of the repertoire of ANTAR-target RNAs show that these RNAs can regulate translation of genes involved in metabolite transport, thus underlining the importance of ANTAR in actinobacteria.

Methods

Actinobacterial genomes used in this study

The 720 actinobacterial genomes listed as “Complete genomes” in NCBI (RefSeq v92), together with their corresponding gene annotations and proteomes, were considered in this study. Corresponding taxon IDs for these organisms were taken from NCBI and a taxonomy tree was retrieved in Phylip format from NCBI Batch Entrez (https://www.ncbi.nlm.nih.gov/sites/batchentrez). The phylogenetic tree was visualized using iTOL [67].

Predicting ANTAR-target RNAs in actinobacteria using covariance

A search model previously reported for identifying ANTAR-target RNAs in Mycobacteria (partially focused search model) [28] was used for an initial covariance search against actinobacterial genomes with a bit score threshold of 10.0, using Infernal v1.0.2 [29]. High confidence RNAs with a bit score ≥ 14.0 and showing a dual stem loop structure (with at least 3 base-pairs in each stem and hexanucleotide loops allowing a single point variation) were considered as putative ANTAR-target RNAs. Thirty of these predicted RNAs from actinobacteria were taken to form an actinobacteria-centric search model (diffused search model). cmbuild analysis of the partially focused and diffused search models reports a CM (covariance model) score, where a higher CM score was taken as an indication of highly similar sequences. To identify ANTAR-target RNAs in actinobacteria, 720 genomes representing 315 actinobacterial species were subjected to a covariance search using the diffused search model. Hits with a bit score ≥ 15 and lying between 500 nt upstream and 100 nt downstream of the nearest ORF were retained. RNAs that are identical to the search model or are single-nucleotide point variants were considered for further analyses. Redundant identical RNAs from strains were removed and unique RNAs from each species were considered as representatives. We used cmbuild [29] and RNAz [17, 30] to analyze the predicted ANTAR-target RNAs for their sequence and structure similarity. Using the --cmaxid option of cmbuild implemented in Infernal v1.0.2, we performed a clustering analysis. Sequence identity cut-offs ranging from 30 to 60% were imposed during clustering such that any two RNAs with a sequence identity above the cut-off form a cluster, which is reported with a corresponding CM score. Any group with < 2 RNAs was not considered. The --cdump option of cmbuild writes the multiple sequence alignment for each cluster.
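The hit-retention step described above can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the `Hit` record, the function names, the choice to measure the window from the 5′ end of the hit to the ORF start, and the requirement that hit and ORF lie on the same strand are all assumptions made for the example. Only the thresholds (bit score ≥ 15, 500 nt upstream, 100 nt downstream) come from the text.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one covariance-search hit."""
    five_prime: int   # genomic coordinate of the 5' end of the RNA hit
    strand: str       # "+" or "-"
    bit_score: float


def offset_from_orf_start(hit: Hit, orf_start: int, orf_strand: str) -> int:
    """Signed distance in nt from the ORF start to the hit 5' end.

    Negative values are upstream of the ORF start, positive values are
    downstream (inside the ORF), in the orientation of the ORF.
    """
    if orf_strand == "+":
        return hit.five_prime - orf_start
    return orf_start - hit.five_prime


def keep_hit(hit: Hit, orf_start: int, orf_strand: str,
             min_score: float = 15.0,
             upstream: int = 500, downstream: int = 100) -> bool:
    """Apply the bit-score and location filters (assumed same-strand only)."""
    if hit.strand != orf_strand or hit.bit_score < min_score:
        return False
    offset = offset_from_orf_start(hit, orf_start, orf_strand)
    return -upstream <= offset <= downstream
```

For instance, a hit on the plus strand whose 5′ end lies 100 nt upstream of an ORF start and scores 16.2 bits passes this filter, whereas the same hit 600 nt upstream does not.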
The multiple sequence alignment of the largest cluster, formed with a 55% sequence identity cut-off, was further checked for functionality using RNAz. RNAz calculates i) the structure similarity of the individual RNAs to the consensus structure, reported as the structure conservation index (SCI), and ii) a z-score that describes, in units of standard deviation, how far the stability of the structures formed by the RNAs in a cluster departs from that of a random set of RNAs with the same length and base composition; a negative z-score indicates a truly stable structure that has not occurred by chance. Based on these two measures, RNAs with conserved and stable structures (P > 0.5) are considered to belong to the ‘functional RNA’ class. The consensus RNA structure for the largest cluster was visualized using forna [68], a nucleotide-level depiction of the consensus structure was obtained using R2R [69], and statistically significant covarying positions were identified using R-scape [70].
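The two RNAz measures can be illustrated with a short calculation. The sketch below assumes that minimum free energies (MFEs, in kcal/mol) have already been computed elsewhere. It uses the textbook definitions, with SCI as the consensus MFE divided by the mean MFE of the individual sequences and the z-score taken against a set of background MFEs. RNAz itself estimates the background mean and standard deviation by regression rather than by explicit shuffling, so the function names and the explicit background list here are illustrative assumptions only.

```python
from statistics import mean, stdev


def structure_conservation_index(consensus_mfe: float,
                                 individual_mfes: list[float]) -> float:
    """SCI: consensus-structure MFE over the mean single-sequence MFE.

    Values near 1 suggest the sequences fold into the shared structure;
    values near 0 suggest no common structure.
    """
    return consensus_mfe / mean(individual_mfes)


def mfe_z_score(native_mfe: float, background_mfes: list[float]) -> float:
    """Standard deviations separating the native MFE from the background.

    A negative value means the native RNA is more stable than background
    sequences of the same length and base composition.
    """
    return (native_mfe - mean(background_mfes)) / stdev(background_mfes)
```

With a native MFE of −30.0 and background MFEs of −10.0, −12.0 and −14.0, the z-score is −9.0, which would be read as a structure far more stable than expected by chance.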

Distribution of ANTAR domain containing proteins in actinobacteria

An HMM for the ANTAR domain was taken from Pfam v33.0 (PF03861) and hmmsearch (HMMER v3.2.1) was performed against all actinobacterial proteomes with an e-value threshold of 1e-4. The e-value threshold was determined from previous studies [71,72,73,74,75]. This identified proteins having ANTAR domains. Proteomes in which hmmsearch failed to identify ANTAR proteins were further searched for sequences homologous to the Rv1626 ANTAR domain using BLASTp with an e-value threshold of 1e-3. This e-value threshold was also determined from previous studies [76,77,78,79].
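The two-tier logic of this search, in which the BLASTp result is consulted only for proteomes where the HMM search finds nothing, can be sketched as below. The input format (a dictionary mapping each proteome to a list of protein and e-value pairs, already parsed from the tool outputs), the function name, and the treatment of the thresholds as inclusive are assumptions made for illustration.

```python
def antar_proteins(hmm_hits, blast_hits, hmm_cutoff=1e-4, blast_cutoff=1e-3):
    """Return {proteome: (method, set_of_proteins)}.

    hmm_hits and blast_hits are hypothetical pre-parsed results of the form
    {proteome: [(protein_id, e_value), ...]}.
    """
    result = {}
    for proteome in set(hmm_hits) | set(blast_hits):
        by_hmm = {p for p, e in hmm_hits.get(proteome, []) if e <= hmm_cutoff}
        if by_hmm:
            # HMM search succeeded, so BLASTp is not consulted for this proteome.
            result[proteome] = ("HMMsearch", by_hmm)
            continue
        by_blast = {p for p, e in blast_hits.get(proteome, [])
                    if e <= blast_cutoff}
        result[proteome] = ("BLASTp", by_blast) if by_blast else ("none", set())
    return result
```

Keeping the method label alongside each protein set makes it easy to report how many proteomes needed the homology-based fallback.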

Categorizing ANTAR-target RNAs based on location within the genomic context

ANTAR-target RNAs were grouped into 3 categories based on their distance from ORFs. RNAs (including a 10 nt flanking region) that are 15 nt upstream from the start of the ORF were assigned to the ‘intergenic’ group. RNAs that reside completely within the ORF were assigned to the ‘inside ORF’ group. RNAs that harbor a potential RBS as part of the RNA structure were grouped as ‘sequester RBS or AUG’. RNAs were also subjected to alternate ORF (altORF) prediction using standalone NCBI ORFfinder (https://www.ncbi.nlm.nih.gov/orffinder/) with default parameters, allowing for ATG or any alternate start codons. Predicted ORFs that harbor a potential ribosome-binding site (RBS), defined as a 4-6 nt AG-rich region residing 0-15 nt upstream of the start codon, are considered as putative altORFs. RNAs from the ‘intergenic’ group were further subjected to Rho-independent terminator prediction. Here, target-RNA sequences along with 40 nt of downstream sequence were given to TransTermHP [34] with parameters uwin-require = 0 and min-conf = 50.
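The RBS criterion used for putative altORFs can be sketched with a simple pattern search. The example below assumes that “AG-rich” means an uninterrupted run of 4-6 purines (A or G) and that the whole run must fall within the 15 nt immediately upstream of the start codon. Both are simplifying assumptions, and the function name and 0-based coordinates are illustrative.

```python
import re


def find_rbs(seq: str, start_codon_idx: int, window: int = 15,
             min_len: int = 4, max_len: int = 6):
    """Locate a candidate RBS upstream of a start codon.

    Returns (start, end) as 0-based, end-exclusive indices into seq for the
    first run of min_len to max_len A/G residues found within `window` nt
    upstream of the start codon, or None if there is no such run.
    """
    upstream_from = max(0, start_codon_idx - window)
    upstream = seq[upstream_from:start_codon_idx].upper()
    match = re.search(r"[AG]{%d,%d}" % (min_len, max_len), upstream)
    if match is None:
        return None
    return upstream_from + match.start(), upstream_from + match.end()
```

For the toy sequence CCCCAGGAGGCCCCCAUGAAA, with the start codon at index 15, the function reports the AGGAGG run at positions 4 to 10. A stricter definition (for example, requiring both A and G to be present) could be substituted without changing the surrounding logic.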

COG and KEGG pathway analyses for ANTAR targets

Protein sequences of genes linked to ANTAR-target RNAs were subjected to COG analysis using EggNOG mapper v4.5.1 (http://eggnogdb.embl.de/#/app/emapper). A minimum query coverage of 70% and the default e-value threshold of 1e-3 were used to assign COG categories and KEGG orthologs (KO) based on sequence homology. The e-value threshold was determined from previous studies [80,81,82]. Independently, these protein sequences were given as input to KofamKOALA (https://www.genome.jp/tools/kofamkoala/) with the default e-value threshold of 1e-2, which reports the top KEGG orthologs using an HMM search. This e-value threshold was determined from previous studies [83, 84]. Orthologs for genes linked to ANTAR-target RNAs were mapped using EggNOG and/or KofamKOALA (Additional file 2: Table S4). These KOs were then given to KEGG Mapper (“KEGG reconstruct pathway” and “KEGG search and color pathway”) for pathway analyses. Data were visualized in iTOL, and the pathway graphs were obtained using KEGG and modified using Adobe Illustrator. All plots were obtained using GraphPad Prism v8.0.
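The coverage and e-value filter applied before tallying COG categories can be sketched as follows. The record layout (dictionaries with `query_cov`, `evalue` and `cog` keys) is a hypothetical stand-in for parsed annotation output, and counting a protein once for each letter of a multi-letter COG assignment is an assumption of this example.

```python
from collections import Counter


def cog_counts(annotations, min_cov=70.0, max_evalue=1e-3):
    """Tally COG category letters for annotations that pass both filters.

    annotations: iterable of dicts with hypothetical keys
      'query_cov' (percent), 'evalue' and 'cog' (e.g. "G" or "GK").
    """
    counts = Counter()
    for record in annotations:
        if record["query_cov"] >= min_cov and record["evalue"] <= max_evalue:
            for letter in record["cog"]:
                counts[letter] += 1
    return counts
```

Calling `cog_counts(records).most_common()` then gives the categories ranked by the number of linked genes, which is the kind of ordering used when describing the largest COG categories.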

Availability of data and materials

All RNA sequences reported in this study are given in the supplementary files (Table S2 and Table S3). Accession IDs from the publicly available NCBI database (https://www.ncbi.nlm.nih.gov/), genomic locations and RNA sequences for these RNAs are also given in Table S2 and Table S3. ANTAR proteins and ANTAR-linked genes are referred to by their protein accession numbers, which are taken from the NCBI database. The files are also available from the corresponding author on reasonable request.

Abbreviations

UTR: Untranslated regions

MFE: Minimum free energy

Rfam: RNA family database

Pfam: Protein family database

CM score: Covariance model score

SCI: Structure conservation index

RBS: Ribosome binding site

altORFs: Alternate open reading frames

COG: Cluster of Orthologous groups

KO: KEGG Ortholog

TF(s): Transcription factor(s)

References

  1. Parks DH, Chuvochina M, Waite DW, Rinke C, Skarshewski A, Chaumeil PA, et al. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol. 2018;36(10):996–1004. https://doi.org/10.1038/nbt.4229.

  2. Ventura M, Canchaya C, Tauch A, Chandra G, Fitzgerald GF, Chater KF, et al. Genomics of Actinobacteria: tracing the evolutionary history of an ancient phylum. Microbiol Mol Biol Rev. 2007;71(3):495–548. https://doi.org/10.1128/mmbr.00005-07.

  3. Barrick JE, Breaker RR. The distributions, mechanisms, and structures of metabolite-binding riboswitches. Genome Biol. 2007;8(11):R239. https://doi.org/10.1186/gb-2007-8-11-r239.

  4. Kazanov MD, Vitreschak AG, Gelfand MS. Abundance and functional diversity of riboswitches in microbial communities. BMC Genomics. 2007;8(1):347. https://doi.org/10.1186/1471-2164-8-347.

  5. Hör J, Gorski SA, Vogel J. Bacterial RNA biology on a genome scale. Mol Cell. 2018;70(5):785–99. https://doi.org/10.1016/j.molcel.2017.12.023.

  6. Updegrove TB, Zhang A, Storz G. Hfq: The flexible RNA matchmaker. Curr Opin Microbiol. 2016:133–8. https://doi.org/10.1016/j.mib.2016.02.003.

  7. Durand S, Tomasini A, Braun F, Condon C, Romby P. sRNA and mRNA turnover in gram-positive bacteria. FEMS Microbiol Rev. 2015;39(3):316–30. https://doi.org/10.1093/femsre/fuv007.

  8. Repoila F, Darfeuille F. Small regulatory non-coding RNAs in bacteria: physiology and mechanistic aspects. Biol Cell. 2009;101(2):117–31. https://doi.org/10.1042/BC20070137.

  9. Hoeppner MP, Gardner PP, Poole AM. Comparative Analysis of RNA Families Reveals Distinct Repertoires for Each Domain of Life. Wilke CO, editor. PLoS Comput Biol. 2012;8:e1002752. https://doi.org/10.1371/journal.pcbi.1002752.

  10. Seliverstov AV, Putzer H, Gelfand MS, Lyubetsky VA. Comparative analysis of RNA regulatory elements of amino acid metabolism genes in Actinobacteria. BMC Microbiol. 2005;5(1):54. https://doi.org/10.1186/1471-2180-5-54.

  11. Mentz A, Neshat A, Pfeifer-Sancar K, Pühler A, Rückert C, Kalinowski J. Comprehensive discovery and characterization of small RNAs in Corynebacterium glutamicum ATCC 13032. BMC Genomics. 2013;14(1):714. https://doi.org/10.1186/1471-2164-14-714.

  12. Peng T, Kan J, Hu J, Hu Z. Genes and novel sRNAs involved in PAHs degradation in marine bacteria Rhodococcus sp. P14 revealed by the genome and transcriptome analysis. 3 Biotech. 2020;10:140. https://doi.org/10.1007/s13205-020-2133-6.

  13. Pánek J, Krásný L, Bobek J, Ježková E, Korelusová J, Vohradský J. The suboptimal structures find the optimal RNAs: homology search for bacterial non-coding RNAs using suboptimal RNA structures. Nucleic Acids Res. 2011;39(8):3418–26. https://doi.org/10.1093/nar/gkq1186.

  14. Engel F, Ossipova E, Jakobsson P-J, Vockenhuber M-P, Suess B. sRNA scr5239 involved in feedback loop regulation of Streptomyces coelicolor central metabolism. Front Microbiol. 2020;10:3121. https://doi.org/10.3389/fmicb.2019.03121.

  15. Taneja S, Dutta T. On a stake-out: Mycobacterial small RNA identification and regulation. Noncoding RNA Res. 2019:86–95. https://doi.org/10.1016/j.ncrna.2019.05.001.

  16. Heueis N, Vockenhuber M-P, Suess B. Small non-coding RNAs in streptomycetes. RNA Biol. 2014;11:464–9. https://doi.org/10.4161/rna.28262.

  17. Washietl S, Hofacker IL, Stadler PF. Fast and reliable prediction of noncoding RNAs. Proc Natl Acad Sci U S A. 2005;102(7):2454–9. https://doi.org/10.1073/pnas.0409169102.

  18. Vitreschak AG, Mironov A, Gelfand M. The RNApattern program: searching for RNA secondary structure by the pattern rule; 2001.

  19. Lorenz R, Bernhart SH, Höner zu Siederdissen C, Tafer H, Flamm C, Stadler PF, et al. ViennaRNA Package 2.0. Algorithms Mol Biol. 2011;6(1):26. https://doi.org/10.1186/1748-7188-6-26.

  20. Vockenhuber MP, Sharma CM, Statt MG, Schmidt D, Xu Z, Dietrich S, et al. Deep sequencing-based identification of small non-coding RNAs in Streptomyces coelicolor. RNA Biol. 2011;8(3):468–77. https://doi.org/10.4161/rna.8.3.14421.

  21. Vitreschak AG, Rodionov DA, Mironov AA, Gelfand MS. Regulation of the vitamin B12 metabolism and transport in bacteria by a conserved RNA structural element. RNA. 2003;9(9):1084–97. https://doi.org/10.1261/rna.5710303.

  22. Chai W, Stewart V. NasR, a novel RNA-binding protein, mediates nitrate-responsive transcription antitermination of the Klebsiella oxytoca M5al nasF operon leader in vitro. J Mol Biol. 1998;283(2):339–51. https://doi.org/10.1006/jmbi.1998.2105.

  23. Drew R, Lowe N. Positive control of Pseudomonas aeruginosa amidase synthesis is mediated by a transcription anti-termination mechanism. J Gen Microbiol. 1989;135(4):817–23. https://doi.org/10.1099/00221287-135-4-817.

  24. Wilson SA, Wachira SJ, Norman RA, Pearl LH, Drew RE. Transcription antitermination regulation of the Pseudomonas aeruginosa amidase operon. EMBO J. 1996;15(21):5907–16. https://doi.org/10.1002/j.1460-2075.1996.tb00977.x.

  25. Goldman BS, Lin JT, Stewart V. Identification and structure of the nasR gene encoding a nitrate- and nitrite-responsive positive regulator of nasFEDCBA (nitrate assimilation) operon expression in Klebsiella pneumoniae M5al. J Bacteriol. 1994;176(16):5077–85. https://doi.org/10.1128/JB.176.16.5077-5085.1994.

  26. Ueki T, Lovley DR. Novel regulatory cascades controlling expression of nitrogen-fixation genes in Geobacter sulfurreducens. Nucleic Acids Res. 2010;38(21):7485–99. https://doi.org/10.1093/nar/gkq652.

  27. Ramesh A, DebRoy S, Goodson JR, Fox KA, Faz H, Garsin DA, et al. The Mechanism for RNA Recognition by ANTAR Regulators of Gene Expression. Burkholder WF, editor. PLoS Genet. 2012;8:e1002666. https://doi.org/10.1371/journal.pgen.1002666.

  28. Mehta D, Koottathazhath A, Ramesh A. Discovery of ANTAR-RNAs and their mechanism of action in mycobacteria. J Mol Biol. 2020;432(14):4032–48. https://doi.org/10.1016/j.jmb.2020.05.003.

  29. Nawrocki EP, Kolbe DL, Eddy SR. Infernal 1.0: inference of RNA alignments. Bioinformatics. 2009;25(10):1335–7. https://doi.org/10.1093/bioinformatics/btp157.

  30. Altman RB, Dunker AK, Hunter L, Murray TA, Klein TE, Gruber AR, et al. RNAz 2.0: Biocomputing 2010. World Scientific. 2009:69–79. https://doi.org/10.1142/9789814295291_0009.

  31. Weber AM, Kaiser J, Ziegler T, Pilsl S, Renzl C, Sixt L, et al. A blue light receptor that mediates RNA binding and translational regulation. Nat Chem Biol. 2019;15(11):1085–92. https://doi.org/10.1038/s41589-019-0346-y.

  32. Fox KA, Ramesh A, Stearns JE, Bourgogne A, Reyes-Jara A, Winkler WC, et al. Multiple posttranscriptional regulatory mechanisms partner to control ethanolamine utilization in Enterococcus faecalis. Proc Natl Acad Sci U S A. 2009;106(11):4435–40. https://doi.org/10.1073/pnas.0812194106.

  33. Malaka De Silva P, Patidar R, Graham CI, AKC B, Kumar A. A response regulator protein with ANTAR domain, AvnR, in Acinetobacter baumannii ATCC 17978 impacts its virulence and amino acid metabolism. Microbiology. 2020;166:554–66. https://doi.org/10.1099/mic.0.000913.

  34. Kingsford CL, Ayanbule K, Salzberg SL. Rapid, accurate, computational discovery of rho-independent transcription terminators illuminates their relationship to DNA uptake. Genome Biol. 2007;8(2):R22. https://doi.org/10.1186/gb-2007-8-2-r22.

  35. Tapsin S, Sun M, Shen Y, Zhang H, Lim XN, Susanto TT, et al. Genome-wide identification of natural RNA aptamers in prokaryotes and eukaryotes. Nat Commun. 2018;9(1):1–10. https://doi.org/10.1038/s41467-018-03675-1.

  36. Del Campo C, Bartholomäus A, Fedyunin I, Ignatova Z. Secondary Structure across the Bacterial Transcriptome Reveals Versatile Roles in mRNA Regulation and Function. Toledo-Arana A, editor. PLoS Genet. 2015;11:e1005613. https://doi.org/10.1371/journal.pgen.1005613.

  37. Tsuchihashi Z, Kornberg A. Translational frameshifting generates the gamma subunit of DNA polymerase III holoenzyme. Proc Natl Acad Sci U S A. 1990;87(7):2516–20. https://doi.org/10.1073/pnas.87.7.2516.

  38. Chen C, Zhang H, Broitman SL, Reiche M, Farrell I, Cooperman BS, et al. Dynamics of translation by single ribosomes through mRNA secondary structures. Nat Struct Mol Biol. 2013;20(5):582–8. https://doi.org/10.1038/nsmb.2544.

  39. Gorochowski TE, Ignatova Z, Bovenberg RAL, Roubos JA. Trade-offs between tRNA abundance and mRNA secondary structure support smoothing of translation elongation rate. Nucleic Acids Res. 2015;43(6):3022–32. https://doi.org/10.1093/nar/gkv199.

  40. Murat P, Zhong J, Lekieffre L, Cowieson NP, Clancy JL, Preiss T, et al. G-quadruplexes regulate Epstein-Barr virus-encoded nuclear antigen 1 mRNA translation. Nat Chem Biol. 2014;10(5):358–64. https://doi.org/10.1038/nchembio.1479.

  41. Caliskan N, Peske F, Rodnina MV. Changed in translation: mRNA recoding by −1 programmed ribosomal frameshifting. Trends Biochem Sci. 2015;40(5):265–74. https://doi.org/10.1016/j.tibs.2015.03.006.

  42. Giedroc DP, Cornish PV. Frameshifting RNA pseudoknots: structure and mechanism. Virus Res. 2009;139(2):193–208. https://doi.org/10.1016/j.virusres.2008.06.008.

  43. Li L, Zhao Y, Ma J, Tao H, Zheng G, Chen J, et al. The orphan histidine kinase PdtaS-p regulates both morphological differentiation and antibiotic biosynthesis together with the orphan response regulator PdtaR-p in Streptomyces. Microbiol Res. 2020;233:126411. https://doi.org/10.1016/j.micres.2020.126411.

  44. Huerta-Cepas J, Szklarczyk D, Heller D, Hernández-Plaza A, Forslund SK, Cook H, et al. EggNOG 5.0: a hierarchical, functionally and phylogenetically annotated orthology resource based on 5090 organisms and 2502 viruses. Nucleic Acids Res. 2019;47(D1):D309–14. https://doi.org/10.1093/nar/gky1085.

  45. Huerta-Cepas J, Forslund K, Coelho LP, Szklarczyk D, Jensen LJ, Von Mering C, et al. Fast genome-wide functional annotation through orthology assignment by eggNOG-mapper. Mol Biol Evol. 2017;34(8):2115–22. https://doi.org/10.1093/molbev/msx148.

  46. Higgins CF. ABC Transporters: From microorganisms to man. Ann Rev Cell Biol. 1992:67–113. https://doi.org/10.1146/annurev.cb.08.110192.000435.

  47. Nentwich SS, Brinkrolf K, Gaigalat L, Hüser AT, Rey DA, Mohrbach T, et al. Characterization of the LacI-type transcriptional repressor RbsR controlling ribose transport in Corynebacterium glutamicum ATCC 13032. Microbiology. 2009;155(1):150–64. https://doi.org/10.1099/mic.0.020388-0.

  48. Ramos JL, Martínez-Bueno M, Molina-Henares AJ, Terán W, Watanabe K, Zhang X, et al. The TetR family of transcriptional repressors. Microbiol Mol Biol Rev. 2005;69(2):326–56. https://doi.org/10.1128/mmbr.69.2.326-356.2005.

  49. Bhukya H, Anand R. TetR regulators: a structural and functional perspective. J Indian Inst Sci. 2017;97(2):245–59. https://doi.org/10.1007/s41745-017-0025-5.

  50. Balakrishnan K, Mohareer K, Banerjee S. Mycobacterium tuberculosis Rv1474c is a TetR-like transcriptional repressor that regulates aconitase, an essential enzyme and RNA-binding protein, in an iron-responsive manner. Tuberculosis. 2017;103:71–82. https://doi.org/10.1016/j.tube.2017.01.003.

  51. Cheng Y, Lyu M, Yang R, Wen Y, Song Y, Li J, et al. SufR, a [4Fe-4S] cluster-containing transcription factor, represses the sufRBDCSU operon in Streptomyces avermitilis iron-sulfur cluster assembly. Appl Environ Microbiol. 2020;86(18). https://doi.org/10.1128/AEM.01523-20.

  52. Pokusaeva K, Fitzgerald GF, Van Sinderen D. Carbohydrate metabolism in Bifidobacteria. Genes Nutr. 2011;6(3):285–306. https://doi.org/10.1007/s12263-010-0206-6.

  53. Kressirer CA, Smith DJ, King WF, Dobeck JM, Starr JR, Tanner ACR. Scardovia wiggsiae and its potential role as a caries pathogen. J Oral Biosci. 2017:135–41. https://doi.org/10.1016/j.job.2017.05.002.

  54. Kameda M, Abiko Y, Washio J, Tanner ACR, Kressirer CA, Mizoguchi I, et al. Sugar metabolism of Scardovia wiggsiae, a novel caries-associated bacterium. Front Microbiol. 2020;11. https://doi.org/10.3389/fmicb.2020.00479.

  55. Lewis WG, Robinson LS, Gilbert NM, Perry JC, Lewis AL. Degradation, foraging, and depletion of mucus sialoglycans by the vagina-adapted actinobacterium Gardnerella vaginalis. J Biol Chem. 2013;288(17):12067–79. https://doi.org/10.1074/jbc.M113.453654.

  56. Rueda B, Miguélez EM, Hardisson C, Manzanal MB. Changes in glycogen and trehalose content of Streptomyces brasiliensis hyphae during growth in liquid cultures under sporulating and non-sporulating conditions. FEMS Microbiol Lett. 2001;194(2):181–5. https://doi.org/10.1111/j.1574-6968.2001.tb09466.x.

  57. Światek MA, Urem M, Tenconi E, Rigali S, van Wezel GP. Engineering of N-acetylglucosamine metabolism for improved antibiotic production in Streptomyces coelicolor A3(2) and an unsuspected role of NagA in glucosamine metabolism. Bioengineered. 2012;3(5):280–5. https://doi.org/10.4161/bioe.21371.

  58. Rafieenia R. Effect of nutrients and culture conditions on antibiotic synthesis in Streptomycetes. Asian J Pharm Health Sci. 2013;3(3):810–15.

  59. Butler MJ, Bruheim P, Jovetic S, Marinelli F, Postma PW, Bibb MJ. Engineering of primary carbon metabolism for improved antibiotic production in Streptomyces lividans. Appl Environ Microbiol. 2002;68(10):4731–9. https://doi.org/10.1128/AEM.68.10.4731-4739.2002.

  60. Li R, Townsend CA. Rational strain improvement for enhanced clavulanic acid production by genetic engineering of the glycolytic pathway in Streptomyces clavuligerus. Metab Eng. 2006;8(3):240–52. https://doi.org/10.1016/j.ymben.2006.01.003.

  61. Ryu YG, Butler MJ, Chater KF, Lee KJ. Engineering of primary carbohydrate metabolism for increased production of actinorhodin in Streptomyces coelicolor. Appl Environ Microbiol. 2006;72(11):7132–9. https://doi.org/10.1128/AEM.01308-06.

  62. Huang D, Wen J, Wang G, Yu G, Jia X, Chen Y. In silico aided metabolic engineering of Streptomyces roseosporus for daptomycin yield improvement. Appl Microbiol Biotechnol. 2012;94(3):637–49. https://doi.org/10.1007/s00253-011-3773-6.

  63. Krysenko S, Okoniewski N, Kulik A, Matthews A, Grimpo J, Wohlleben W, et al. Gamma-glutamylpolyamine synthetase GlnA3 is involved in the first step of polyamine degradation pathway in Streptomyces coelicolor M145. Front Microbiol. 2017;8. https://doi.org/10.3389/fmicb.2017.00726.

  64. Schneider BL, Reitzer L. Pathway and enzyme redundancy in putrescine catabolism in Escherichia coli. J Bacteriol. 2012;194(15):4080–8. https://doi.org/10.1128/JB.05063-11.

  65. Voelker F, Altaba S. Nitrogen source governs the patterns of growth and pristinamycin production in “Streptomyces pristinaespiralis”. Microbiology. 2001;147:2447–59. https://doi.org/10.1099/00221287-147-9-2447.

  66. Zhang LJ, Jin ZH, Chen XG, Jin QC, Feng MG. Glycine feeding improves pristinamycin production during fermentation including resin for in situ separation. Bioprocess Biosyst Eng. 2012;35(4):513–7. https://doi.org/10.1007/s00449-011-0624-x.

  67. Letunic I, Bork P. Interactive tree of life (iTOL) v3: an online tool for the display and annotation of phylogenetic and other trees. Nucleic Acids Res. 2016;44(W1):W242–5. https://doi.org/10.1093/nar/gkw290.

  68. Kerpedjiev P, Hammer S, Hofacker IL. Forna (force-directed RNA): simple and effective online RNA secondary structure diagrams. Bioinformatics. 2015;31(20):3377–9. https://doi.org/10.1093/bioinformatics/btv372.

  69. Weinberg Z, Breaker RR. R2R--software to speed the depiction of aesthetic consensus RNA secondary structures. BMC Bioinformatics. 2011;12(1):3. https://doi.org/10.1186/1471-2105-12-3.

  70. 70.

    Rivas E, Clements J, Eddy SR. A statistical test for conserved RNA structure shows lack of evidence for structure in lncRNAs. Nat Methods. 2016;14(1):45–8. https://doi.org/10.1038/nmeth.4066.

  71.

    Mudgal R, Sowdhamini R, Chandra N, Srinivasan N, Sandhya S. Filling-in void and sparse regions in protein sequence space by protein-like artificial sequences enables remarkable enhancement in remote homology detection capability. J Mol Biol. 2014;426(4):962–79. https://doi.org/10.1016/j.jmb.2013.11.026.

  72.

    Bradshaw CR, Surendranath V, Henschel R, Mueller MS, Habermann BH. HMMerthread: detecting remote, functional conserved domains in entire genomes by combining relaxed sequence-database searches with fold recognition. PLoS One. 2011;6(3):e17568. https://doi.org/10.1371/journal.pone.0017568.

  73.

    Covert BA, Spencer JS, Orme IM, Belisle JT. The application of proteomics in defining the T cell antigens of Mycobacterium tuberculosis. Proteomics. 2001;1(4):574–86. https://doi.org/10.1002/1615-9861(200104)1:4<574::AID-PROT574>3.0.CO;2-8.

  74.

    Tan C, Liu Z, Huang S, Li C, Ren J, Tang X, et al. Pectin methylesterase inhibitor (PMEI) family can be related to male sterility in Chinese cabbage (Brassica rapa ssp. pekinensis). Mol Gen Genomics. 2018;293(2):343–57. https://doi.org/10.1007/s00438-017-1391-4.

  75.

    Câmara GA, Nishiyama-Jr MY, Kitano ES, Oliveira UC, da Silva PI, Junqueira-de-Azevedo IL, et al. A multiomics approach unravels new toxins with possible in silico antimicrobial, antiviral, and antitumoral activities in the venom of Acanthoscurria rondoniae. Front Pharmacol. 2020;11:1075. https://doi.org/10.3389/fphar.2020.01075.

  76.

    Pi B, Yu D, Dai F, Song X, Zhu C, Li H, et al. A Genomics Based Discovery of Secondary Metabolite Biosynthetic Gene Clusters in Aspergillus ustus. Andersen MR, editor. PLoS One. 2015;10:e0116089. https://doi.org/10.1371/journal.pone.0116089.

  77.

    Palanisamy N. Identification of putative drug targets and annotation of unknown proteins in Tropheryma whipplei. Comput Biol Chem. 2018;76:130–8. https://doi.org/10.1016/j.compbiolchem.2018.05.024.

  78.

    Aherfi S, Andreani J, Baptiste E, Oumessoum A, Dornas FP, Andrade AC dos SP, et al. A large open pangenome and a small core genome for giant pandoraviruses. Front Microbiol. 2018;9:1486. https://doi.org/10.3389/fmicb.2018.01486.

  79.

    Manivel G, Meyyazhagan A, Durairaj DR, Piramanayagam S. Genome-wide analysis of excretory/secretory proteins in Trypanosoma brucei brucei: insights into functional characteristics and identification of potential targets by immunoinformatics approach. Genomics. 2019;111(5):1124–33. https://doi.org/10.1016/j.ygeno.2018.07.007.

  80.

    Allioux M, Jebbar M, Slobodkina G, Slobodkin A, Moalic Y, Frolova A, et al. Complete genome sequence of Thermosulfurimonas marina SU872T, an anaerobic thermophilic chemolithoautotrophic bacterium isolated from a shallow marine hydrothermal vent. Mar Genomics. 2021;55:100800. https://doi.org/10.1016/j.margen.2020.100800.

  81.

    Bergk Pinto B, Maccario L, Dommergue A, Vogel TM, Larose C. Do organic substrates drive microbial community interactions in Arctic snow? Front Microbiol. 2019;10. https://doi.org/10.3389/fmicb.2019.02492.

  82.

    Kiu R, Caim S, Alexander S, Pachori P, Hall LJ. Probing genomic aspects of the multi-host pathogen Clostridium perfringens reveals significant pangenome diversity, and a diverse array of virulence factors. Front Microbiol. 2017;8:2485. https://doi.org/10.3389/fmicb.2017.02485.

  83.

    Neely CJ, Graham ED, Tully BJ. MetaSanity: an integrated microbial genome evaluation and annotation pipeline. Valencia A, editor. Bioinformatics. 2020;36(15):4341–4. https://doi.org/10.1093/bioinformatics/btaa512.

  84.

    Zhao X, Bai S, Li L, Han X, Li J, Zhu Y, et al. Comparative transcriptome analysis of two Aegilops tauschii with contrasting drought tolerance by RNA-Seq. Int J Mol Sci. 2020;21(10):3595. https://doi.org/10.3390/ijms21103595.

Acknowledgements

We are especially grateful to Dr. R. Sowdhamini (NCBS) and Dr. Sabarinathan Radhakrishnan (NCBS) for valuable suggestions on this work. We thank Anirudh KN and other members of the Ramesh Lab for feedback and suggestions.

Funding

We are grateful for funding and support from the DBT/Wellcome Trust-India Alliance (IA/I/14/2/501521), DST-SERB grant no. ECR/2016/001593 and the Human Frontier Science Program research grant RGY0077/2019 to A.R. We also acknowledge support from the Department of Atomic Energy, Government of India and the National Centre for Biological Sciences-TIFR, under project no. 12-R&D-TFR-5.04-0800. D.M. was supported by an ICMR-SRF fellowship (No. ISRM/11(18)/2019). The funding bodies did not contribute to the design of the study; the collection, analysis and interpretation of data; or the writing of the manuscript.

Author information

Contributions

D.M. performed the computational analyses. D.M. and A.R. conceptualized the study and wrote the manuscript. Both authors have read and approved the manuscript.

Corresponding author

Correspondence to Arati Ramesh.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Supplementary Figures and Tables.

Data linked to Figs. 1-4 are given in Supplementary Figs. S1-S7, Supplementary Table S1, Supplementary Table S2 and Supplementary Table S3.

Additional file 2: Supplementary Table S4.

Supplementary Table S4, linked to the data in Fig. 4.

Additional file 3: Supplementary Table S5.

Supplementary Table S5, linked to the data in Fig. 4.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Mehta, D., Ramesh, A. Diversity and prevalence of ANTAR RNAs across actinobacteria. BMC Microbiol 21, 159 (2021). https://doi.org/10.1186/s12866-021-02234-x

Keywords

  • ANTAR protein
  • RNA regulatory system
  • Structured RNA
  • Actinobacteria