Transcriptomes of Frankia sp. strain CcI3 in growth transitions
BMC Microbiology volume 11, Article number: 192 (2011)
Frankia sp. strains are actinobacteria that form N2-fixing root nodules on angiosperms. Several reference genome sequences are available enabling transcriptome studies in Frankia sp. Genomes from Frankia sp. strains differ markedly in size, a consequence proposed to be associated with a high number of indigenous transposases, more than 200 of which are found in Frankia sp. strain CcI3 used in this study. Because Frankia exhibits a high degree of cell heterogeneity as a consequence of its mycelial growth pattern, its transcriptome is likely to be quite sensitive to culture age. This study focuses on the behavior of the Frankia sp. strain CcI3 transcriptome as a function of nitrogen source and culture age.
To study global transcription in Frankia sp. CcI3 grown under different conditions, complete transcriptomes were determined using high throughput RNA deep sequencing. Samples varied by time (five days vs. three days) and by culture conditions (NH4+ added vs. N2 fixing). Assembly of millions of reads revealed more diversity of gene expression between five-day and three-day old cultures than between three day old cultures differing in nitrogen sources. Heat map analysis organized genes into groups that were expressed or repressed under the various conditions compared to median expression values. Twenty-one SNPs common to all three transcriptome samples were detected indicating culture heterogeneity in this slow-growing organism. Significantly higher expression of transposase ORFs was found in the five-day and N2-fixing cultures, suggesting that N starvation and culture aging provide conditions for on-going genome modification. Transposases have previously been proposed to participate in the creating the large number of gene duplication or deletion in host strains. Subsequent RT-qPCR experiments confirmed predicted elevated transposase expression levels indicated by the mRNA-seq data.
The overall pattern of gene expression in aging cultures of CcI3 suggests significant cell heterogeneity even during normal growth on ammonia. The detection of abundant transcription of nif (nitrogen fixation) genes likely reflects the presence of anaerobic, N-depleted microsites in the growing mycelium of the culture, and the presence of significantly elevated transposase transcription during starvation indicates the continuing evolution of the Frankia sp. strain CcI3 genome, even in culture, especially under stressed conditions. These studies also sound a cautionary note when comparing the transcriptomes of Frankia grown in root nodules, where cell heterogeneity would be expected to be quite high.
Studies on actinorhizal symbioses have benefitted greatly from several genome sequences of the actinobacterial symbiont Frankia sp. strains. Such strains induce root nodules and fix N2 in a broad array of plants . The smallest frankial genome finished to date is that of Frankia sp. HFPCcI3 (CcI3) that infects plants of the family Casuarinaceae; it is about 5.4 Mbp in size and encodes 4499 CDS . A striking feature of the CcI3 genome is the presence of over 200 transposase genes or gene remnants that may play, or have played, a role in genome plasticity . In addition, relative to other Frankia sp. genomes that have been sequenced, CcI3 contains few gene duplicates . Comparative genome studies suggest that evolution has favored gene deletion rather than duplication in this strain, perhaps as an outcome of its symbiotic focus on a single, geographically limited group of plants in the Casuarinaceae .
Transcriptome sequencing of bacterial genomes has yielded surprising complexity (for a review see ). Such studies have shown differential cistron transcription within operons , small regulatory RNA transcripts [6–9] and numerous riboswitch controlled transcripts [10, 11]. Significant transcriptional heterogeneity has also been found in single cultures that has been ascribed to subpopulations within an otherwise synchronized bacterial population . High throughput RNA-seq methods provide a tool for transcript quantification with a much higher dynamic range than that provided by microarray studies by relying on direct comparison of transcript abundance for assessing differential expression .
Frankia transcriptome studies have the potential to reveal common genes and pathways active in, or essential to, symbiosis and free-living growth. A first step to resolving symbiotic-specific expression is to gain insight into transcriptional behavior and variability in axenic culture. This work helps address the issue of cultural heterogeneity that will likely be exacerbated by physiological heterogeneity in symbiosis. A previous transcriptome study has been done using whole-genome microarrays in Alnus and Myrica root nodules using cultured Frankia alni strain ACN14a as a reference . In that study, relatively few surprises were encountered and the overall transcription profile was similar in both nodule types. We focus here on an approach using transcriptome deep sequencing of cultured Frankia strain CcI3 grown under different conditions, and the analysis of subsequent data to provide insight into the global expression that may impinge on physiology and genome stability in Frankia strains.
Results and Discussion
Culture characteristics and experimental design
As a consequence of its filamentous growth habit, Frankia sp. strain CcI3 grows from hyphal tips with an initial doubling time of about 18 hrs that subsequently slows to more linear growth . As tips extend, cells left behind are physiologically in stationary phase and eventually senesce. Thus, even young cultures (defined here as three days old) have a degree of physiological heterogeneity that increases as cultures age . This heterogeneity must be taken into account in interpreting global transcriptome analyses.
Several factors in our sampling and library creation may influence a transcriptome analysis. Single Frankia cultures were used in preparing RNA libraries for each sample prior to sequencing. In addition, each sample was run on the Illumina GA IIx sequencer without technical replicates. While technical and biological replicates would have eliminated two potential sources of variability in the results of this experiment, several studies have suggested that both types of variability are unlikely to influence end results [13, 17], while other studies have found significant variation among replicate samples [18, 19]. Such effects may only influence low RPKM value genes  but, as with many such studies, our results must be viewed in the light of many potential variables.
RNA sample quality and features
RNA preparations used for making dscDNA libraries for Illumina sequencing had 260/280 ratios greater than 2.0 and greater than 400 to 950 ng per μl. PCR amplification using primers for the glnA gene failed to yield an amplicon from RNA preparations indicating very low, if any, DNA contamination. In addition, an RT-PCR assay revealed no detectable DNA within total RNA samples prepared in a separate experiment, confirming that the RNA extraction technique can apply to sensitive RNA based experiments that use strain CcI3.
Transcriptome sequencing done using 5dNH4 CcI3 cells yielded about six million reads, three million of which could be mapped to the Frankia sp. CcI3 genome (Table 1). Almost 51% of the mapped reads were from rRNA or tRNA (Table 1). An updated base-calling algorithm (RTA v. 1.6) yielded substantially higher reads for samples from 3dNH4 and 3dN2 cultures. About 26 million reads were obtained for the latter samples, with about 16 million mapped reads in each (Table 1). Non-coding RNAs represented a greater proportion of mapped reads in these two samples, comprising nearly 80% of the total.
Even after ribosomal RNA depletion, non-coding sequences formed the majority of reads in all samples with the greatest reduction seen in the 5dNH4 sample (Table 1). This relative amount of rRNA could be related to the reduction of rRNA in older cultures, as observed in stationary and death phase cultures of E. coli . On the other hand, given the concentration dependence of the rRNA depletion method used in preparing the mRNA-seq libraries, a decrease in the proportion of rRNA in the five-day time point could have resulted from more efficient depletion. Incomplete depletion of rRNA populations is similar to what is observed in other studies and is related to the sheer abundance of such sequences .
The number of coding RNA reads was similar among all three samples although the read length for the 3dNH4 and 3dN2 samples was 76 versus 34 for 5dNH4. All of the pseudogenes present in the CcI3 genome had transcripts in at least two of the three genomes (Table 1). Pseudogene transcription is presently not believed be a rare event , though many pseudogenes identified in a bacterial genome may simply be misannotated ORFS.
The 100 genes with the highest RPKM value in each condition, omitting ribosomal RNAs, are listed in Table 2. The number of hypothetical genes in this group range from 29 in the 3dNH4 cells to 39 in the 3dN2 cells to 43 in the 5dNH4 cells. Older cultures had more transcripts associated with tRNAs, transposases, CRISPR elements, integrases and hypothetical proteins than did younger cultures. Indeed, had they been included in the list, 18 of the 46 tRNA genes in CcI3 would have been in the top 100 most abundant transcript populations in 5dNH4 cells whereas no tRNAs were found in the top 100 transcripts in 3dN2 or 3dNH4 cell populations. The picture painted by the abundance of such transcripts is one of cells starved for essential metabolites such as amino acids, as expected in aging cells. In addition, enzymes involved in solving oxidative damage (e.g. protein-methionine-S-oxide reductase) were also more abundant in the older culture. Conversely, enzymes involved in catabolism (eg. alcohol dehydrogenase) were more frequently represented in the two younger cultures.
Comparison of the top 100 gene lists with each other (color coded in Table 2) and construction of heat maps of all genes revealed that overall gene expression varied more with culture age (three versus five days) than culture condition (+/- NH4+), with 3dNH4 and 3dN2 clustering before the 5dNH4 sample (Figure 1). Gene dendrograms (left side of the figure) gave five clusters of genes (Groups I through V) that had within-group expression profiles consistent among the three culture conditions tested. The genes in each cluster are listed in Additional File 1: Gene_list.xls.
Group I genes are clearly down-regulated in 3dNH4 cells; these include 30 transporter related genes, five diguanylate cyclases and an array of putative N-controlled proteins such as assimilatory nitrate reductase, adenosine deaminase, allantoinase and nitrogen fixation (nif) genes in addition to 252 hypothetical proteins. Group II genes are up regulated in 3dN2 cultures and include most of the nif genes, genes involved in sulfur metabolism and iron-sulfur protein synthesis, cell division proteins and hydrogenase synthesis. The 3dN2 culture was prepared with a modified iron stock containing a higher concentration of iron sulphate and sodium molybdate . We cannot rule out that an increase in iron-sulfur protein synthesis may be related to the increase in iron sulphate to the medium although it is more likely to be related to an increased demand for iron and molybdenum. Eight phage integrases were also present in Group II, which was the highest number of integrases present in any of the five groups. Group III contains genes that have relatively more transcripts in 5dNH4 cells; these include a larger proportion of hypothetical protein ORFs (523 ORFs) than were present in the other four groups (average of ~200 ORFs per group). All of the annotated excisionase/Xis ORFs were present in the Group III list, suggesting that phage-related excisionases are being transcribed more in the 5dNH4 sample than in the other conditions. Group IV genes were more abundantly transcribed in the 3dNH4+ sample including several sigma factors; this group also had the fewest transposase ORFS (2 ORFs). Group V contains ORFs more highly expressed in younger cultures. ORFs in this grouping include 17 ribosomal protein ORFs, and a majority of the glycolytic enzymes.
As expected, nif ORFs were more highly expressed in the 3dN2 sample, with numerous vesicles present, than in the 3dNH4 sample and were in Group II on the heat map. The 5dNH4 culture also had nif expression above that detected in the 3dNH4 culture. Three nif ORFs were not significantly expressed in the 5dNH4 sample over the 3dNH4 sample as predicted by a Kal's ztest p value  (Table 3). On the other hand, the genes for the core nitrogenase components nitrogenase reductase (nifH), and nitrogenase alpha and beta chains (nifKD) were upregulated in the 3dN2 sample, and were cotranscribed to similar extents within individual cultures, suggesting that they exist in an operon independent from the rest of the nif cluster. An intergenic space consisting of 208 nucleotides between these three ORFs and the rest of the cluster supports this analysis. The presence of nif transcripts in all cell types, even where ammonia should still be in excess, is in concert with the heterogeneous nature of the frankial growth habit, where mycelia develop microsites that are potentially nutrient deficient or microaerobic due to adjoining cell populations. The 5dNH4 cells are most likely depleted for combined nitrogen and, indeed, a few vesicles can be observed in older cultures. This observation highlights a fundamental problem with the mRNA deep sequencing of a Frankia culture where different cell physiologies can skew average gene expression in a culture. Apart from isolated vesicles  that are unlikely to give a sufficient quantity of mRNA for second generation sequencing technologies, long-read, single molecule sequencing techniques run in parallel could specifically sequence the transcriptome of distinct cell morphologies in a pure culture as was recently done with Vibrio cholerae .
Recent studies on Frankia proteomes have indicated the presence of several transposases in CcI3 grown in culture and in symbiosis , raising the question of how IS elements behave in cultured CcI3 cells. Given the number of transposase ORFs in the CcI3 genome (148 complete plus 53 fragments identified by PSI-BLAST analysis ), mRNA deep sequencing provides an efficient method of quantifying their behavior in cultures grown under different conditions.
RPKM values for the transposase ORFs were plotted against the locations of IS elements in strain CcI3 (Figure 2; ). Additional files 2, 3, 4, 5, 6 and 7 list the calculated expression data for the transposase ORFs. Transposase transcripts were generally more abundant than the transcriptome's median RPKM value (dashed line; values respective of sample) throughout the genome. The visual representation of transcript abundance in Figure 2 indicates that transposase ORFs were overall more highly expressed in older cultures and, to a lesser extent, in N2 fixing cells than in younger, nutrient sufficient cultures. Seventy-three transposase ORFs in the 5dNH4 sample were more highly expressed with respect to the 3dNH4 sample (Figure 2; Additional file 8: SNP_call_list.xls). Only 29 transposase ORFs were shown statistically to have higher expression in 3dNH4 than in 5dNH4. A similar trend was noticed in the 3dN2 vs 3dNH4 sample, with 91 transposase ORFs having statistically significant higher expression values in the 3dN2 sample. Many transposase ORFs had similar expression in the 3dN2 vs 3dNH4 and the 5dNH4 vs 3dNH4 comparisons. This is reflected in the ztest p values, as the 3dN2 vs 3dNH4 comparison had 50 changes with p values greater than 0.05 and the 5dNH4 versus 3dNH4 comparison had 48 changes with p values greater than 0.05. The majority of the insignificant p values in the comparisons are due to similarity of RPKM values.
One IS66 transposase (Locus tag: Francci3_1864) near the 2 Mb region of the genome had an RPKM greater than 1600 in all samples. The majority of these reads were ambiguous. This transposase has five paralogs with greater than 99% nucleotide similarity, thereby accounting for ambiguous reads, so the elevated RPKM, while still high, is distributed among several paralogs. Other transposase ORFs with RPMK values higher than the median were more likely to be present in CcI3 deletion windows (gray boxes ) as determined by a Chi Square test against the likelihood that high RPKM transposase ORFs would exist in a similar sized region of the genome at random (p value = 1.32 × 10-7). This observation suggests that any transposase found in these windows is more likely to be transcribed at higher levels than transposases outside of these regions.
The largest change in expression was found in an IS3/IS911 ORF between the 5dNH4 and 3dNH4 samples. This ORF (locus tag: Francci3_1726, near 1.12 Mb) was expressed eleven fold higher in the 5dNH4 sample than in the 3dNH4 sample. Five other IS66 ORFs are also highly expressed in 5dNH4 ranging from 4 fold to 5 fold higher expression than in the 3dNH4 sample. Eight IS4 transposases had no detected reads under the alignment conditions in each growth condition. These eight IS4 transposases are members of a previously described group of 14 paralogs that have nearly 99% similarity in nucleic acid sequence . Parameters of the sequence alignment used allowed for ten sites of ambiguity, therefore discarding reads from eight of these 14 duplicates as too ambiguous to map on the reference genome. Graphic depictions of assembled reads derived from raw CLC workbench files show that the majority of reads for the six detected IS4 transposases mapped around two regions. Both of these regions contained one nucleotide difference from the other eight identical transposases. De novo alignment of the unmapped reads from each sample resulted in a full map of the highly duplicated IS4 transposase ORFs (data not shown).
More globally, the 5dNH4 and 3dN2 samples had higher RPKM values per transposase ORF than in the 3dNH4 sample. The sum of the RPKM values among the transposase data set placed the 5dNH4 sample (34350 sum RPKM) and the 3dN2 (36150 sum RPKM) each nearly 30% higher than in 3dNH4 (26916 sum RPKM). The numbers of transposase genes classified as upregulated in the heat maps in Figure 1 include 44 in 3dN2 cells, 40 in 5dNH4 cells and only two in 3dNH4 cells. Twenty-eight were down regulated in the 3dNH4 cells as shown by the heat map analysis (Additional File 8: SNP_call_list.xls). These results suggest a relative quiescence of transposase ORFs during healthy growth, and a burst of transcription when cells are stressed. Mutagenesis of genes involved in general metabolic pathways in Escherichia coli has been shown to promote earlier transposition of an IS5 family insertion sequence . Media supplements to the mutated cells were shown to delay transposition events, thereby showing general starvation responses were likely involved in increased IS element activity .
The expression of nif cluster genes in the 5dNH4 sample suggests that the ammonium content of the medium was depleted, or nutrient deprived microsites had developed among the mycelia. One of the highly expressed non-ribosomal ORFs is the pyrophosphohydrolase gene hisE (Francci3_4317), suggesting that the amino acid histidine is in short supply. Additionally, a serine O-acetyltransferase was highly expressed in 5dNH4 cells, indicating activity in the cysteine synthesis pathway. Higher expression of both ppx/gppA ORFs (Locus tags: Francci3_0472 and Francci3_3920) in the 5dNH4 sample suggests that the stringent response  is active in response to amino acid deprivation. Two ORFs annotated as (p)ppGpp synthetases (Locus tags: Francci3_1376 and Francci3_1377) were actually more highly expressed in 3dN2 and 3dNH4 cells than in 5dNH4 cells.
Transcription of IS elements does not directly correlate to translation . Many IS elements prevent their own transposition by requiring a -1 frame shift mutation in the transcript in order to express a functional transposase protein . Since the specific methods of translational control used by Frankia IS elements are unknown, transcriptome data alone cannot be used as a proportional metric for transposition activity. On the other hand, recent proteomic studies on the CcI3 genome have confirmed that translation of many IS elements does occur in vivo and in symbiosis [16, 33].
RT-qPCR confirmation of transposase transcription
Duplicated copies of highly similar transposase ORFs presented a problem in the analysis of transcript sequence data. To compare transcription frequencies of duplicated ORFs in different culture conditions, we used RT-qPCR to amplify conserved regions of eight duplicated transposase ORF families using primers designed to amplify conserved regions in each group. The duplicates had greater than 98% nucleotide similarity with each other. The glutamine synthetase I (glnA) gene was used to normalize expression data as previously described . We included a five-day old nitrogen fixing (5dN2) condition in our assay to better estimate transposase ORF expression in two older culture conditions (5dN2 and 5dNH4).
The results of the RT-qPCR assay confirmed the transcriptome sequence data (Figure 3). Comparing the five-day samples with three-day samples revealed an increase in transposase ORF transcription in older cultures in nearly all cases (Figure 3a). The only exception was in the case of the Tn3 family of transposases where transcription was predicted to be higher (fold change values less than one) at three days in both conditions. This may be due to transposition immunity described for other members of the Tn3 family . Cross comparisons of NH4 and N2 samples revealed that nitrogen fixing cultures had more transposase transcripts from these duplicated families than from the ammonium cultures at both time points (Figures 3b and 3c). The most dramatic change in transcript quantity was found for the IS4 transposases' transcripts in the 5dN2 sample that were 7.4 fold higher than levels in the 3dNH4 sample. As the representative transposase ORFs chosen for the RT-qPCR analysis were families of duplicates, a direct comparison of RT-qPCR fold change to transcriptome RPKM values was difficult to make. Still, the results of this experiment confirm the general trend of transposase ORF transcription in Frankia sp. CcI3: older and nitrogen-deprived cultures had higher transcription of transposase ORFs.
Prophage and CRISPRs
ORFs with phage-related annotations were all more highly transcribed in the five-day sample with respect to both three-day samples (Table 4). Several ORFs annotated as phage integrases were expressed more than two-fold in the 5dNH4 sample when compared to the 3dNH4 sample. Comparisons of fold change among all three samples yielded many statistically insignificant differences as determined by a Kal's z-test suggesting that these ORFs are likely transcribed at similar rates regardless of culture conditions. A phage SPO1 DNA polymerase-related protein (Francci3_0075) was constitutively expressed in all three samples, and four phage resistance ORFs were up-regulated in the 5dNH4 sample. The latter include members of the pspA and pgl (Phi C31) families of phage resistance genes. Similar RPKM values between the two pgl ORFs in all three samples suggest that these ORFs are transcribed as an operon in CcI3.
CcI3 has four putative CRISPR arrays, two of which are located near clusters of CAS ORFs (data obtained from CRISPRFinder ). Three of the CRISPR arrays had high numbers of repeat copies (38, 15 and 20 spacers per array ordered with respect to the OriC) making alignment of ambiguous sequence reads difficult. Even the shorter 36 bp read lengths of the 5dNH4 sample could not be reliably mapped across the arrays using the CLC Genome Workshop alignment programs. As a result, few reads mapped to the array region of the CRISPR islands and numerous deletions were predicted (Additional Files 2 through 7). The CAS ORF transcripts, by contrast, were detected in all three samples. Again, transcription was modestly higher in the 5dNH4 sample than in the 3dNH4 sample (Table 5). In this instance, the 3dN2 sample had nearly two fold higher expression of all CAS ORFs when compared with the 3dNH4 sample. Comparison of the 5dNH4 and 3dN2 samples revealed insignificant fold changes as determined by a Kal's ztest.
Given the base pair resolution of RNA sequencing, it is possible to identify single nucleotide polymorphisms (SNPs). Recent analysis of the bovine milk transcriptome revealed high fidelity of SNP calls derived from an RNA-seq experiment, though the authors caution that stringent criteria are necessary to reduce false positive calls . Using similar filtering criteria, we identified 215 SNPs in the 5dNH4 sample, 365 SNPs in the 3dN2 sample and 350 SNPs in the 3dNH4 sample. Comparison of the SNP populations revealed that the 5dNH4 sample had substantially different SNP calls than the 3dN2 and 3dNH4 samples. Only 21 of the putative SNPs were found in all three samples (Table 6). Twelve of these common SNPs resulted in non-synonymous amino acid changes.
There are several possibilities that may explain the variance of SNP content between the 5dNH4 sample and the two three day samples. The age of the culture is a possible, yet unlikely, contributor to a significantly different SNP pattern. Frankia strains are maintained by bulk transfer of cells since derivation from single colonies is problematical due to the hyphal habit of growth. Thus, over time, SNPs likely arise spontaneously. Another possibility is that errors are incorporated into the mRNA-seq libraries resulting in false positive SNPs. The Superscript III© reverse transcriptase used in the first strand cDNA synthesis was derived from a MML virus  and has an error rate of approximately 3.0 × 10-5 errors per base . Therefore, only SNPs detected in all three samples with high coverage and multiple variant copies were likely true positive SNPs.
We deep-sequenced dscDNA libraries derived from three culture conditions of Frankia sp. CcI3. Overall gene expression varied more as a function of culture age than as a function of nitrogen deprivation, likely because the cell population has fewer actively growing cells at the fifth day of culture and those remaining are adapting to nutrient deprivation. In two limited nutrient environments, transposase ORFs were relatively more highly expressed than in younger ammonium grown cells. A RT-qPCR assay designed to quantify highly duplicated transposase ORFs supported the data from the mRNA-seq experiment. These results, in tandem with discovery of putative SNPs, suggests that the IS element laden CcI3 genome is in constant flux within the relatively mundane conditions of a culture flask.
Culture media and conditions
Frozen stocks of Frankia sp. strain CcI3, were suspended in duplicate in 200 ml of Frankia Defined Minimal media (FDM) containing 45 mM sodium pyruvate and 9.3 mM ammonium chloride in 500 ml flasks . Cells were grown at 30°C for three or five days on FDM with or without (N2 fixing cells) ammonium. Nitrogen fixing cultures were prepared using a modified iron stock as previously described . Given the difficulty in quantifying viable Frankia cells in culture, a total of three ml of gravity-settled cells were harvested per culture flask for RNA extraction.
Frankia cells were processed using a ZR Fungal/Bacterial RNA MiniPrep™ kit from Zymo Research© (http://www.zymoresearch.com) using the manufacturer's recommendations. To completely remove genomic DNA (gDNA) contamination from the RNA extraction, we performed the in-column DNAse I optional step using Amplification grade DNAse I (Invitrogen™, http://www.invitrogen.com). DNAseI incubation times were extended to 30 minutes at 37°C in order to completely remove gDNA from the sample. A final elution volume of 15 μl of RNAse free water was used instead of the recommended 6 μl elution volume. Only RNA samples with a 260/280 nm wavelength ratio above 2.00 were used for library construction and RT-qPCR assays.
In order to enrich mRNA content for generating a cDNA library, we used the MICROBExpress™ Bacterial mRNA Enrichment Kit (Ambion Inc., http://www.ambion.com). The manufacturer's website specifies that the oligonucleotide sequence used by the kit should anneal to the 16S and 23S rRNA sequences of many eubacterial species including Frankia sp. Approximately 10 μg of Frankia total RNA in each condition was processed using the kit per the manufacturer's instructions. This procedure yielded 2 - 3.75 μg of RNA after depletion for each sample. Subsequent gel analysis and sequencing data revealed substantial 16S and 23S rRNA within the sample, suggesting only partial depletion of rRNA transcripts. Samples were nonetheless prepared using the depletion kit in order to minimize variability due to differential handling in the experiment.
Complementary DNA library generation
One microgram of processed Frankia RNA was used in an Illumina mRNA-seq kit. The poly-dT pulldown of polyadenylated transcripts was omitted, and the protocol was followed beginning with the mRNA fragmentation step. A SuperscriptIII© reverse transcriptase was used instead of the recommended SuperscriptII© reverse transcriptase (Invitrogen™). This substitution was made in light of the higher G+C% of Frankia sp. transcripts (71% mol G+C) and the ability of the SuperscriptIII© transcriptase to function at temperatures greater than 45°C. Because of this substitution, the first strand cDNA synthesis stage of the protocol could be conducted at 50°C instead of 42°C. Since a second-strand cDNA synthesis was performed, the cDNA library was agnostic with respect to the strandedness of the initial mRNA. The final library volumes were 30 μl at concentrations of 40 - 80 ng/μl as determined by Nanodrop spectrophotometer.
Library clustering and Illumina platform sequencing
Prior to cluster generation, cDNA libraries were analyzed using an Agilent© 2100 Bioanalyzer (http://www.chem.agilent.com) to determine final fragment size and sample concentration. The peak fragment size was determined to be approximately 200 +/- 25 bp in length for each sample. Twenty nmoles of each cDNA library were prepared using a cluster generation kit provided by Illumina Inc. The single-read cluster generation protocol was followed. Final cluster concentrations were estimated at 100,000 clusters per tile for the five day sample and 250,000 clusters per tile for the two three day samples on each respective lane of the sequencing flow-cell.
An Illumina® Genome Analyzer IIx™ was used in tandem with reagents from the SBS Sequencing kit v. 3 in order to sequence the cDNA clusters. A single end, 35 bp internal primer sequencing run was performed as per instructions provided by Illumina®. Raw sequence data was internally processed into FASTQ format files which were then assembled against the Frankia sp. CcI3 genome [Genbank: CP000249] using the CLC Genomics Workbench™ software package distributed by CLC Bio©.
Frankia sp. CcI3 has a several gene duplicates. This made the alignment of the short reads corresponding to the gene duplicates difficult. Reads could only be mapped to highly duplicated ORFs by setting alignment conditions to allow for 10 ambiguous map sites for each read. In the case of a best hit "tie," an ambiguous read was mapped to a duplicated location at random. Without this setting, more than 20 ORFs would not have been detected by the alignment program simply due to nucleotide sequence similarity.
To standardize gene expression calculations among different samples, the CLC Genomic Workbench software calculates an expression value termed "reads per kilobase million" (RPKM). This calculation incorporates variable gene length in the gene expression ratio, and the total number of reads obtained from a sequencing run . The equation used to determine RPKM values is as follows:
The RPKM value allows comparisons between datasets containing variable numbers of reads as well as expression of genes with varying lengths. Because of the disparate quantities of rRNA reads among the three samples, we removed all non-coding RNA (ncRNA) reads from the data set before calculating RPKM values. This ensures that the reads from the 5dNH4 sample, which had the lowest number of ncRNA reads, were not overrepresented. Comparisons of gene expression were tested using Kal's Z-test . Heat maps were generated using the Cluster 3.0 command line program (http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/software.htm). Datasets were normalized and median subtracted prior to map generation. Maps were viewed using Java Treeview .
Potential SNPs were filtered using the following criteria: (1) reads containing putative SNPs were discarded if they had an average quality score of less than 15; (2) the polymorphic base within the read had to have a quality score above 20; (3) at least 10× coverage of the SNP position was required; (4) the SNP had to be present in 25% of the reads at that location. Raw sequence reads and calculated RPKM values for each CcI3 ORF were uploaded to the Gene Expression Omnibus database at NCBI (http://www.ncbi.nlm.nih.gov/projects/geo) with the accession number GSE30680.
The nucleotide sequences for the target transposase ORFs in Frankia strain CcI3 [genbank: CP000249] were retrieved from Genbank. Primers were designed using the Primer3 webtool (http://frodo.wi.mit.edu/primer3/) with settings to generate primers with a melting temperature of ~60°C. Due to the limitations of extension time in quantitative polymerase chain reactions (qPCR), primers were designed to amplify less than 200 bp of sequence when possible.
Stocks of Frankia sp. CcI3 cells were grown in four culture conditions that included two time points and two medium types. Three of the conditions mirrored those used in the mRNA-seq experiment (3dN2, 3dNH4 and 5dNH4). A fourth condition, consisting of cells grown in nitrogen fixing medium for five days (5dN2), was also used. Cells were harvested and RNA was purified in the same manner as used in the mRNA-seq experiment. Approximately one micro-gram of RNA from each sample was used in subsequent reverse transcriptase reactions. Complementary DNA was synthesized using the SuperscriptIII© reverse transcriptase with gene specific primers (~100 nM final concentrations per reaction mix). Synthesis of the first strand was carried out at 55°C for 50 minutes with a five minute denature step at 80°C. RT reactions were diluted ten-fold with sterile water after denaturation.
All qPCR experiments were performed using the Bio-Rad™ SsoFast© Evagreen qPCR 2X master mix. Reaction volumes were reduced to 12.5 μl. A Bio-Rad™ iQ5 real-time thermocycler was used to quantify reactions. Antibody denaturing of the SsoFast polymerase was performed at 95°C for 1.5 minutes immediately prior to any cycling step. This was followed by one 98°C denaturation for 2 minutes. Temperature cycling consisted of the following: 35 cycles of 98°C for 10 seconds then 55°C for 15 seconds and finally 65°C for 15 seconds. Melt curves (to determine if there were multiple PCR amplicons) were constructed by heating final amplified reactions from 65°C to 95°C for 10 seconds in single degree stepwise fashion. Primer efficiencies were calculated from readings derived from a standard curve of known DNA concentrations. Relative expression levels of target genes were calculated using the Pfaffl standardization as previously described . The glutamine synthetase I gene (glnA) was used as a reference gene to standardize relative expression in the four samples.
Normand P, Queiroux C, Tisa LS, Benson DR, Rouy Z, Cruveiller S, Medigue C: Exploring the genomes of Frankia. Physiologia Plantarum. 2007, 130: 331-343. 10.1111/j.1399-3054.2007.00918.x.
Normand P, Lapierre P, Tisa LS, Gogarten JP, Alloisio N, Bagnarol E, Bassi CA, Berry AM, Bickhart DM, Choisne N, et al: Genome characteristics of facultatively symbiotic Frankia sp. strains reflect host range and host plant biogeography. Genome Res. 2007, 17 (1): 7-15.
Bickhart D, Gogarten J, Lapierre P, Tisa L, Normand P, Benson D: Insertion sequence content reflects genome plasticity in strains of the root nodule actinobacterium Frankia. BMC Genomics. 2009, 10 (1): 468-10.1186/1471-2164-10-468.
Sorek R, Cossart P: Prokaryotic transcriptomics: a new view on regulation, physiology and pathogenicity. Nat Rev Genet. 2010, 11 (1): 9-16.
Guell M, van Noort V, Yus E, Chen WH, Leigh-Bell J, Michalodimitrakis K, Yamada T, Arumugam M, Doerks T, Kuhner S, et al: Transcriptome complexity in a genome-reduced bacterium. Science. 2009, 326 (5957): 1268-1271. 10.1126/science.1176951.
Altuvia S: Identification of bacterial small non-coding RNAs: experimental approaches. Current Opinion in Microbiology. 2007, 10 (3): 257-261. 10.1016/j.mib.2007.05.003.
Bejerano-Sagie M, Xavier KB: The role of small RNAs in quorum sensing. Curr Opin Microbiol. 2007, 10: 189-198. 10.1016/j.mib.2007.03.009.
Livny J, Waldor MK: Identification of small RNAs in diverse bacterial species. Curr Opin Microbiol. 2007, 10: 96-101. 10.1016/j.mib.2007.03.005.
Shi Y, Tyson GW, DeLong EF: Metatranscriptomics reveals unique microbial small RNAs in the ocean's water column. Nature. 2009, 459: 266-269. 10.1038/nature08055.
Mandal M, Boese B, Barrick JE, Winkler WC, Breaker RR: Riboswitches control fundamental biochemical pathways in Bacillus subtilis and other bacteria. Cell. 2003, 113: 577-586. 10.1016/S0092-8674(03)00391-X.
Loh E: A trans-acting riboswitch controls expression of the virulence regulator PrfA in Listeria monocytogenes. Cell. 2009, 139: 770-779. 10.1016/j.cell.2009.08.046.
Passalacqua KD, Varadarajan A, Ondov BD, Okou DT, Zwick ME, Bergman NH: Structure and Complexity of a Bacterial Transcriptome. J Bacteriol. 2009, 191 (10): 3203-3211. 10.1128/JB.00122-09.
Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y: RNA-seq: An assessment of technical reproducibility and comparison with gene expression arrays. Genome Research. 2008, 18 (9): 1509-1517. 10.1101/gr.079558.108.
Alloisio N, Queiroux C, Fournier P, Pujic P, Normand P, Vallenet D, Medigue C, Yamaura M, Kakoi K, Kucho K-i: The Frankia alni Symbiotic Transcriptome. Molecular Plant-Microbe Interactions. 2010, 23 (5): 593-607. 10.1094/MPMI-23-5-0593.
Benson DR, Schultz NA: Physiology and biochemistry of Frankia in culture. The biology of Frankia and actinorhizal plants. Edited by: Schwintzer CR, Tjepkema JD. 1989, Orlando: Academic Press, 107-127.
Mastronunzio JE, Huang Y, Benson DR: Diminished Exoproteome of Frankia spp. in Culture and Symbiosis. Appl Environ Microbiol. 2009, 75 (21): 6721-6728. 10.1128/AEM.01559-09.
Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M: The transcriptional landscape of the yeast genome defined by RNA sequencing. Science. 2008, 320 (5881): 1344-1349. 10.1126/science.1158441.
Willenbrock H, Salomon J, Sokilde R, Barken KB, Hansen TN, Nielsen FC, Moller S, Litman T: Quantitative miRNA expression analysis: comparing microarrays with next-generation sequencing. RNA. 2009, 15 (11): 2028-2034. 10.1261/rna.1699809.
Bullard JH, Purdom E, Hansen KD, Dudoit S: Evaluation of statistical methods for normalization and differential expression in mRNA-Seq experiments. BMC Bioinformatics. 2010, 11: 94-10.1186/1471-2105-11-94.
Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008, 5 (7): 621-628. 10.1038/nmeth.1226.
Wada A, Mikkola R, Kurland CG, Ishihama A: Growth Phase-Coupled Changes of the Ribosome Profile in Natural Isolates and Laboratory Strains of Escherichia coli. J Bacteriol. 2000, 182 (10): 2893-2899. 10.1128/JB.182.10.2893-2899.2000.
Stewart FJ, Ottesen EA, DeLong EF: Development and quantitative analyses of a universal rRNA-subtraction protocol for microbial metatranscriptomics. ISME J. 2010
Zheng D, Frankish A, Baertsch R, Kapranov P, Reymond A, Choo SW, Lu Y, Denoeud F, Antonarakis SE, Snyder M, et al: Pseudogenes in the ENCODE regions: Consensus annotation, analysis of transcription, and evolution. Genome Research. 2007, 17 (6): 839-851. 10.1101/gr.5586307.
Noridge NA, Benson DR: Isolation and nitrogen-fixing activity of Frankia sp. strain CpI1 vesicles. J Bacteriol. 1986, 166 (1): 301-305.
Kal AJ, van Zonneveld AJ, Benes V, van den Berg M, Koerkamp MG, Albermann K, Strack N, Ruijter JM, Richter A, Dujon B, et al: Dynamics of Gene Expression Revealed by Comparison of Serial Analysis of Gene Expression Transcript Profiles from Yeast Grown on Two Different Carbon Sources. Mol Biol Cell. 1999, 10 (6): 1859-1872.
Tisa LS, Ensign JC: Isolation and nitrogenase activity of vesicles from Frankia sp. strain EAN1pec. Journal of Bacteriology. 1987, 169 (11): 5054-5059.
Chin CS, Sorenson J, Harris JB, Robins WP, Charles RC, Jean-Charles RR, Bullard J, Webster DR, Kasarskis A, Peluso P, et al: The origin of the Haitian cholera outbreak strain. N Engl J Med. 2010, 364 (1): 33-42.
Mastronunzio JE: Genomic and Proteomic Analyses of Extracellular and Symbiosis-related Proteins in Frankia. 2009, Storrs, CT: University of Connecticut
Twiss E, Coros AM, Tavakoli NP, Derbyshire KM: Transposition is modulated by a diverse set of host factors in Escherichia coli and is stimulated by nutritional stress. Molecular Microbiology. 2005, 57 (6): 1593-1607. 10.1111/j.1365-2958.2005.04794.x.
Kristensen O, Ross B, Gajhede M: Structure of the PPX/GPPA Phosphatase from Aquifex aeolicus in Complex with the Alarmone ppGpp. Journal of Molecular Biology. 2008, 375 (5): 1469-1476. 10.1016/j.jmb.2007.11.073.
Chandler M, Fayet O: Translational frameshifting in the control of transposition in bacteria. Mol Microbiol. 1993, 7 (4): 497-503. 10.1111/j.1365-2958.1993.tb01140.x.
Sekine Y, Eisaki N, Ohtsubo E: Translational control in production of transposase and in transposition of insertion sequence IS3. J Mol Biol. 1994, 235 (5): 1406-1420. 10.1006/jmbi.1994.1097.
Mastronunzio J, Benson D: Wild nodules can be broken: proteomics of Frankia in field-collected root nodules. Symbiosis. 2010
Pfaffl MW: A new mathematical model for relative quantification in real-time RT-PCR. Nucl Acids Res. 2001, 29: 2002-2007.
Maekawa T, Yanagihara K, Ohtsubo E: A cell-free system of Tn3 transposition and transposition immunity. Genes to Cells: Devoted to Molecular & Cellular Mechanisms. 1996, 1 (11): 1007-1016.
Grissa I, Vergnaud G, Pourcel C: CRISPRFinder: a web tool to identify clustered regularly interspaced short palindromic repeats. Nucl Acids Res. 2007, gkm360-gkm360.
Cánovas A, Rincon G, Islas-Trejo A, Wickramasinghe S, Medrano J: SNP discovery in the bovine milk transcriptome using RNA-Seq technology. Mammalian Genome. 2010, 21 (11): 592-598. 10.1007/s00335-010-9297-z.
Kotewicz ML, D'Alessio JM, Driftmier KM, Blodgett KP, Gerard GF: Cloning and overexpression of Moloney murine leukemia virus reverse transcriptase in Escherichia coli. Gene. 1985, 35 (3): 249-258. 10.1016/0378-1119(85)90003-4.
Arezi B, Hogrefe HH: Escherichia coli DNA polymerase III [epsilon] subunit increases Moloney murine leukemia virus reverse transcriptase fidelity and accuracy of RT-PCR procedures. Analytical Biochemistry. 2007, 360 (1): 84-91. 10.1016/j.ab.2006.10.009.
Bassi CA, Benson DR: Growth characteristics of the slow-growing actinobacterium Frankia sp. strain CcI3 on solid media. Physiologia Plantarum. 2007, 130 (3): 391-399. 10.1111/j.1399-3054.2007.00866.x.
Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods. 2008, 5 (7): 621-628. 10.1038/nmeth.1226.
Saldanha AJ: Java Treeview--extensible visualization of microarray data. Bioinformatics. 2004, 20 (17): 3246-3248. 10.1093/bioinformatics/bth349.
We thank Elaine Hager of the University of Connecticut Health Center Translational Genomics Core facility for help with the Illumina platform and Juliana Mastronunzio for helpful discussions. We also thank Dr. Joerg Graf of the University of Connecticut for use of the CLC Genomic Workbench software. This work was supported by grant no. EF-0333173 from the National Science Foundation Microbial Genome sequencing program to D.R.B. and by the University of Connecticut Research Foundation. The authors declare that they have no competing interests.
DMB created the RNA-seq libraries. DMB and DRB planned the experiments, analyzed the data and wrote the manuscript. Both authors have read and approved of the final manuscript
Electronic supplementary material
Additional file 1: Gene lists for heatmap clusters. List of ORFs segregated as clusters from the heat map figure (Figure 1). (XLS 549 KB)
Additional file 2: 3dN2 sample dataset statistics. Tabular output of CLC Genome Workbench software for the 3dN2 sample. (XLS 822 KB)
Additional file 3: 3dNH4 sample dataset statistics. Tabular output of CLC Genome Workbench software for the 3dNH4 sample. (XLS 822 KB)
Additional file 4: 5dNH4 sample dataset statistics. Tabular output of CLC Genome Workbench software for the 5dNH4 sample. (XLS 822 KB)
Additional file 5: Pairwise comparison of three day samples. Comparison of RPKM values from the 3dNH4 and 3dN2 samples for annotated Frankia sp. strain CcI3 ORFs. (XLS 2 MB)
Additional file 6: Pairwise comparison of 3dN2 with 5dNH4. Comparison of RPKM values from the 5dNH4 and 3dN2 samples for annotated Frankia sp. strain CcI3 ORFs. (XLS 2 MB)
Additional file 7: Pairwise comparison of the two NH4 grown cells. Comparison of RPKM values from the 3dNH4 and 5dNH4 samples for annotated Frankia sp. strain CcI3 ORFs. (XLS 2 MB)
Additional file 8: SNP calling and filtering datasets. Excel worksheets containing raw SNP calling data from all three RNA-seq experiments. (XLS 844 KB)
About this article
Cite this article
Bickhart, D.M., Benson, D.R. Transcriptomes of Frankia sp. strain CcI3 in growth transitions. BMC Microbiol 11, 192 (2011). https://doi.org/10.1186/1471-2180-11-192