The transcriptome landscape of Prochlorococcus MED4 and the factors for stabilizing the core genome
© Wang et al.; licensee BioMed Central Ltd. 2014
Received: 3 September 2013
Accepted: 14 January 2014
Published: 18 January 2014
Gene gain and loss occur frequently in the cyanobacterium Prochlorococcus, a phototroph that numerically dominates tropical and subtropical open oceans. However, little is known about the stabilization of its core genome, which contains approximately 1250 genes, in the context of genome streamlining. Using Prochlorococcus MED4 as a model organism, we investigated the constraints on core genome stabilization using transcriptome profiling.
RNA-Seq was used to obtain the transcriptome map of Prochlorococcus MED4, including operons, untranslated regions, non-coding RNAs, and novel genes. Genome-wide expression profiles revealed that three factors contribute to core genome stabilization. First, a negative correlation between gene expression levels and protein evolutionary rates was observed. Highly expressed genes were overrepresented in the core genome but not in the flexible genome. Second, functional enrichment analysis identified gene necessity as another powerful constraint on genome evolution. Third, rapid mRNA turnover may increase the fidelity of the corresponding proteins among abundantly expressed genes. Together, these factors influence core genome stabilization during MED4 genome evolution.
Gene expression, gene necessity, and mRNA turnover contribute to core genome maintenance during the evolution of the cyanobacterial genus Prochlorococcus.
Keywords: Core genome; Gene expression; Molecular evolution; Prochlorococcus; RNA-Seq; Transcriptome
The marine free-living cyanobacterium Prochlorococcus is the most abundant autotroph on our planet, yet its cell size and genome are nearly the smallest among the oxygenic phototrophs [1, 2]. This bacterium is distributed throughout tropical and subtropical open seas, thriving particularly in oligotrophic regions [2, 3]. The Prochlorococcus genus mainly consists of high-light (HL) and low-light (LL) ecotypes. These ecotypes display different vertical niche partitioning in water columns with stratified light and nutrient distributions.
Genome streamlining is an intriguing phenomenon that has long been observed in Prochlorococcus lineages. Kettler et al. defined approximately 1250 genes as the core genome of Prochlorococcus based on a systematic analysis of 12 genome sequences of this clade, whereas more than 5000 genes were estimated to lie within the flexible genome. Although Prochlorococcus ecotype differentiation associated with flexible genome streamlining has been extensively studied [7–10], the mechanism by which the Prochlorococcus core genome is consistently maintained is unknown. It is hypothesized that core genes are more essential to a lineage than flexible genes [11, 12], and thus that functional necessity dictates core genome stabilization. However, a growing body of evidence suggests that gene expression level is another important and independent predictor of molecular evolution from prokaryotes to eukaryotes [13–17]. Therefore, it is possible that Prochlorococcus genome stabilization and streamlining are not influenced by functional gene necessity alone, and further transcriptome analyses are required to explain genome evolution within this genus. Interestingly, the subspecies Prochlorococcus MED4 has an increased rate of protein evolution and a remarkably reduced genome [7, 9, 18]. These characteristics make it an ideal model organism for examining the evolutionary factors that influence genome evolution.
Table 1 Summary of the ten sequenced samples (fields: total paired reads; total mapped rate; perfect mapped rate; gene expression rate, all CDS genes)
Transcriptome structure of Prochlorococcus MED4
Illumina high-throughput sequencing (RNA-Seq) was applied to ten Prochlorococcus MED4 samples cultured in Pro99 and AMP (Table 1; Methods). Altogether, 62.8 million 90-bp paired-end reads were generated, and approximately 51.0 million paired-end reads (81.3%) were perfectly mapped to the genome (Table 1). Collectively, 91.8% of the MED4 genome was transcribed in at least one growth condition, and 61.2% of the genome was transcribed in all conditions. The transcribed regions might be larger if more growth conditions were tested. The genome expression cut-off was defined as the coverage at the tenth percentile of the lowest expressed genome regions (Table 1). In contrast, 96.6% of the 1965 coding-sequence (CDS) genes were expressed in at least one growth condition, and 80.9% were expressed in all conditions. The gene expression cut-off was defined as the mean RPKM (reads per kilobase per million mapped reads) of the lowest-expressed 10% of gene regions (Table 1).
UTRs were predicted by identifying operon boundaries. These were defined as sharp declines in coverage in the regions upstream of start codons or downstream of stop codons (Methods). Accordingly, 745 5’UTRs were identified, with a median length of approximately 29 nucleotides (nt) (Sheet 1 of Additional file 2). Although most 5’UTRs were short and similar to those of many other bacteria [24, 34], 8.86% of the 5’UTRs identified were longer than 100 nt. Long 5’UTRs, particularly in prokaryotes, may contain cis-regulatory element(s) such as the Shine-Dalgarno (SD) sequence, which mediates mRNA translational efficiency. Potential RNA elements (5’UTR > 15 nt) were scanned using Rfam, but no conserved elements were identified. These observations are in agreement with previous work and suggest that Prochlorococcus may contain unknown cis-regulatory sequences, such as targets for ncRNAs.
We also identified 337 3’UTRs (Sheet 2 of Additional file 2). When these sequences (3’UTR > 10 nt) were searched with ARNold, only 11 significant termination signals were identified (Sheet 2 of Additional file 2). However, the high proportion (35.6%) of long 3’UTRs (> 60 nt) suggests that these regions may have other important roles that require further exploration.
To identify new ORFs and ncRNAs, we analyzed the intergenic regions determined by the current gene annotation (Sheet 2 of Additional file 3). Seven transcript units were identified with high confidence, including two ORFs and five ncRNAs (Additional file 4). The two ORFs were conserved hypothetical proteins present in related subspecies such as P. marinus MIT9202, P. marinus W9, and P. marinus MIT9515. All five identified ncRNAs were expressed in at least eight conditions (Additional file 4). In particular, TibYfr5 was the most highly expressed of the five predicted ncRNAs, whereas TibYfr1 consistently showed the highest abundance under the light–dark conditions. This suggests that TibYfr1 and TibYfr5 expression levels may be influenced by changes in light.
Highly expressed genes were overrepresented in the core genome but not in the flexible genome
To examine variation in gene expression and molecular conservation, all CDS genes were classified into five subclasses based on expression level. Briefly, we first assumed that at a given time point some transcripts are highly expressed, whereas others are lowly expressed or not transcribed at all. Then, excluding the non-expressed genes, we used quartiles to classify all expressed genes in a sample into three expression-level groups: genes in the top 25% by RPKM were defined as highly expressed genes (HEG), those in the lowest 25% as lowly expressed genes (LEG), and the middle group as moderately expressed genes (MEG). Thus, when one gene’s expression level is traced across multiple samples, it may be consistently classified as HEG, MEG, LEG, or NEG (non-expressed genes), in which case it was designated a constantly expressed gene (CEG); otherwise, it was defined as a variably expressed gene (VEG).
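The classification scheme described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function names, the handling of genes that fall exactly on a quartile boundary (a mid-rank rule is assumed here), and the requirement that every sample lists the same genes are all assumptions.

```python
from typing import Dict, List


def classify_sample(rpkm: Dict[str, float], cutoff: float) -> Dict[str, str]:
    """Label each gene in one sample as HEG, MEG, LEG, or NEG.

    Genes at or below `cutoff` are treated as non-expressed (NEG).
    The remaining genes are ranked by RPKM; the bottom quarter is LEG,
    the top quarter is HEG, and the rest is MEG.
    """
    labels = {g: "NEG" for g, v in rpkm.items() if v <= cutoff}
    expressed = sorted((g for g, v in rpkm.items() if v > cutoff),
                       key=lambda g: rpkm[g])
    n = len(expressed)
    for rank, gene in enumerate(expressed):
        frac = (rank + 0.5) / n  # mid-rank position in (0, 1); assumed tie rule
        if frac <= 0.25:
            labels[gene] = "LEG"
        elif frac > 0.75:
            labels[gene] = "HEG"
        else:
            labels[gene] = "MEG"
    return labels


def classify_across_samples(samples: List[Dict[str, str]]) -> Dict[str, str]:
    """Return the constant class of each gene, or VEG if its class varies."""
    result = {}
    for gene in samples[0]:
        classes = {sample[gene] for sample in samples}
        result[gene] = classes.pop() if len(classes) == 1 else "VEG"
    return result
```

A gene that returns HEG, MEG, LEG, or NEG from `classify_across_samples` corresponds to a CEG in the terminology above; any gene whose label differs between samples is reported as VEG.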
Next, we compared the five gene expression subclasses of the core genome with those of the flexible genome. Our analysis clearly indicates that the genes in the HEG and MEG subclasses were more enriched in the core genome than in the flexible genome (17.7% > 11.5% and 26.8% > 15.3%, respectively; P < 0.001; Figure 3c). Conversely, the core genome had fewer NEG and VEG than the flexible genome (1.5% < 6.6% and 49.6% < 64.6%, respectively; P < 0.001; Figure 3c). These data strongly suggest that Prochlorococcus MED4 genes with constantly high expression levels evolve slowly, which concurs with previous findings in other prokaryotes and eukaryotes [13, 15, 17]. They also suggest that genes with relatively stable expression are more likely to evolve slowly than VEG.
Gene expression level and functional necessity independently influence core genome stabilization
We also compared the expression levels of the core MED4 genes that had homologs in the DEG database (DEG-hit) with those of genes that did not have any known homologs (DEG-miss). HEG, LEG, and NEG showed no enrichment for either DEG-hit or DEG-miss genes (P > 0.1; Figure 4b). Although the MEG subclass had a significantly higher rate of DEG-hit genes (P < 0.001; Figure 4b), the mean expression level of the DEG-hit genes (mean RPKM = 602.62) was not significantly different from that of the DEG-miss genes (mean RPKM = 874.81; Student's t-test, two-tailed P = 0.084). Therefore, consistent with previous reports [14, 40, 41], this suggests that essential genes are not necessarily highly expressed and that gene expression levels affect sequence evolution relatively independently in Prochlorococcus MED4.
We also performed functional enrichment analysis on each gene expression subclass. As most of the genes in the flexible genome have no COG categories, we mainly focused on the expression subclasses of the core genes, especially the HEG. Among these core HEG, several functional categories were more prominent than others. These included the “C” (energy production and conversion), “J” (translation and ribosomal structure), and “O” (protein modification, folding and turnover) categories (Figure 4c). These results suggest that these central metabolic functions are among the most conserved throughout the evolution of the Prochlorococcus lineage. In particular, translational and ribosomal components are generally regarded as the most stable part of the genome [14, 43]. In addition to ribosomal proteins, photosynthetic apparatus and energy metabolism genes were also overrepresented in the core genome. Interestingly, genes involved in protein modification and folding were also stably and highly expressed, suggesting that these genes are under strict constraints similar to those observed for ribosomal and photosynthetic genes.
Additionally, category “R” (general function) was slightly enriched in both LEG and NEG (P = 0.023 and 0.055; data not shown).
Varied gene expression in different cellular processes
Intriguingly, hli genes exhibited high expression levels (Figure 5a). This may be due to the sustained light conditions used in this study. However, HEG were not enriched among the hli genes (Figure 5b). We infer that this is because several genes, such as PMM1384 and PMM1385 (hli12 and hli11), are highly expressed when cells are exposed to high light conditions (Sheet 3 of Additional file 3). In contrast, PMM1390 (hli10) was only slightly transcribed (Sheet 3 of Additional file 3). It may be that differentially expressed hli genes protect different cellular components, such as light-harvesting antennae and nucleic acids [45, 49].
As expected, phage-related genes displayed the lowest expression levels in this study, as phage infection conditions were not tested; data from phage infection conditions would be needed to analyze the expression profiles of these genes. For phosphorus and nitrogen acquisition genes, there was no significant enrichment in the four expression subclasses (Figure 5b). However, PMM1119 and PMM112 (two P-limitation-inducible porins) and one ammonium transporter (amt1, PMM0263) were highly expressed (Sheet 3 of Additional file 3), suggesting that these proteins play particular roles in phosphorus or nitrogen uptake, respectively.
Conserved genes more likely clustered to operon than poorly conserved genes
Constantly and abundantly expressed transcripts undergo quick degradation
Prochlorococcus is a typical phototroph whose cellular physiology and transcriptome are comprehensively affected by photoperiod [38, 46]. We wondered whether light cycle-influenced gene expression profiles might lead to contradictory conclusions regarding the correlation between gene expression and evolutionary traits when Prochlorococcus is cultured under constant light conditions. Therefore, we applied the same method to light–dark expression data generated by RNA-Seq. First, we again observed a significant negative correlation between gene expression levels and the corresponding nonsynonymous substitution rates (N = 1275, Spearman’s r = -0.69, P < 0.001; Additional file 5), and this was confirmed by comparing the evolutionary rates of the four expression subclasses (P < 0.001; Additional file 6a). Second, constantly expressed genes, particularly HEG and MEG with lower Ka, were most often located within the core genome (Additional file 6c). Third, lowly expressed genes were more likely to be degraded slowly (Additional file 7a), and four of the seven exceptions described above (Figure 7a) were retained under these light–dark conditions (Additional file 7a). Comparisons of the gene expression subclasses further indicated that constantly and highly expressed transcripts tend to be degraded quickly (Additional file 7b). Interestingly, there was no significant difference between HEG and MEG (P > 0.1, Additional file 7b), and the same pattern was observed in the correlation between gene expression levels and half-lives: once the expression level increased to a certain degree, the decay rate plateaued (Figure 7a and Additional file 7a). These observations might be partially caused by specific growth conditions or, alternatively, by the genes’ positions in operons, because genes located at the 3’-end of operons are expressed at lower levels but degraded more slowly than 5’-end genes.
Therefore, the half-lives of genes with a high operon rate, such as HEG and MEG (Figure 6b), are more likely to depend on their positions in operons. Regardless of the position of operonic genes, differences in degradation can still be observed between genes with large differences in expression level (such as HEG versus LEG). However, it is not straightforward to determine the extent to which gene position influences half-life, and this question lies beyond the scope of this study.
Although all experimental conditions tested in this study are considered physiologically normal, we also wondered whether environmental stress, such as the iron stress studied by Thompson and coworkers, may affect the correlation between gene expression levels and molecular evolution. First, similar results were observed: highly and constantly expressed genes had lower Ka (Additional file 8a and b) and were more enriched within the core genome (Additional file 8c). Second, genes with constantly high expression levels (HEG and MEG) had short half-lives (Additional file 9). Overall, these observations are in accordance with the conclusions drawn from normal growth conditions under constant illumination, which may indicate that gene expression levels have a relatively independent influence on genome evolution in Prochlorococcus MED4. Note, however, that all conditions tested here were laboratory conditions; a similar study using in situ cultures would help to further elucidate core genome stabilization in Prochlorococcus.
Genes within the flexible genome are subject to relaxed constraints, and these genes can undergo frequent gain and loss in Prochlorococcus, leading to differentiation among isolates. Multiple factors such as an incomplete DNA-repair system, relaxed selection, and paralog deletion can contribute to genome reduction [7–10]. However, little is known about factors that affect the molecular evolution of the Prochlorococcus core genome. Gene expression level has been reported as an independent factor that influences the rate of protein evolution across taxa [13, 14, 17, 54]. In this study, we have provided evidence that highly conserved genes were more likely to be abundantly expressed, and that highly and constantly expressed genes were more frequent in the core genome than in the flexible genome (Figures 2 and 3). Selection pressure acts on highly expressed genes to minimize the great cost (or toxicity) of the corresponding mistranslated or misfolded proteins [17, 55]. As the core genes show higher expression levels, they accordingly experience stronger evolutionary constraints arising from translation and folding. Because efficient and fast mRNA degradation can minimize the use of poor mRNA and thus reduce the production of low-quality polypeptides derived from translation errors, highly expressed genes are more likely to be quickly degraded. This in turn increases the cellular fitness of abundantly expressed core genes. Notably, genes involved in protein folding and turnover were stably and highly expressed (Figure 4c). This has also been observed in natural microbial communities through metatranscriptomic data. These findings suggest that Prochlorococcus invests in protein folding and degradation to ensure protein fidelity, and thus further increases translational robustness.
However, it may seem reasonable to assume that essential genes are more likely to be abundantly expressed, and thus that the core genome, being of high necessity, has a higher expression level. Previous reports have demonstrated the difficulties in accepting this assumption [14, 40]. Our results also suggest that expression level is relatively independent of gene necessity in Prochlorococcus MED4, as no significant difference in gene expression levels was observed between genes with conserved essential homologs (DEG-hit) and those without homologs (DEG-miss) (Figure 4b). Determining which of the two factors contributes more will require a better model in the future.
Gene necessity (or indispensability) influences core genome stabilization because of the essential functions of these genes in physiology and metabolism. In particular, we found that energy metabolism, protein synthesis, and protein folding genes were more enriched in HEG within the core genome (Figure 4c). This implies that these central metabolic pathways lie in the most conserved gene pool across the evolutionary history of Prochlorococcus. Therefore, by analyzing mRNA levels, we were able to reach the same conclusion as those drawn from comparative genomics and protein sequence alignments. Additionally, operons were more likely to be distributed in the core genome than in the flexible genome (Figure 6b). This distinction might derive from the higher proportion of essential genes in the core genome (Figure 4a). Important genes are more likely to cluster into operons because central metabolic genes in the same operon, such as those of the photosynthetic apparatus or ribosomal machinery, can be beneficially co-regulated and co-transcribed and/or assembled into a complex [50, 51, 58].
We used RNA-Seq to obtain a blueprint of the transcriptome of Prochlorococcus MED4. We identified remarkable distinctions in gene expression levels, gene necessity, and mRNA turnover between the core and flexible genomes, indicating that these factors are powerful constraints on core genome stabilization. We hope these findings will contribute to a better understanding of the causes of ecotypic differentiation in the Prochlorococcus genus, and offer a new perspective for future investigations of cyanobacterial evolution.
Growth of Prochlorococcus MED4
Prochlorococcus MED4 strains were cultured in Pro99 medium and AMP at 21°C with an irradiance of 28 μmol quanta m⁻² s⁻¹. Before the experiment, the cultures were maintained under continuous light at the stationary phase for five generations. Then 8 ml of stationary-phase culture was inoculated into 92 ml of the indicated growth medium (Table 1). For Pro99, cells were harvested throughout the life cycle, including lag phase (esl1d), early log phase (esl3d), middle log phase (esl4d), stationary phase (esl8d), and post-stationary phase (esl10d) (Additional file 10). For AMP, stationary-phase cells were grown with varying concentrations of sodium bicarbonate (0 mM, 6 mM, and 24 mM) for two time periods (5 hours, 10 hours; Table 1); our primary aim was to maximize the number of transcripts represented under normal growth conditions. Each growth condition was performed in triplicate. Chlorophyll fluorescence was monitored on a plate reader (Spectra Max M2e, Molecular Devices), with an excitation wavelength of 440 nm and an emission wavelength of 680 nm.
Total mRNA preparation
To extract total mRNA, one volume of each culture was fixed with three volumes of RNA-later (15 mM EDTA, 18.75 mM sodium citrate, and 525 g/l ammonium sulfate), harvested by filtration (0.22 μm cellulose membrane), snap frozen in liquid nitrogen, and stored at -80°C. Before RNA extraction, cells were treated with 150 ml 10 mM Tris–HCl (pH 7.5), 2 ml RNase inhibitor (20 U/μl, AM2696), and 1 ml Readylysis lysozyme (Epicentre). Total RNA was extracted using the mirVana RNA isolation kit according to the manufacturer’s instructions (Ambion). DNA was removed using the Turbo DNA-free™ Kit (Ambion). The quality of the total RNA samples was assessed using a Nanodrop spectrophotometer (Thermo) and agarose gel electrophoresis. The total RNA of each triplicate culture was extracted separately, and the extracts were pooled (~8 μg) after the quality of each sample had been measured.
cDNA synthesis, DNA sequencing and reads mapping
cDNA synthesis was performed using the standard protocol of Shenzhen BGI (China). Briefly, the rRNA-depleted mRNA (for details see BGI patent WO2012083832 A1) was fragmented, and random primers were then used to synthesize the first-strand cDNA. The second-strand cDNA was synthesized with DNA polymerase I. Short fragments were purified with the QiaQuick PCR extraction kit (Qiagen) and then sequenced on the Illumina HiSeq™ 2000 platform at Shenzhen BGI. Full technical details of the sequencing are available from BGI (http://www.genomics.cn). This yielded approximately six million 90-bp paired-end reads for each sample (Table 1). The paired-end reads were then mapped to the Prochlorococcus MED4 genome (accession number: NC_005072) using Bowtie2 with at most one mismatch. The coverage of each nucleotide was calculated by counting the number of reads mapped at the corresponding nucleotide position in the genome. The number of reads that were perfectly mapped to a gene region was calculated using BEDTools and then normalized by gene length and total mapped reads to give the RPKM, which was used as the gene expression value. The gene annotations for Prochlorococcus MED4 were downloaded from MicrobesOnline, with modifications for non-annotated genes, which were designated “HyPMM#”. New ORFs identified in this study were annotated with “TibPMM#” (Sheet 2 of Additional file 3). Sequences generated by this study are available in the Gene Expression Omnibus (GEO) under accession number GSE49517.
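The two quantities defined in this paragraph, per-nucleotide coverage and RPKM, can be illustrated with a short Python sketch. It is not the pipeline used in the study (which relied on Bowtie2 and BEDTools); the function names are hypothetical, and read positions are assumed to be given as 0-based, half-open intervals.

```python
from typing import Iterable, List, Tuple


def per_base_coverage(genome_length: int,
                      read_intervals: Iterable[Tuple[int, int]]) -> List[int]:
    """Count how many reads cover each genome position.

    Each read is a (start, end) pair, 0-based and half-open (assumed
    convention). A difference array keeps the cost linear in the number
    of reads plus the genome length.
    """
    diff = [0] * (genome_length + 1)
    for start, end in read_intervals:
        diff[start] += 1
        diff[end] -= 1
    coverage, running = [], 0
    for delta in diff[:genome_length]:
        running += delta
        coverage.append(running)
    return coverage


def rpkm(read_count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)
```

For example, a 1000-bp gene with 500 mapped reads in a library of five million mapped reads has an RPKM of 500 / (1 kb × 5 million reads) = 100.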
Identification of operons and UTRs
Using a priori knowledge of the translation start and stop sites from Additional file 3, the coverage of the regions upstream and downstream of each ORF was scanned to identify a point of sharp coverage decline. To define the boundary, we applied criteria modified from Vijayan et al. Briefly, a transcript’s boundary (the translation start or stop site was defined as i = 0, and “i + 1” is the position upstream or downstream of position “i”) was defined when position “i” satisfied one of the following three criteria: (1) coverage(i)/coverage(i + 1) ≥ 2, binomialcdf(coverage(i + 1), coverage(i) + coverage(i + 1), 0.5) < 0.01, and coverage(i + 1) > coverage(i:(i-89))/(90 × 7); (2) coverage(i)/coverage(i + 1) ≥ 5 or coverage(i)/coverage(i + 2) ≥ 5, and coverage(i + 1) < coverage(i:(i-89))/(90 × 7); (3) coverage(i + 1) ≤ background. Here, coverage(i:(i-89))/(90 × 7) is one-seventh of the mean coverage over the 90 positions from i-89 to i, and binomialcdf(x, n, p) is the probability of observing up to x successes in n independent trials when the success probability for each trial is p. We assumed that reads were uniformly distributed between positions “i” and “i + 1” (p = 0.5). If a sharp coverage reduction occurred, coverage(i + 1) would be much smaller than coverage(i); that is, observing so few reads at “i + 1” would be a low-probability event among all reads mapped to “i” and “i + 1” (binomialcdf < 0.01). The strictest criterion (1) was used for highly transcribed genes. Since the coverage of a transcript is uneven from the 5’-end to the 3’-end owing to sequencing bias, we checked the coverage of the 90 nucleotides to the left and right of each gene to determine whether the gene’s upstream or downstream regions were transcribed at high or low levels. For position “i”, if its coverage was higher than 1/7th of the mean coverage of the upstream or downstream 90 bp (Sheet 1 of Additional file 3), this position was examined by criterion (1) for boundary definition. Otherwise, it fell under criterion (2).
If the reduction in coverage did not satisfy either of the above two criteria, the boundary was defined by the genome background (Sheet 1 of Additional file 3), which was determined as the tenth percentile of the lowest expressed nucleotides within gene regions.
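The three boundary criteria can be sketched as follows. This is an illustrative reading of the rules stated above rather than the authors' implementation. Several details are assumptions: the coverage list is ordered from inside the gene outward, so that index i + 1 lies further from the gene than i; near the start of the list the window uses however many positions are available (fewer than 90); the criteria are tried in the order (1), (2), (3); and all function names are hypothetical.

```python
from math import comb
from typing import List, Optional, Tuple


def binom_cdf_half(x: int, n: int) -> float:
    """P(X <= x) for X ~ Binomial(n, 0.5), using exact integer arithmetic."""
    if x < 0:
        return 0.0
    x = min(x, n)
    return sum(comb(n, k) for k in range(x + 1)) / 2 ** n


def boundary_criterion(cov: List[int], i: int, background: float,
                       window: int = 90, fold: int = 7) -> Optional[int]:
    """Return 1, 2, or 3 if position i meets that criterion, else None."""
    here, nxt = cov[i], cov[i + 1]
    inner = cov[max(0, i - window + 1): i + 1]      # up to 90 positions ending at i
    threshold = sum(inner) / (len(inner) * fold)    # one-seventh of the window mean
    if nxt > threshold:
        # Criterion 1: at least a two-fold drop that is unlikely under p = 0.5.
        if here >= 2 * nxt and binom_cdf_half(nxt, here + nxt) < 0.01:
            return 1
    elif nxt < threshold:
        # Criterion 2: at least a five-fold drop to i + 1 or i + 2.
        beyond = cov[i + 2] if i + 2 < len(cov) else nxt
        if here >= 5 * nxt or here >= 5 * beyond:
            return 2
    # Criterion 3: coverage has fallen to the genome background.
    if nxt <= background:
        return 3
    return None


def find_boundary(cov: List[int], start: int,
                  background: float) -> Optional[Tuple[int, int]]:
    """Scan outward from `start`; return (position, criterion) of the first boundary."""
    for i in range(start, len(cov) - 1):
        criterion = boundary_criterion(cov, i, background)
        if criterion is not None:
            return i, criterion
    return None
```

Ratios are tested by multiplication (for example `here >= 2 * nxt`) so that a coverage of zero at position i + 1 does not cause a division error.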
The 5’UTR was defined as the sequence upstream of the translation start site of a transcript, and the 3’UTR as the sequence downstream of the translation stop site. If two adjacent ORFs located on the same strand had no sharp coverage reduction between them according to the three criteria described above, the two ORFs were assigned to a single operon. To obtain a robust operon map, only operons that were repeatedly observed in at least three samples were considered reliable. The operon map was manually proofread to account for unpredictable fluctuations in computing.
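The operon-calling rule can be illustrated with a small Python sketch. It simplifies the procedure described above by applying the three-sample threshold to each pair of adjacent genes rather than to whole operons, and it omits the manual proofreading step; the function name and the input layout are assumptions made for illustration.

```python
from collections import Counter
from typing import List, Set, Tuple


def call_operons(genes: List[Tuple[str, str]],
                 linked_pairs_per_sample: List[Set[Tuple[str, str]]],
                 min_samples: int = 3) -> List[List[str]]:
    """Group adjacent same-strand genes into operons.

    genes: (gene_id, strand) tuples in genome order.
    linked_pairs_per_sample: for each sample, the set of adjacent gene
        pairs (upstream_id, downstream_id) between which no coverage
        boundary was detected.
    A pair is joined only if it is linked in at least `min_samples` samples.
    """
    if not genes:
        return []
    support = Counter(pair for sample in linked_pairs_per_sample for pair in sample)
    operons, current = [], [genes[0][0]]
    for (prev_id, prev_strand), (gene_id, strand) in zip(genes, genes[1:]):
        if strand == prev_strand and support[(prev_id, gene_id)] >= min_samples:
            current.append(gene_id)
        else:
            operons.append(current)
            current = [gene_id]
    operons.append(current)
    return operons
```

Genes that are never joined to a neighbour are returned as single-gene transcription units.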
Novel gene identification
The intergenic regions were scanned to identify new genes. A rapid coverage reduction was considered the end of a new transcript, and this was confirmed by manual assessment. Putative transcripts were analyzed using BLASTn (E-value = 1 × 10⁻³, word = 4) and BLASTp (E-value = 1 × 10⁻⁴, word = 3) to identify homologs of these putative proteins. Next, candidate ORFs were predicted by GeneMark using Prochlorococcus MED4 as the training model. The remaining transcripts that were filtered by BLAST were defined as putative ncRNAs.
Enrichment analysis involves the statistical identification of a particular functional category or expression subclass that is overrepresented relative to the whole gene collection. Since many cases in our study contained a small number of genes, we used Fisher’s exact test (one-tailed) for enrichment analysis (Fisher’s exact test was applied for all statistical significance tests in this study unless otherwise indicated). Genes without COG assignments were not excluded, so that the enrichment was fully representative. COG functional groups can be inspected in the COGs database.
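For readers unfamiliar with the test, a one-tailed Fisher’s exact test on a 2 × 2 table can be computed directly from the hypergeometric distribution. The sketch below uses only the Python standard library; the function name and the table layout are illustrative assumptions, and the counts in the usage example are hypothetical, not data from this study.

```python
from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-tailed Fisher's exact test for overrepresentation.

    The 2 x 2 table is laid out as
        [[a, b],    e.g. core genes:     in subclass, not in subclass
         [c, d]]    e.g. flexible genes: in subclass, not in subclass
    and the function returns the probability of seeing a count of at
    least `a` in the top-left cell when all margins are held fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    numerator = sum(comb(row1, k) * comb(n - row1, col1 - k)
                    for k in range(a, upper + 1))
    return numerator / comb(n, col1)


# Hypothetical counts: 12 of 40 core genes versus 5 of 60 flexible genes
# fall into the subclass of interest.
p_value = fisher_exact_greater(12, 28, 5, 55)
```

A small `p_value` indicates that the subclass is overrepresented in the first row relative to the second.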
Estimating synonymous (Ks) and nonsynonymous (Ka) substitution rate
The complete genome sequences of Prochlorococcus SS120, Prochlorococcus MIT9313, and Synechococcus CC9311 (accession numbers: NC_005042, NC_005071, and NC_008319, respectively) were downloaded from NCBI. Annotations were obtained from Kettler et al. Pairwise calculations of Ka and Ks for Prochlorococcus MED4 orthologs compared with each of the three related species were performed using the program YN00 in the PAML package. To analyze the correlation between Ka and gene expression levels, the mean Ka values of the three ortholog pairs were used.
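The correlation step can be sketched as follows: average the three pairwise Ka values for each gene, then compute Spearman’s rank correlation between mean Ka and expression level. This is a minimal standard-library illustration, not the authors' script; the function names are hypothetical, and tied values are assumed to receive average ranks.

```python
from typing import List, Sequence


def mean_ka(ka_values: Sequence[float]) -> float:
    """Mean Ka of one gene across its ortholog pairs."""
    return sum(ka_values) / len(ka_values)


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; both inputs must have non-zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation, computed as Pearson on average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

With expression values and mean Ka values supplied in the same gene order, a negative result from `spearman` corresponds to the pattern reported in this study, in which more highly expressed genes have lower Ka.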
AMP: Artificial medium for Prochlorococcus
CDS: Coding DNA sequence
CEG: Constantly expressed genes
COG: Clusters of orthologous groups
DEG: Database of essential genes
HEG: Highly expressed genes
Ka: Nonsynonymous substitution rate
Ks: Synonymous substitution rate
LEG: Lowly expressed genes
MEG: Moderately expressed genes
NCBI: National Center for Biotechnology Information
NEG: Non expressed genes
ORF: Open reading frame
RPKM: Reads per kilobase per million mapped reads
VEG: Variably expressed genes
Yfr: Cyanobacterial functional RNA.
We are grateful to Professor Sallie W. (Penny) Chisholm of MIT for offering CT a short visit to her laboratory and for kind suggestions on Prochlorococcus work. We are also grateful to Allison Coe for help provided during CT’s short visit to Chisholm’s lab. We also thank Yuan Li, Pingping Wang, and Pengpeng Li for technical discussions. This work was supported by the 973 Program of China (2011CBA00800 and 2013CB733600), Project of Chinese Academy of Sciences (KSCX2-EW-G-8) and 863 Program of China (2012AA022203D).
- Chisholm SW, Olson RJ, Zettler ER, Goericke R, Waterbury JB, Welschmeyer NA: A novel free-living prochlorophyte abundant in the oceanic euphotic zone. Nature. 1988, 334: 340-343. 10.1038/334340a0.View ArticleGoogle Scholar
- Partensky F, Hess WR, Vaulot D: Prochlorococcus, a marine photosynthetic prokaryote of global significance. Microbiol Mol Biol Rev. 1999, 63: 106-127.PubMed CentralPubMedGoogle Scholar
- Partensky F, Garczarek L: Prochlorococcus: advantages and limits of minimalism. Ann Rev Mar Sci. 2010, 2: 305-331. 10.1146/annurev-marine-120308-081034.View ArticlePubMedGoogle Scholar
- Moore LR, Rocap G, Chisholm SW: Physiology and molecular phylogeny of coexisting Prochlorococcus ecotypes. Nature. 1998, 393: 464-467. 10.1038/30965.View ArticlePubMedGoogle Scholar
- García-Fernández JM, de Marsac NT, Diez J: Streamlined regulation and gene loss as adaptive mechanisms in Prochlorococcus for optimized nitrogen utilization in oligotrophic environments. Microbiol Mol Biol Rev. 2004, 68: 630-638. 10.1128/MMBR.68.4.630-638.2004.PubMed CentralView ArticlePubMedGoogle Scholar
- Kettler GC, Martiny AC, Huang K, Zucker J, Coleman ML, Rodrigue S, Chen F, Lapidus A, Ferriera S, Johnson J, et al.: Patterns and implications of gene gain and loss in the evolution of Prochlorococcus. PLoS Genet. 2007, 3: e231-10.1371/journal.pgen.0030231.PubMed CentralView ArticlePubMedGoogle Scholar
- Dufresne A, Garczarek L, Partensky F: Accelerated evolution associated with genome reduction in a free-living prokaryote. Genome Biol. 2005, 6: 1-10.View ArticleGoogle Scholar
- Marais GB, Calteau A, Tenaillon O: Mutation rate and genome reduction in endosymbiotic and free-living bacteria. Genetica. 2008, 134: 205-210. 10.1007/s10709-007-9226-6.View ArticlePubMedGoogle Scholar
- Hu J, Blanchard JL: Environmental sequence data from the sargasso Sea reveal that the characteristics of genome reduction in Prochlorococcus Are Not a harbinger for an escalation in genetic drift. Mol Biol Evol. 2009, 26: 5-13.View ArticlePubMedGoogle Scholar
- Luo H, Friedman R, Tang J, Hughes AL: Genome reduction by deletion of paralogs in the marine cyanobacterium Prochlorococcus. Mol Biol Evol. 2011, 28: 2751-2760. 10.1093/molbev/msr081.PubMed CentralView ArticlePubMedGoogle Scholar
- Grote J, Thrash JC, Huggett MJ, Landry ZC, Carini P, Giovannoni SJ, Rappé MS: Streamlining and core genome conservation among highly divergent members of the SAR11 clade. mBio. 2012, 3 (5): e00252–12-PubMed CentralView ArticlePubMedGoogle Scholar
- Liu W, Fang L, Li M, Li S, Guo S, Luo R, Feng Z, Li B, Zhou Z, Shao G, et al.: Comparative genomics of mycoplasma: analysis of conserved essential genes and diversity of the Pan-genome. PLoS One. 2012, 7 (4): e35698-10.1371/journal.pone.0035698.PubMed CentralView ArticlePubMedGoogle Scholar
- Pál C, Papp B, Hurst LD: Highly expressed genes in yeast evolve slowly. Genetics. 2001, 158: 927-931.
- Drummond DA, Bloom JD, Adami C, Wilke CO, Arnold FH: Why highly expressed proteins evolve slowly. Proc Natl Acad Sci USA. 2005, 102: 14338-14343. 10.1073/pnas.0504070102.
- Brawand D, Soumillon M, Necsulea A, Julien P, Csardi G, Harrigan P, Weier M, Liechti A, Aximu-Petri A, Kircher M, et al.: The evolution of gene expression levels in mammalian organs. Nature. 2011, 478: 343-348. 10.1038/nature10532.
- Whitehead A, Crawford DL: Neutral and adaptive variation in gene expression. Proc Natl Acad Sci USA. 2006, 103: 5425-5430. 10.1073/pnas.0507648103.
- Drummond DA, Wilke CO: Mistranslation-induced protein misfolding as a dominant constraint on coding-sequence evolution. Cell. 2008, 134: 341-352. 10.1016/j.cell.2008.05.042.
- Rocap G, Larimer FW, Lamerdin J, Malfatti S, Chain P, Ahlgren NA, Arellano A, Coleman M, Hauser L, Hess WR, et al.: Genome divergence in two Prochlorococcus ecotypes reflects oceanic niche differentiation. Nature. 2003, 424: 1042-1047. 10.1038/nature01947.
- Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y: RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 2008, 18: 1509-1517. 10.1101/gr.079558.108.
- Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009, 10: 57-63. 10.1038/nrg2484.
- Cho B-K, Zengler K, Qiu Y, Park YS, Knight EM, Barrett CL, Gao Y, Palsson BO: The transcription unit architecture of the Escherichia coli genome. Nat Biotech. 2009, 27: 1043-1049. 10.1038/nbt.1582.
- Passalacqua KD, Varadarajan A, Ondov BD, Okou DT, Zwick ME, Bergman NH: Structure and complexity of a bacterial transcriptome. J Bacteriol. 2009, 191: 3203-3211. 10.1128/JB.00122-09.
- Wurtzel O, Sapra R, Chen F, Zhu Y, Simmons BA, Sorek R: A single-base resolution map of an archaeal transcriptome. Genome Res. 2010, 20: 133-141. 10.1101/gr.100396.109.
- Vijayan V, Jain IH, O'Shea EK: A high resolution map of a cyanobacterial transcriptome. Genome Biol. 2011, 12 (5): R47. 10.1186/gb-2011-12-5-r47.
- Moore LR, Coe A, Zinser ER, Saito MA, Sullivan MB, Lindell D, Frois-Moniz K, Waterbury J, Chisholm SW: Culturing the marine cyanobacterium Prochlorococcus. Limnol Oceanogr Meth. 2007, 5: 353-362.
- Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008, 5: 621-628. 10.1038/nmeth.1226.
- Taboada B, Ciria R, Martinez-Guerrero CE, Merino E: ProOpDB: prokaryotic operon DataBase. Nucleic Acids Res. 2012, 40: D627-D631. 10.1093/nar/gkr1020.
- Steglich C, Futschik ME, Lindell D, Voss B, Chisholm SW, Hess WR: The challenge of regulation in a minimal photoautotroph: Non-coding RNAs in Prochlorococcus. PLoS Genet. 2008, 4 (8): e1000173. 10.1371/journal.pgen.1000173.
- Steglich C, Lindell D, Futschik M, Rector T, Steen R, Chisholm SW: Short RNA half-lives in the slow-growing marine cyanobacterium Prochlorococcus. Genome Biol. 2010, 11: R54. 10.1186/gb-2010-11-5-r54.
- Holtzendorff J, Partensky F, Mella D, Lennon J-F, Hess WR, Garczarek L: Genome streamlining results in loss of robustness of the circadian clock in the marine cyanobacterium Prochlorococcus marinus PCC 9511. J Biol Rhythms. 2008, 23: 187-199. 10.1177/0748730408316040.
- Mary I, Vaulot D: Two-component systems in Prochlorococcus MED4: Genomic analysis and differential expression under stress. FEMS Microbiol Lett. 2003, 226: 135-144. 10.1016/S0378-1097(03)00587-1.
- Memon D, Singh AK, Pakrasi HB, Wangikar PP: A global analysis of adaptive evolution of operons in cyanobacteria. Antonie Van Leeuwenhoek. 2013, 103 (2): 331-346. 10.1007/s10482-012-9813-0.
- Klein MG, Zwart P, Bagby SC, Cai F, Chisholm SW, Heinhorst S, Cannon GC, Kerfeld CA: Identification and structural analysis of a novel carboxysome shell protein with implications for metabolite transport. J Mol Biol. 2009, 392: 319-333. 10.1016/j.jmb.2009.03.056.
- Sorek R, Cossart P: Prokaryotic transcriptomics: a new view on regulation, physiology and pathogenicity. Nat Rev Genet. 2010, 11: 9-16.
- Gardner PP, Daub J, Tate JG, Nawrocki EP, Kolbe DL, Lindgreen S, Wilkinson AC, Finn RD, Griffiths-Jones S, Eddy SR, Bateman A: Rfam: updates to the RNA families database. Nucleic Acids Res. 2009, 37: D136-D140. 10.1093/nar/gkn766.
- Tagwerker C, Dupont CL, Karas BJ, Ma L, Chuang RY, Benders GA, Ramon A, Novotny M, Montague MG, Venepally P, et al.: Sequence analysis of a complete 1.66 Mb Prochlorococcus marinus MED4 genome cloned in yeast. Nucleic Acids Res. 2012, 40 (20): 10375-10383. 10.1093/nar/gks823.
- Naville M, Ghuillot-Gaudeffroy A, Marchais A, Gautheret D: ARNold: a web tool for the prediction of Rho-independent transcription terminators. RNA Biol. 2011, 8: 11-13. 10.4161/rna.8.1.13346.
- Waldbauer JR, Rodrigue S, Coleman ML, Chisholm SW: Transcriptome and proteome dynamics of a light–dark synchronized bacterial cell cycle. PLoS One. 2012, 7: e43432. 10.1371/journal.pone.0043432.
- Zhang R, Lin Y: DEG 5.0, a database of essential genes in both prokaryotes and eukaryotes. Nucleic Acids Res. 2009, 37: D455-D458. 10.1093/nar/gkn858.
- Wall DP, Hirsh AE, Fraser HB, Kumm J, Giaever G, Eisen MB, Feldman MW: Functional genomic analysis of the rates of protein evolution. Proc Natl Acad Sci USA. 2005, 102: 5483-5488. 10.1073/pnas.0501761102.
- Drummond DA, Raval A, Wilke CO: A single determinant dominates the rate of yeast protein evolution. Mol Biol Evol. 2006, 23: 327-337.
- Tatusov RL, Galperin MY, Natale DA, Koonin EV: The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 2000, 28: 33-36. 10.1093/nar/28.1.33.
- Shi T, Falkowski PG: Genome evolution in cyanobacteria: the stable core and the variable shell. Proc Natl Acad Sci USA. 2008, 105: 2510-2515. 10.1073/pnas.0711165105.
- Banerjee T, Ghosh TC: Gene expression level shapes the amino acid usages in Prochlorococcus marinus MED4. J Biomol Struct Dyn. 2006, 23: 547-553. 10.1080/07391102.2006.10507079.
- Mulkidjanian AY, Koonin EV, Makarova KS, Mekhedov SL, Sorokin A, Wolf YI, Dufresne A, Partensky F, Burd H, Kaznadzey D, et al.: The cyanobacterial genome core and the origin of photosynthesis. Proc Natl Acad Sci USA. 2006, 103: 13126-13131. 10.1073/pnas.0605709103.
- Zinser ER, Lindell D, Johnson ZI, Futschik ME, Steglich C, Coleman ML, Wright MA, Rector T, Steen R, McNulty N, et al.: Choreography of the transcriptome, photophysiology, and cell cycle of a minimal photoautotroph, Prochlorococcus. PLoS One. 2009, 4: e5135. 10.1371/journal.pone.0005135.
- Moore LR, Ostrowski M, Scanlan DJ, Feren K, Sweetsir T: Ecotypic variation in phosphorus-acquisition mechanisms within marine picocyanobacteria. Aquat Microb Ecol. 2005, 39: 257-269.
- Avrani S, Wurtzel O, Sharon I, Sorek R, Lindell D: Genomic island variability facilitates Prochlorococcus-virus coexistence. Nature. 2011, 474: 604-608. 10.1038/nature10172.
- He QF, Dolganov N, Bjorkman O, Grossman AR: The high light-inducible polypeptides in Synechocystis PCC6803 - expression and function in high light. J Biol Chem. 2001, 276: 306-314.
- Pál C, Hurst LD: Evidence against the selfish operon theory. Trends Genet. 2004, 20: 232-234.
- Price MN, Huang KH, Arkin AP, Alm EJ: Operon formation is driven by co-regulation and not by horizontal gene transfer. Genome Res. 2005, 15: 809-819. 10.1101/gr.3368805.
- Deana A, Belasco JG: Lost in translation: the influence of ribosomes on bacterial mRNA decay. Genes Dev. 2005, 19: 2526-2533. 10.1101/gad.1348805.
- Thompson AW, Huang K, Saito MA, Chisholm SW: Transcriptome response of high- and low-light-adapted Prochlorococcus strains to changing iron availability. ISME J. 2011, 5: 1580-1594. 10.1038/ismej.2011.49.
- Pál C, Papp B, Lercher MJ: An integrated view of protein evolution. Nat Rev Genet. 2006, 7: 337-348. 10.1038/nrg1838.
- Drummond DA, Wilke CO: The evolutionary consequences of erroneous protein synthesis. Nat Rev Genet. 2009, 10: 715-724. 10.1038/nrg2662.
- Stewart FJ, Sharma AK, Bryant JA, Eppley JM, DeLong EF: Community transcriptomics reveals universal patterns of protein sequence conservation in natural microbial communities. Genome Biol. 2011, 12 (3): R26. 10.1186/gb-2011-12-3-r26.
- Hirsh AE, Fraser HB: Protein dispensability and rate of evolution. Nature. 2001, 411: 1046-1049. 10.1038/35082561.
- Dandekar T, Snel B, Huynen M, Bork P: Conservation of gene order: a fingerprint of proteins that physically interact. Trends Biochem Sci. 1998, 23: 324-328. 10.1016/S0968-0004(98)01274-2.
- Chen Z, Wen B, Wang Q, Tong W, Guo J, Bai X, Zhao J, Sun Y, Tang Q, Lin Z, et al.: Quantitative proteomics reveals the temperature-dependent proteins encoded by a series of cluster genes in Thermoanaerobacter tengcongensis. Mol Cell Proteomics. 2013, 12 (8): 2266-2277. 10.1074/mcp.M112.025817.
- Langmead B, Salzberg SL: Fast gapped-read alignment with Bowtie 2. Nat Meth. 2012, 9: 357-359. 10.1038/nmeth.1923.
- Quinlan AR, Hall IM: BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010, 26: 841-842. 10.1093/bioinformatics/btq033.
- Dehal PS, Joachimiak MP, Price MN, Bates JT, Baumohl JK, Chivian D, Friedland GD, Huang KH, Keller K, Novichkov PS, et al.: MicrobesOnline: an integrated portal for comparative and functional genomics. Nucleic Acids Res. 2010, 38: D396-D400. 10.1093/nar/gkp919.
- Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M: The transcriptional landscape of the yeast genome defined by RNA sequencing. Science. 2008, 320: 1344-1349. 10.1126/science.1158441.
- Besemer J, Borodovsky M: GeneMark: web software for gene finding in prokaryotes, eukaryotes and viruses. Nucleic Acids Res. 2005, 33: W451-W454. 10.1093/nar/gki487.
- Yang Z, Nielsen R: Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol. 2000, 17: 32-43. 10.1093/oxfordjournals.molbev.a026236.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.