Transcription of the extended hyp-operon in Nostoc sp. strain PCC 7120

Background: The maturation of hydrogenases into active enzymes is a complex process. A correctly assembled active site alone requires the involvement of at least seven proteins, encoded by hypABCDEF and by a hydrogenase-specific protease gene, either hupW or hoxW. The N2-fixing cyanobacterium Nostoc sp. strain PCC 7120 may contain both an uptake and a bidirectional hydrogenase. The present study addresses the presence and expression of hyp-genes in Nostoc sp. strain PCC 7120.
Results: RT-PCRs demonstrated that the six hyp-genes together with one ORF may be transcribed as a single operon. Transcriptional start points (TSPs) were identified 280 bp upstream of hypF and 445 bp upstream of hypC, respectively, demonstrating the existence of several transcripts. In addition, five upstream ORFs, located between hupSL (encoding the small and large subunits of the uptake hydrogenase) and the hyp-operon, and two ORFs downstream of the hyp-genes were shown to be part of the same transcriptional unit. A third TSP was identified 45 bp upstream of asr0689, the first of the five upstream ORFs in this operon. The ORFs are annotated as encoding unknown proteins, with the exception of alr0692, which is identified as encoding a NifU-like protein. Orthologues of the four ORFs asr0689-alr0692, with a highly conserved genomic arrangement between hupSL and the hyp-genes, are found in several other N2-fixing cyanobacteria, but are absent in non-N2-fixing cyanobacteria that have only the bidirectional hydrogenase. Short conserved sequences were found in six intergenic regions of the extended hyp-operon, appearing between 11 and 79 times in the genome.
Conclusion: This study demonstrated that five ORFs upstream of the hyp-gene cluster are co-transcribed with the hyp-genes, and identified three TSPs in the extended hyp-gene cluster in Nostoc sp. strain PCC 7120. This may indicate a function related to the assembly of a functional uptake hydrogenase, hypothetically in the assembly of the small subunit of the enzyme.


Background
The global energy demand will increase drastically in the near future due to population and economic growth; calculations indicate at least a 2-fold increase over 50 years [1]. To meet this demand, new and sustainable non-coal based or carbon-neutral energy resources have to be developed. Molecular hydrogen, H2, is one of the promising alternative energy carriers [2]. Cyanobacteria are, together with green algae, possible candidates for future clean and sustainable production of hydrogen, since they are the only organisms that combine oxygenic photosynthesis with hydrogenases, allowing them to use solar energy and water to produce hydrogen (H2) [3][4][5][6][7][8][9].
The enzymes directly involved in hydrogen metabolism in cyanobacteria are hydrogenases and nitrogenases. Depending on the cyanobacterial strain, a single cell can harbour an uptake hydrogenase, a bidirectional enzyme, or both. N2-fixing strains contain at least an uptake hydrogenase, which recycles the energy-rich hydrogen molecule produced as a by-product of the nitrogenase during N2-fixation. Hydrogenases, as well as nitrogenases, are very sensitive to oxygen, which inactivates them. To protect these enzymes they are either physically located in a specialized cell type, the heterocyst [5,6,10], or function under anaerobic conditions only [7,11].
The cyanobacterial uptake hydrogenase consists of at least two functional subunits, encoded by the structural genes hupL and hupS. HupL harbours the active site and HupS harbours iron-sulphur (FeS) clusters which transfer electrons from the active site [12].
All cyanobacterial hydrogenases are classified as NiFe-hydrogenases, being either uptake hydrogenases (HupSL) or bidirectional hydrogenases (HoxEFUYH) [5,6,8,9,12]. The active site has a complex structure with one Ni and one Fe atom, to which the biochemically unusual ligands CN and CO are bound. At least seven specific proteins are required to develop a fully active and mature hydrogenase. Of the seven, the six Hyp proteins encoded by hypABCDEF (hyp for hydrogenase pleiotropic) are responsible for the insertion of the metal atoms into the active site of the hydrogenases, as well as the attachment of the ligands to the Fe atom [5,6,12,13]. The function of the hyp-genes has mainly been studied in E. coli. The high homology to the cyanobacterial hyp-genes indicates that their role is the same in cyanobacteria. Indeed, deletion and insertion mutants of hyp-genes in Synechocystis sp. strain PCC 6803 showed no hydrogenase activity [14]. The hyp-genes are conserved and can either be found together, e.g. in Nostoc sp. strain PCC 7120 and Anabaena variabilis ATCC 29413, or spread out in the genome, as in Synechocystis sp. strain PCC 6803. There is only one set of hyp-genes, independent of the number of hydrogenases in the cell [14]. This indicates that the hyp-genes are co-regulated for the assembly of both types of hydrogenases [5,15]. How this is achieved is not known, but the hyp-genes should be regulated differently depending on e.g. strain, environment and type of hydrogenase. The seventh factor is encoded by either hupW or hoxW, hydrogenase-specific proteases that are needed to cleave off part of the C-terminus of the large subunit [13]. This cleavage occurs only after the insertion of Ni into the active site and may function as a checkpoint for the maturation process [16,17]. The cleavage enables a conformational change of the large subunit, which is necessary for the binding of the small subunit, HupS.
The small subunit of hydrogenases harbours (FeS) clusters which are the main components in electron transport to and from the active site and they define the electron transport pathways in both membrane-bound and soluble redox-enzymes [12]. How the assembly and maturation process is achieved is not well known, but three different types of (FeS) cluster assembly have been presented with two requirements in common: the need for a (FeS) scaffolding protein, and a cysteine desulphurase which is required to yield elemental sulphur or hydrogen sulphide [12,18].
Nostoc sp. strain PCC 7120 is an N2-fixing, filamentous and heterocyst-forming cyanobacterium. The 7.21 Mb genome contains a single nitrogenase, and both an uptake and a bidirectional hydrogenase are present [19,20].
To improve the yield of H2 produced by cyanobacteria, and for example to establish the foundation for introducing foreign hydrogenases into cyanobacteria, it is essential to understand the regulation of the genes directly involved in the maturation of cyanobacterial hydrogenases. In the present paper we describe the transcriptional regulation of the hyp-genes and neighbouring open reading frames in Nostoc sp. strain PCC 7120. We also discuss the putative function of the upstream genes and the role of the conserved sequences present in some of the intergenic regions of the extended hyp-operon.

Transcription of the extended hyp-operon
To determine if the hyp-genes are transcribed as a single operon and to identify putative transcriptional start points (TSPs), reverse transcriptase PCR (RT-PCR), Northern blot, and 5'RACE experiments were performed using total RNA isolated from N2-fixing cultures. The six hyp-genes, hypFCDEAB, the ORF asr0697 located between hypD and hypE, the two downstream ORFs and five upstream ORFs are all shown to be part of the same operon (Fig. 1A-C). To eliminate any false results from contaminating genomic DNA, a specially designed tag was used in the RT-PCR reactions [21] (see Table 1). To cover the complete 14 kb sequence, overlapping cDNAs of 2 kb were synthesized (Fig. 1A-C). The four upstream ORFs asr0689, asr0690, alr0691, and alr0693 encode unknown or hypothetical proteins, and alr0692 is annotated as a gene encoding a protein similar to NifU. All proteins have conserved domains: asr0689 and asr0690 encode possible ABC-transporters with membrane-spanning regions, alr0691 encodes a protein with TPR (Tetratrico Peptide Repeat) and prenyltransferase-like domains, alr0692 encodes a protein containing NifU and thioredoxin domains, and the protein product of alr0693 harbours NHL (NCL-1, HT2A and Lin-41) and TPR repeats (Table 2).
The open reading frame asr0697, located between hypD and hypE, is annotated as encoding a probable 4-oxalocrotonate tautomerase and has an orthologue in Anabaena variabilis ATCC 29413 with an amino acid sequence identity of 98.6%. The two ORFs asr0701 and alr0702, positioned downstream of the hyp-cluster, are annotated as encoding a protein of unknown function and a serine proteinase, respectively. asr0701 has homologues in both Nostoc sp. strain PCC 7120 and Anabaena variabilis ATCC 29413; its encoded protein shares an amino acid sequence identity of 46% with the gene product of alr1571 and 44% with that of ava_4178. The encoded proteins of alr1571 and ava_4178 share an amino acid sequence identity of 99%. Orthologues of alr0702 are found in many other cyanobacterial strains, but not always directly downstream of the hyp-operon.
The transcript levels of the genes in the extended hyp-operon are low, since Northern blot analysis using specific probes within asr0689, alr0693, hypF or hypAB failed to detect any mRNAs (data not shown).
5'RACE was performed to identify TSPs along the extended hyp-operon. Based on known TSPs in the hyp-gene cluster of Nostoc punctiforme PCC 73102 [22], the upstream regions of four genes, asr0689, alr0693, hypF and hypC, were examined. Three TSPs were identified (Fig. 2A-C). One TSP was identified 45 bp upstream of asr0689, with a putative σ70-like -10 box sequence (TAGAAT) and two putative NtcA-binding sites, centred around -28 bp (CTAATTTGATTGAC) and -113 bp (GTAGTTTTTTAGAC) with respect to the TSP. The second TSP was found 280 bp upstream of hypF with a putative extended -10 box (TGTTAGGAT) [23]. A third TSP was identified 445 bp upstream of hypC together with putative σ70-like -35 and -10 box sequences.

Repeats and Palindromic hairpins
In BLAST searches of the complete 14 kb extended hyp-operon, ten conserved short (11-23 nts) sequences (csR1-csR6) were identified in six intergenic regions (R1-R6) (Table 3, Fig. 1). Each of the conserved sequences is widely distributed (11 to 79 times) within the genome of Nostoc sp. strain PCC 7120. In addition, one of the conserved sequences (csR4), located in the intergenic region between hypF and hypC, is also present 26 times on the alpha plasmid (Table 3). csR5.1/csR5.2 appear twice in the intergenic region of asr0701-alr0702. This sequence has previously been identified in Nostoc sp. strain PCC 7120 as an LTRR (Long Tandemly Repeated Repetitive sequence) [24]. Two of the intergenic regions, hupS-asr0689 and alr0693-hypF, harbour three and two different conserved sequences, respectively. Simulations suggest that some of the conserved sequences (csR2, csR3.1, csR4 and csR5) might form palindromic hairpins, with predicted ∆G melting energies of -1.3 to -10.5 kcal mol-1. However, the conserved sequences csR1, csR3.2 and csR6 have no favourable energies for the formation of secondary structures.
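As a rough illustration of what "might form palindromic hairpins" means at the sequence level, the sketch below searches a DNA string for the longest stem in which a 5' arm is the reverse complement of a 3' arm, separated by a loop. It is a simplified stand-in, not the simulation used in the study: it considers only Watson-Crick pairs, does not compute ∆G, and the example sequence and the `min_loop` parameter are hypothetical.

```python
# Minimal inverted-repeat (hairpin stem) finder. Assumptions: plain
# Watson-Crick pairing only, no energy model, hypothetical example input.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def longest_hairpin_stem(seq, min_loop=3):
    """Return (stem_length, start, end) of the longest perfect stem.

    `start` is the index of the first base of the 5' arm and `end` is the
    exclusive index after the last base of the 3' arm. The loop between
    the two arms must contain at least `min_loop` bases.
    """
    seq = seq.upper()
    n = len(seq)
    best = (0, None, None)
    for i in range(n):
        for j in range(i + min_loop + 2, n + 1):
            k = 0
            # Pair seq[i + k] with seq[j - 1 - k], moving inward while the
            # remaining loop stays at least min_loop bases long.
            while (j - i - 2 * (k + 1) >= min_loop
                   and COMPLEMENT.get(seq[i + k]) == seq[j - 1 - k]):
                k += 1
            if k > best[0]:
                best = (k, i, j)
    return best

# Hypothetical sequence: stem GGGC/GCCC around the loop GAAAA.
example = "GGGCGAAAAGCCC"
stem_len, start, end = longest_hairpin_stem(example)
print(stem_len, example[start:end])
```

A dedicated RNA/DNA folding program would be needed to obtain melting energies comparable to those reported for csR2, csR3.1, csR4 and csR5.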

Transcription of the extended hyp-operon
This study demonstrates that the hyp-genes in Nostoc sp. strain PCC 7120 may be transcribed as a single operon, which is in accordance with results from other cyanobacteria [22,25]. Furthermore, the five genes located directly upstream and the two genes downstream of the hyp-operon form transcripts with the hyp-genes (Fig. 1A-C). In addition to the TSP positioned upstream of asr0689, two TSPs were identified upstream of hypF and hypC, respectively (Fig. 2A-C), indicating that multiple shorter transcripts within the operon may exist. The position of the TSP upstream of hypF is in agreement with results from Lyngbya majuscula CCAP 1446/4 [25]. In Nostoc punctiforme PCC 73102, no TSPs were detected in the upstream vicinity of either hypF or hypC. Instead, a TSP was identified upstream of NpR0363, an orthologue of alr0693 that is positioned upstream of hypF as in Nostoc sp. strain PCC 7120, with a putative NtcA-binding site and a -10 box [22]. No TSP was identified upstream of alr0693 in Nostoc sp. strain PCC 7120. Transcripts of varying sizes derived [...]
[...] (Fig. 1A and Fig. 4) shown to bind to imperfect NtcA consensus sites. The position of the putative NtcA binding site centred at -28 bp, closer to the TSP than the more common 41.5 bp, could indicate that the binding site is compatible with NtcA acting as a repressor [23,29,30]. The second putative NtcA binding site (GTAGTTTTTTAGAC), centred at -113 bp with respect to the TSP, is a perfect match to the consensus sequence (GT N10 AC). In NtcA-dependent promoters a recognizable -35 box is usually missing [22]. The activity of NtcA will most probably depend on various growth parameters and on the presence of other regulatory proteins and/or metabolites interacting with NtcA.
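The comparison of the two putative NtcA binding sites with the consensus (GT N10 AC) can be checked mechanically. The following is a minimal sketch that uses only the two 14-nt sequences and the consensus as quoted in the text; the function names are illustrative and not part of any published tool.

```python
import re

# Consensus as quoted in the text: GT, any ten nucleotides, AC.
NTCA_CONSENSUS = re.compile(r"GT[ACGT]{10}AC")

def matches_consensus(site):
    """True if the whole 14-nt site fits GT N10 AC."""
    return NTCA_CONSENSUS.fullmatch(site.upper()) is not None

def fixed_position_mismatches(site):
    """Count mismatches at the four fixed positions (GT ... AC)."""
    site = site.upper()
    if len(site) != 14:
        raise ValueError("expected a 14-nt site")
    return sum(a != b for a, b in zip(site[:2] + site[-2:], "GTAC"))

# Sites reported upstream of asr0689, keyed by their centre position.
sites = {
    "-113": "GTAGTTTTTTAGAC",
    "-28": "CTAATTTGATTGAC",
}

for centre, site in sites.items():
    print(centre, matches_consensus(site), fixed_position_mismatches(site))
```

Under this reading, the site centred at -113 bp fits the consensus exactly, while the site centred at -28 bp deviates at one of the four fixed positions (C instead of G at the first base).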
The promoter region upstream of hypF contains an extended -10 box (TGNTAN3T), which belongs to a subclass of E. coli promoters that function without a -35 box [23]. In the promoter region upstream of hypC both a putative -35 and a -10 box have been identified. The presence of TSPs upstream of both hypF and hypC can be coupled to the specific functions of the respective proteins. HypF is involved in the synthesis of the CN-ligands, while HypC and the downstream Hyp proteins are active in the insertion of the metal atoms into the active site and in the stabilization of the protein complex [6,13]. The promoter regions of hypF and hypC are both located within their respective upstream genes. This may indicate that they are individually expressed as a result of the need for more detailed regulation of the amount of protein translated from their respective mRNAs.
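The extended -10 pattern (TGNTAN3T) can be written as a regular expression and tested against the nine-nucleotide hypF sequence reported in the Results (written here without any line-break hyphen). The sketch below also compares the asr0689 -10 box (TAGAAT) with TATAAT, the widely cited E. coli σ70 -10 consensus; that consensus is general background knowledge and is not taken from this paper.

```python
import re

# TGNTAN3T: TG, one free base, TA, three free bases, T.
EXTENDED_MINUS10 = re.compile(r"TG[ACGT]TA[ACGT]{3}T")

def is_extended_minus10(seq):
    """True if the whole sequence fits the extended -10 pattern."""
    return EXTENDED_MINUS10.fullmatch(seq.upper()) is not None

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))

HYPF_BOX = "TGTTAGGAT"        # reported upstream of hypF
ASR0689_BOX = "TAGAAT"        # reported upstream of asr0689
CANONICAL_MINUS10 = "TATAAT"  # assumption: textbook E. coli consensus

print(is_extended_minus10(HYPF_BOX))
print(hamming(ASR0689_BOX, CANONICAL_MINUS10))
```

With these inputs the hypF sequence fits the extended -10 pattern, and the asr0689 box differs from the canonical hexamer at a single position.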
Not much is known about the two ORFs asr0689 and asr0690, except that they have putative membrane-spanning regions and might function as ABC-transporters (Table 2). The protein encoded by alr0691 contains TPR (Tetratrico Peptide Repeat) domains, which have been shown to act as chaperones in protein-protein interactions and in the assembly of multi-protein complexes [34][35][36]. A relevant feature of the protein encoded by alr0692 is that it harbours a NifU-like domain partly overlapping a thioredoxin-like domain. NifU-like proteins show a high degree of similarity between species as different as humans and viruses, which suggests that they are highly conserved [37]. Thioredoxins participate in redox reactions, catalysing the reduction of intra-molecular disulfide bonds (IPR005746), and can act as sulphur donors in the mobilization of sulphur for the maturation of (FeS) clusters [18]. The protein encoded by the fifth ORF, alr0693, contains domains with TPR and NHL (NCL-1, HT2A and Lin-41) repeats. NHL repeats could, according to structural model analysis, be involved in protein-protein interactions (Table 2). The NHL domain is also associated with zinc finger motifs, which are often found in eukaryotes, where they function as DNA-binding motifs in transcription factors by stabilizing the protein structure around the zinc atoms [38]. In Nostoc punctiforme PCC 73102, NpR0363, the orthologue of alr0693, is located in exactly the same position as in Nostoc sp. strain PCC 7120 and is transcribed together with the hyp-genes, with a defined TSP and a promoter region putatively controlled by NtcA [22]. One suggestion is that this ORF might be involved in the maturation of the large subunit of the uptake hydrogenase together with the hyp-genes. Its location is in accordance with that of the orthologue in Trichodesmium erythraeum IMS101, tery_0790, which is located directly upstream of hypF and the hyp-genes, in an arrangement identical to that of the other orthologues of alr0693 (Fig. 3, Table 4).

Maturation of the small subunit of the uptake hydrogenase
A study of the legume endosymbiont Rhizobium leguminosarum bv. Viciae st. UPM791 demonstrated that a cluster of four genes, hupGHIJ, positioned between the structural genes and the hyp-operon, is involved in the maturation of the small subunit of the uptake hydrogenase [39]. Especially hupG, which has a structural domain related to thioredoxins and thiol-disulfide isomerases, and hupH, whose product forms a complex with pre-HupS, seem to be important. HupH is thought to stabilize the protein complex as a chaperone during the maturation process, and it also has domains characteristic of rubredoxins [39]. In BLAST searches, no orthologues of the gene cluster hupGHIJ were found in the cyanobacterial genomes, but the two ORFs alr0691 and alr0692 contain conserved sequences encoding domains similar to those present in HupH and HupG. Three observations point in the same direction: asr0689-alr0692 are transcribed together with the hyp-genes in Nostoc sp. strain PCC 7120 (Fig. 1); highly conserved orthologous regions are positioned between hupSL and the hyp-genes in N2-fixing cyanobacteria; and two of the ORFs, alr0691 and alr0692, contain functional domains resembling those of HupH and HupG. It is therefore tempting to suggest that the genes upstream of the hyp-operon in Nostoc sp. strain PCC 7120 are involved in the assembly and maturation of the small subunit of the cyanobacterial uptake hydrogenase. To test this hypothesis, mutational studies followed by additional experiments will be done. Interestingly, orthologues of the gene cluster asr0689-alr0692 are absent in cyanobacteria harbouring only the bidirectional hydrogenase.
Figure: Physical map of the genomic arrangement of the structural hydrogenase genes, hupSL (depicted in light grey), the putative maturation genes of the small subunit of the uptake hydrogenase (dark grey), and hyp genes (black) of filamentous nitrogen-fixing cyanobacteria.

Repeats and Palindromic hairpins
In six of the intergenic regions of the extended hyp-operon, a total of ten kinds of conserved sequences were found, appearing between 11 and 79 times in the genome (Table 3). The conserved sequences that might form perfect palindromic structures (Fig. 4) could be involved in protein binding. The conserved sequence R5 (csR5) occurs twice in the same intergenic region (Fig. 4). Additionally, the intergenic region between hypF and hypC includes two csR4 sequences that partly overlap each other (ATTGCGAATTGCGAATTG). The conserved sequences are able to form putative hairpin secondary structures (Fig. 4) and are positioned between the transcriptional and translational start points, which might indicate a function in translation. Another possibility is that these conserved sequences have no function at all and are the result of evolutionary transposition events. To investigate their possible functions, functional genomic experiments such as mutational studies will be performed.
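Counting how often a short conserved sequence occurs, including partly overlapping copies such as the two csR4 copies between hypF and hypC, requires an overlap-aware search. The sketch below is one way to do this. The 11-nt motif ATTGCGAATTG is an assumption inferred from the 18-nt sequence quoted above, since the actual csR4 sequence is listed in Table 3, which is not reproduced here; the function names are likewise illustrative.

```python
import re

def overlapping_positions(sequence, motif):
    """Start indices of all occurrences of motif, overlaps included."""
    # A zero-width lookahead lets consecutive matches share bases.
    pattern = re.compile(f"(?={re.escape(motif)})")
    return [m.start() for m in pattern.finditer(sequence)]

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def count_both_strands(sequence, motif):
    """Occurrences on the given strand plus those on the opposite strand."""
    forward = len(overlapping_positions(sequence, motif))
    rc = reverse_complement(motif)
    # A motif equal to its own reverse complement is counted only once.
    reverse = 0 if rc == motif else len(overlapping_positions(sequence, rc))
    return forward + reverse

region = "ATTGCGAATTGCGAATTG"   # sequence quoted in the text
assumed_motif = "ATTGCGAATTG"   # assumption: inferred 11-nt repeat unit

print(overlapping_positions(region, assumed_motif))
print(region.count(assumed_motif))
```

With this assumed motif, the overlap-aware search finds copies starting at positions 0 and 7 (sharing four bases), whereas a plain non-overlapping count such as `str.count` reports only one. The same functions could be applied to a full genome sequence to reproduce genome-wide occurrence counts.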

Conclusion
This study demonstrated that five ORFs encoding proteins with unknown functions are co-transcribed with the hyp-genes, and identified three TSPs in Nostoc sp. strain PCC 7120 (Fig. 1). The additional conservation of these genes among N2-fixing cyanobacteria may indicate an important function, hypothetically in the maturation of the small subunit of the uptake hydrogenase.
Figure 4 The secondary structures formed by the repeated conserved sequences (cs) R2, R3.1, R4 and R5 found within the intergenic regions of the extended hyp-operon. Predicted ∆G melting energies are shown for each putative hairpin structure. Conserved sequence R5 (csR5) is identical to the previously described LTRR in Nostoc sp. strain PCC 7120 [24].