The protein interaction map of bacteriophage lambda

Background Bacteriophage lambda is a model phage for most other dsDNA phages and has been studied for over 60 years. Although it is probably the best-characterized phage there are still about 20 poorly understood open reading frames in its 48-kb genome. For a complete understanding we need to know all interactions among its proteins. We have manually curated the lambda literature and compiled a total of 33 interactions that have been found among lambda proteins. We set out to find out how many protein-protein interactions remain to be found in this phage. Results In order to map lambda's interactions, we have cloned 68 out of 73 lambda open reading frames (the "ORFeome") into Gateway vectors and systematically tested all proteins for interactions using exhaustive array-based yeast two-hybrid screens. These screens identified 97 interactions. We found 16 out of 30 previously published interactions (53%). We have also found at least 18 new plausible interactions among functionally related proteins. All previously found and new interactions are combined into structural and network models of phage lambda. Conclusions Phage lambda serves as a benchmark for future studies of protein interactions among phage, viruses in general, or large protein assemblies. We conclude that we could not find all the known interactions because they require chaperones, post-translational modifications, or multiple proteins for their interactions. The lambda protein network connects 12 proteins of unknown function with well characterized proteins, which should shed light on the functional associations of these uncharacterized proteins.


Background
Sixty years ago, in 1951, Esther Lederberg discovered phage lambda [1]. Since this seminal discovery lambda has become a model organism in which many foundational studies led to our current understanding of how genes work and how they are regulated, as well as how proteins perform such functions as DNA replication, homologous and site-specific recombination, and virion assembly. In addition, tailed phages are the most abundant life form on earth [2], and so deserve to be studied in their own right and in the context of global ecology. Nevertheless, phage lambda is not completely understood. There are still a number of genes in its 48.5 kb genome whose function remains only vaguely defined, if at all. For instance, many of the genes in the b2 and nin regions have no known function (Figure 1), and 14 of the 73 predicted lambda proteins have unknown functions.
Two of the best-characterized aspects of lambda biology are the genetic switch that determines whether a phage reproduces and lyses the cell or whether it integrates into its host genome to become a prophage [3,4] and the mechanisms through which transcription antitermination controls its gene expression cascade. Nevertheless, lambda continues to yield new insights into its gene regulatory circuits [4,5], and recent studies of its DNA packaging motor are in the vanguard of nanomotor research [6].
Surprisingly, even the structure of the lambda virion is incompletely known: the structures of only 5 of the ~14 proteins in the virus particle have been solved, and it is unknown whether several proteins that are required for tail assembly are in the completed virion, even though the overall structure is well known from electron microscopy [7].
Key to the understanding of lambda biology is a detailed understanding of protein function, including their interactions. We have curated more than 30 protein-protein interactions (PPIs) from the literature, identified over the past 60 years. Such interactions are reasonably well known within the virus particle and during the life cycle of lambda, i.e. during replication and recombination. However, the molecular details of virion assembly, obviously highly dependent on coordinated interactions of structural and accessory proteins, are still largely mysterious.
The structures of at least 17 lambda proteins have been solved (Table 1). In addition, the lambda head has been studied in some detail by cryo-electron microscopy, X-ray crystallography, and NMR (Figure 1). The tail is much less well known. While we do have structures of the head-tail junction proteins W, FII, and U individually, their connections to the head via the portal protein (B) and to each other are not very clear. Similarly, while we do have a structure of the major tail tube protein V, the remaining tail is structurally largely uncharacterized.
Our motivation for this study was three-fold. First, in our continuous attempts to improve the yeast two-hybrid system further, we thought that phage lambda would be an excellent "gold-standard" to benchmark our experimental system by demonstrating how many previously known interactions (Table 2) we are able to identify in such a well-studied system. Second, we believe that interaction data can help to solve the structures of protein complexes, since binary interactions as described here may facilitate the crystallization of co-complexes. Despite its well-understood biology, phage lambda is not well understood structurally; especially the assembly of its tail remains poorly understood. Third, and possibly most important, we wondered if we could contribute to the understanding of lambda biology, either by discovering new interactions or by verifying questionable or poorly supported interactions. To achieve these goals, we cloned almost all lambda open reading frames (ORFs) and tested them for all pair-wise interactions, using a novel yeast two-hybrid strategy [8]. We identified a total of 97 unique interactions, most of which have not been previously described. About half of all published interactions were identified, and we will discuss why the other half has been missed and how these interactions might be detected by future two-hybrid studies.

[Figure caption, beginning lost in extraction] (Table 1). Numbers indicate the number of protein copies in the particle. It is unclear whether M and L proteins are in the final particle or only required for assembly. (C) Electron micrograph of phage lambda. (A) and (C) modified after [24].

Approach
In order to find as many interactions as possible, we cloned 68 lambda ORFs into six different Y2H vectors (see Table 3 and Methods). Each vector pair yields a very different subset of interactions, as we have shown previously [8-10]. For example, the pGADT7g/pGBKT7g vectors yielded 44 interactions while the pGBKCg/pGADCg vectors yielded only 18. The main difference between these two pairs is the way the fusion proteins are constructed: in the former two vectors the Gal4 DNA-binding domain (DBD) and activation domain (AD) are fused to the N-terminus of the lambda proteins (Figure 2). In the latter two the DBD and AD are fused to the C-terminus of the lambda proteins. It is thus reasonable to assume that structural constraints cause many of the observed differences.

Assay sensitivity and false positives
As we have observed before in other contexts [10], the pGADT7g/pGBKT7g vectors yielded almost half of all interactions discovered in this study and almost three times as many as the pDEST series of vectors (which uses similar N-terminal fusions). The pDEST system may detect fewer interactions, but it probably also detects fewer false positives (see discussion).

Verification and quality scores
If an interaction is found in more than one vector combination, the reliability is higher than when it is found in only one. Twenty-four interactions (out of 97) were found in 2 or more vector combinations (Table 4). This number of combinations can be used as a score, and the 3 interactions with the highest score have all been described in the literature before. Of the 24 high-scoring interactions, six (25%) have been described before (Figure 2D). To test whether the proportion of previously published interactions is greater among interactions found in more than one vector combination than among those found in only one, we carried out a one-sided test for a difference of proportions. The null hypothesis can be rejected at alpha = 0.1, indicating a moderately significant difference (P-value = 0.098) (Additional file 1: Table S6). We conclude that the number of supporting vector combinations can be used as a confidence score. This suggests that the 18 novel high-scoring interactions are possibly physiologically relevant interactions and thus good candidates for further studies (see discussion).
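The comparison of proportions can be sketched as follows. The text does not state which variant of the test was used (the details are in Additional file 1: Table S6), so this minimal example assumes a pooled two-proportion z-test with a normal approximation; the function name is ours. The counts are those given in the text: 6 of 24 multi-vector interactions and 10 of 73 single-vector interactions were previously published.

```python
import math

def one_sided_two_proportion_z(x1, n1, x2, n2):
    """One-sided z-test for H1: p1 > p2, using the pooled proportion.

    Assumption: pooled normal approximation; the paper does not name
    the exact test variant.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# 6/24 published among multi-vector hits vs. 10/73 among single-vector hits.
z, p_value = one_sided_two_proportion_z(6, 24, 10, 73)
print(f"z = {z:.2f}, one-sided P = {p_value:.3f}")
```

Under these assumptions the result is close to the P-value reported above, which suggests, but does not prove, that a test of this kind was applied.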
Of the 73 interactions that were found in only one combination, 10 have been published previously, demonstrating that they are useful too. In fact, 16 out of 30 previously found interactions were also found in our screen, i.e. 53%. Note that three previously found interactions (Xis-Xis, Xis-Int, and SieB-Esc) could not be tested since we were unable to obtain ORF clones of J, Xis, NinH, and Esc (which is encoded within SieB).

Prey counts
There are other criteria that can be used to score interactions. One of them is the number of times a prey protein is found. This "prey count" indicates whether a protein interacts very specifically (low prey count) or more nonspecifically and thus promiscuously. Proteins with high prey counts are more likely to be false positives, and hence we removed interactions with a prey count > 5 from further analysis (see Additional file 1: Tables S2 and S3). However, this was not generally true in our study: of the preys that were found 1 to 3 times, 12 were found among the "gold-standard" literature interactions. Of the preys that were found 4 to 5 times, 9 were involved in such gold-standard interactions (5 interactions were shared in both groups).
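A prey-count filter of this kind can be sketched in a few lines. The bait and prey names below are hypothetical placeholders, not data from this study, and we assume that the prey count is the number of bait screens in which a prey appeared; the actual counts are given in Additional file 1: Tables S2 and S3.

```python
from collections import Counter

# Hypothetical (bait, prey) hits; the names are placeholders only.
raw_hits = [
    ("bait1", "preyA"), ("bait2", "preyA"), ("bait3", "preyA"),
    ("bait4", "preyA"), ("bait5", "preyA"), ("bait6", "preyA"),
    ("bait1", "preyB"), ("bait2", "preyC"), ("bait3", "preyC"),
]

MAX_PREY_COUNT = 5  # threshold stated in the text (prey count > 5 removed)

def filter_by_prey_count(hits, max_count=MAX_PREY_COUNT):
    """Drop interactions whose prey was found more than max_count times."""
    prey_counts = Counter(prey for _, prey in hits)
    kept = [(bait, prey) for bait, prey in hits
            if prey_counts[prey] <= max_count]
    return kept, prey_counts

kept, prey_counts = filter_by_prey_count(raw_hits)
print(prey_counts)  # preyA was found six times and is filtered out
print(kept)
```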

Protein coverage
Among the 73 lambda proteins listed in the UniProt database (J02459), 51 were found to be involved in interactions (Figure 3), which represents 70% of the proteome. Fifteen proteins were found in only one interaction (CIII, Ea10, Ea59, Exo, FII, Kil, L, Nu3, Orf64, Orf60a, R, Rz, T, W, and Xis) but 7 proteins were found to be involved in 10 or more interactions (namely U, Bet, Ea8.5, Nu1, A, Int, and G). Hence the former are more specific and the latter more promiscuous and thus less reliable. Interestingly, several proteins were conspicuously absent from our list of interactions, primarily proteins of head and tail assembly (B, C, I, J, Stf, and Tfa) as well as the poorly understood proteins NinG, NinH, Orf221 (NinI), Orf290 (NinC), and SieB (see discussion).

Table 3 Vectors and interaction summary

[Caption text, beginning lost in extraction] All the interactions obtained from the array screening were subjected to Y2H retests: we were able to retest all the interactions shown in Figure 2 except A-Ea47, which has thus been removed from the final interaction list. Technical details of the screening procedure have been described in [8,10]. (C) Interaction quality assessment. Using the experimentally derived false positive rate from [9] and Bayes' theorem, we estimated the probability of an interaction to be true. This estimate depends on the vector system, being highest (83%) for pDEST22/32, and lowest (40%) for pGBKCg/pGADT7g.
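The Bayes estimate mentioned in the caption text above can be illustrated with a generic sketch. The inputs actually used (the false positive rates from [9], the prior, and the sensitivity) are not given in this text, so all three numbers below are hypothetical placeholders and the function name is ours; the example does not reproduce the 83% or 40% values.

```python
def posterior_true(prior, sensitivity, false_positive_rate):
    """P(interaction is true | positive Y2H result) by Bayes' theorem."""
    true_pos = sensitivity * prior
    false_pos = false_positive_rate * (1 - prior)
    return true_pos / (true_pos + false_pos)

# Hypothetical inputs, chosen only to show the calculation:
# 5% of tested pairs truly interact, 20% of true pairs are detected,
# and 1% of non-interacting pairs give a positive signal.
example = posterior_true(prior=0.05, sensitivity=0.20, false_positive_rate=0.01)
print(f"posterior probability: {example:.2f}")
```

The sketch shows why the estimate depends on the vector system: a lower false positive rate raises the posterior probability even when fewer interactions are detected overall.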

Functional specificity
We grouped all lambda proteins in 9 groups, namely virion head, virion tail, transcription, replication, recombination, lysis, lysogenic conversion, others with known function, and unknown (Table 4). A statistical analysis of interactions shows that proteins involved in head assembly have the highest specificity (Figure 4): when interactions among different functional classes are considered, the proteins involved in capsid assembly tend to interact with themselves more frequently compared to other functional classes. Interestingly, the proteins of unknown function show interactions with proteins involved in several functional classes, including tail assembly, transcription and recombination (Figure 4). Overall, the 97 protein-protein interactions (PPIs) of our screens correspond to ~4.2% of the lambda search space (= 97/(68*68*0.5)), i.e. all possible unordered protein pairs of the lambda proteome (here: 68*68*0.5). This is significantly less than we found in Streptococcus phage Dp1, namely 156 interactions among 72 ORFs [11], even though in the latter case only 2 vector pairs were used. A possible explanation is that we used a more rigorous retesting scheme here, in which we counted only interactions that were found in multiple rounds of retesting.
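The search-space arithmetic can be checked with a short calculation. The function name is ours, and applying the same formula to the Dp1 counts quoted above is our own extrapolation, not a figure taken from [11].

```python
def search_space_coverage(n_interactions, n_orfs):
    """Fraction of possible protein pairs found, using n_orfs**2 / 2 as in the text."""
    search_space = n_orfs * n_orfs * 0.5
    return n_interactions / search_space

lambda_cov = search_space_coverage(97, 68)  # this study
dp1_cov = search_space_coverage(156, 72)    # Dp1 counts quoted in the text
print(f"lambda: {lambda_cov:.1%}, Dp1: {dp1_cov:.1%}")
```

Note that n*n/2 is an approximation: the exact number of unordered pairs including homomeric pairs would be n*(n+1)/2, which changes the percentages only slightly.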

Lambda protein interaction network
This is only the second study that has applied multiple two-hybrid vector systems to characterize protein-protein interactions at a genome scale, the first being our analysis of the Varicella Zoster Virus [8]. The lambda protein network connects 12 proteins of unknown function with well characterized proteins, which should shed light on the functional associations of these uncharacterized proteins (Figure 3). For example, NinI interacts with two proteins, N and Q, which are involved in transcription antitermination. The scaffolding protein Nu3 forms dimers, and interacts with the tail proteins Z and M as well as the capsid protein E. Thus, Nu3 may play an accessory role in the assembly of both head and tail, even though Nu3 is not absolutely required for tail assembly.

False negatives
This study discovered about 53% of all published interactions among lambda proteins. However, it failed to discover the remaining 47%. We can only speculate why this is the case. Some of the early steps in virion assembly depend on chaperones [12]. For instance, the portal protein B requires GroES/EL, most likely for folding [13]. These chaperones are not present in the yeast cells which we used for our interaction screens. We found only one of five known interactions of B (namely W-B) and aberrant folding in yeast may be the reason for not detecting the other four known interactions. In addition, several lambda proteins are processed during assembly. For instance, the C protease is processed and covalently linked to the capsid protein E. This fusion protein is then further processed to yield products named "X1" and "X2", although recent attempts to identify X1 and X2 were unsuccessful and thus X1 and X2 may be artifacts [14]. A 21 amino acid peptide is also proteolytically removed from the portal protein B but it is not known how this affects its interaction properties. Finally, protein S, a membrane protein involved in lysis, is made in two variants that use different start codons. In fact, we do find that the shorter variant, S' (105 amino acids), has a slightly different interaction pattern compared to the full-length variant, S (107 amino acids) (Figure 3). We have not investigated the detailed mechanism of these differences, but several studies have shown that fragments of proteins show different interaction patterns than their full-length proteins [15,16], even though this is an extreme case given the small difference between S and S'. While steric hindrance may be an obvious reason for this behavior, little is known about the mechanistic details in most other published cases. False negatives may also be a result of the obligate stepwise assembly of large protein structures in lambda and other phage, e.g. when a conformational change due to interaction between two proteins creates a new binding site for a third protein. For instance, in phage T7 only the heterodimer of gp5 and the host thioredoxin provides a binding site for the single-stranded-binding protein (SSB = gp2.5) and the primase-helicase gp4 [17]. Such cases can only be detected if all three proteins are expressed simultaneously and the constructs involved allow the formation of complex oligomers.

False positives
While we found only 53% of all previously known interactions of lambda, we also found many new ones (Table 4). However, many of the new interactions have only been found once and hence are lower-confidence interactions. On the other hand, nine of the previously published interactions were found only once in our screen but are nevertheless well-known interactions. In order to verify the biological significance of new interactions, further criteria or experiments are required. One criterion often used is the plausibility of an interaction: if two interacting proteins belong to the same functional group, their interaction is likely physiological. 34 of the 97 interactions (35%) take place within their functional group, including the 16 known ones. Some of the remaining interactions are discussed below in the context of their functional group. Some proteins appear to be particularly "sticky". For example, G, a tail protein, is involved in 8 different two-hybrid interactions. The specificity of such interactions tends to decrease as their number increases; thus, G likely interacts rather nonspecifically, and its interactions have to be interpreted cautiously. Similarly, Int, A, Nu1, and U are involved in 8 or more two-hybrid interactions each, and thus each interaction has to be evaluated individually keeping in mind its promiscuity. We have attempted a careful manual evaluation in Table 4.

[Table legend] Bfun = bait protein function, Pfun = prey protein function group (rec = recombination, repl = replication, trx = transcription, conv = lysogenic conversion, ihr = inhibition of host replication [76]). NN, CN, NC, CC indicate the fusion type of the bait and prey proteins (see text). The two NN vectors are indicated by G (pGBK/pGAD) and D (pDEST22/32). Interactions that have been found in inverted prey-bait combinations are indicated by a prime sign ('). Interactions that have been found in both bait-prey and prey-bait orientations are indicated by bold and primes (e.g. NC'), respectively. Interactions without any note are unexpected and may not be physiologically relevant. 2v = interactions found with 2 vector pairs. Stf = Orf314.
The reason for interaction promiscuity and thus false positives remains unclear. Several hypotheses have been proposed to explain such cases. For example, a protein may have hydrophobic patches that interact nonspecifically. Some authors have suggested that an increase in abundance alone might cause a promiscuous gain of interactions [18], but such theories remain to be tested rigorously. The Y2H assay appears to be sensitive enough to detect weak interactions that are not detectable in NMR experiments, e.g. the interaction between U monomers [19]. The high sensitivity may also explain a significant number of false positives which may have been detected in our screen but which do not have any physiological significance. Future quantitative measurements are thus required to clarify the relationship between affinity and physiological relevance.

Head assembly and structure
The structure of the lambda protein shell is known in great detail [20]. However, its assembly is much less well understood, as are the locations and functions of the "minor" proteins that are present in only a few molecules/virion (Figure 5). The portal protein B is believed to be the nucleator or initiator of head assembly, which first assembles with the C protease and with the scaffolding protein Nu3 into an ill-defined initiator structure. B, C, and Nu3 are known to form a complex in which several interactions have been previously reported (C'-B, C-Nu3, Nu3-Nu3, and Nu3-B, Table 2). We could not detect B in any interaction although we did find Nu3-C, Nu3-Nu3 and Nu3 interactions with E and Z. This is noteworthy because Nu3-E and Nu3-Z are new interactions. It is known that E (the major capsid protein) assembles onto or around the initiator structure to form the procapsid [12], and it is conceivable that B joins such an assembly. If Nu3 and C proteins are both required for B to join, we would have missed this interaction, given that we tested only pairs of proteins. Nu3 also appears to form dimers by the Y2H analysis, and this has been confirmed independently (C. Catalano, pers. comm.).
The head shell is bound by the D protein which stabilizes the coat protein shell. However, if Nu1, A, or FI are missing, DNA is not packaged and as a consequence, the coat shell does not expand, and D can only be added after expansion. We could confirm the A-Nu1 interaction as well as the interactions between FI and A and FI and E which were previously known only from genetic experiments [21,22]. We also confirmed the D-E and E-E interactions. The terminase and the portal proteins are the largest proteins of the lambda head. Using fragments of these proteins as baits (as opposed to full-length proteins) may result in additional interactions, especially since we were not able to detect most of the B interactions reported in the literature (Tables 2 and 4).

Tail assembly and structure
Tail assembly is even less well understood than head assembly (Figure 6). From genetic analyses it is known that the host receptor protein J initiates the process with I, L, K, and G (including its fusion protein G-T) successively joining the process [23]. Older studies suggest a slightly different order of action, namely J > I > K > L [24]. In fact, it is not known if I, L and M are components of the finished virion or are assembly factors that are not present in virions. It is thus difficult to reconstruct the detailed molecular events during tail assembly. In any case, J eventually associates with the tape measure protein H, and the major tail protein V forms a tube around this central rod. U finally joins the head-proximal part of the tail. Similarly, W and FII join to the portal protein in the head to form the binding site for the tail. The main tail proteins are connected by known direct protein-protein interactions (Table 2) but the interactions during the initiation of tail assembly have eluded previous studies. In fact, we failed to detect any interaction involving J and I, and the only interactions of L and K did not involve other tail proteins (Table 4). However, we did find several new interactions that are potentially relevant for tail assembly. For instance, G, a fairly promiscuous protein with a total of 8 interactions, was found to bind to V, G, T, H, and M. It is thus possible that it acts as a scaffold organizing the assembly of the tail. By contrast, the interactions of H and V with G were their sole tail-related interactions. We did not find the tail fiber proteins Stf and Tfa to interact with other tail proteins in our screens. Stf has been speculated to assume a trimeric structure, similar to the tail fiber protein of phage T4 [25], although there is no specific evidence for oligomerization in lambda.
In summary, it is surprising that we found so many virion protein interactions, given that virion assembly is an obligately ordered pathway and most binding sites may be only present in the growing virion and not on individual unassembled proteins.

Transcription
The genetic switch leading to a decision between lysogeny and lysis has made lambda a prime model system for transcriptional regulation. A significant fraction of lambda literature has been devoted to this question [3].
Here, we ignore the interactions of transcription factors with DNA and concentrate on their interactions among each other and the transcriptional machinery. Several factors form dimers (Cro, CI, CII, CIII). Of these, we could only confirm the CII self-interaction. CI, CII, and CIII all interact with various components of the virion in our two-hybrid studies, especially of the tail. However, whether these interactions are physiologically relevant is questionable. Notably, the antiterminators N and Q also show a number of interactions in our tests although none of these involve any other transcriptional regulators. Also, all of these interactions were found in a single vector combination, so they are not as well supported as other interactions.

[Figure caption] Head assembly. Head assembly has been subdivided into five steps although most steps are not very well understood in mechanistic terms. The tail is assembled independently. The C protease, the scaffolding protein Nu3, and the portal protein (B) form an ill-defined initiator structure. Protein E joins this complex but the chaperonins GroES and GroEL are required for that step. Within the prohead C and E are processed to form covalently joined X1 and X2 proteins although this is controversial (see text). Proteins Nu1, A, and FI are required for DNA packaging. Protein D joins and stabilizes the capsid as a structural protein. FII and W connect the head to the tail, which joins once the head is completed. Modified after [12] and [20].

Recombination, integration, and excision
Integration of the lambda genome into the host chromosome is part of the establishment of the lysogenic state. Integrase (Int), assisted by the integration host factor (IHF), catalyzes this reaction. Similarly, Int, this time assisted by excisionase (Xis) and the host Fis protein, catalyzes the excision of the lambda prophage. Four other lambda proteins are known to be involved in homologous recombination: Exo (exonuclease), Bet (= β, strand annealing protein), Gam (an anti-recBCD protein), and NinB (which can replace the recFOR complex, which can load RecA onto ssDNA covered with single-stranded DNA-binding (SSB) protein [26]). We did not find the known interaction between Bet and Exo. In fact, we found Int and Bet to both homodimerize, and Bet and Int to interact. This indicates that these proteins may assist Int. A number of other interactions involving these recombination proteins and unrelated gene products are difficult to explain and require further analysis. However, they may implicate several uncharacterized small ORFs in the process of recombination (Table 4).

Host interactions
At least 15 lambda proteins interact with host proteins (S. Blasche, S.V. Rajagopala & P. Uetz, unpublished data). Lambda critically depends on host factors for integration, transcription, excision and virion assembly. Hence, a detailed understanding of lambda biology depends on information about such host-phage interactions. These interactions are beyond the scope of this study. We will address this issue in a forthcoming paper.

[Figure caption, beginning lost in extraction] Assembly starts with protein J, which then, in a poorly characterized fashion, recruits proteins I, L, K, and G/T to add the tape measure protein H. G and G/T then leave the complex so that the main tail protein (V) can assemble on the J/H scaffold. Finally, U is added to the head-proximal end of the tail. Protein Z is required to connect the tail to the pre-assembled head. Protein H is cleaved between the action of U and Z [31]. It remains unclear if proteins M and L are part of the final particle [24]. Modified after [23].

Protein networks and functional genomics of phage lambda
soon and integrate the resulting protein-protein interactions into a systems biology model of lambda biology.

Conclusions
Using phage lambda as a benchmark we showed that we can find about 50% of the interactions among its proteins using Y2H screens. No other technology has been able to detect such a large fraction of interactions in a single macromolecular assembly (except crystallization of whole complexes, which is not possible with phage particles). We thus predict that our strategy can find roughly half of all interactions in other phage and protein complexes. However, other methods will be required to find interactions that require chaperones, post-translational modifications, or other additional factors that could not be provided in our assay.

Cloning the phage lambda ORFs into Gateway entry vector
The DNA sequence of phage lambda was obtained from the NCBI genomes database (NC_001416) and primers were designed using the Primer Design Tool [29]. The primers were designed without endogenous stop codons.
In addition to the 20- to 30-nucleotide-long ORF-specific sequence, the attB1 segment (5'-aaaaagcaggctta-3') was added to each forward primer, followed by ORF-specific bases. The attB2 segment (5'-agaaagctgggtg-3') was added at the 5' end of each reverse primer, which was complementary to the end of the ORF, without the last nucleotides of the stop codon.
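The primer layout described above can be sketched as follows. This is not the Primer Design Tool [29]; the function names, the fixed 20-nucleotide ORF-specific length, and the toy ORF sequence are our own assumptions for illustration, and the sketch simply drops the whole stop codon. Only the two attB segments are taken from the text.

```python
ATTB1_SEGMENT = "aaaaagcaggctta"  # from the text; 5' tail of the forward primer
ATTB2_SEGMENT = "agaaagctgggtg"   # from the text; 5' tail of the reverse primer
STOP_CODONS = {"TAA", "TAG", "TGA"}

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]

def design_primers(orf, specific_len=20):
    """Build attB-tailed primers for one ORF (illustrative sketch only)."""
    orf = orf.upper()
    # Simplification: remove the entire endogenous stop codon, if present.
    if orf[-3:] in STOP_CODONS:
        orf = orf[:-3]
    forward = ATTB1_SEGMENT + orf[:specific_len]
    reverse = ATTB2_SEGMENT + reverse_complement(orf[-specific_len:])
    return forward, reverse

# Hypothetical toy ORF, not a lambda gene.
toy_orf = "ATGGCTAGCAAAGGTGAACTGCTGACCGGTTAA"
forward, reverse = design_primers(toy_orf)
print(forward)
print(reverse)
```

Lower-case letters mark the attB tails and upper-case letters the ORF-specific part, mirroring the notation used for the attB segments in the text.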

PCR amplification and cloning of lambda ORFs into gateway entry vector
All the ORFs of phage lambda were PCR amplified using KOD DNA polymerase (Novagen) and phage lambda genomic DNA (NEB:N3011L). The complete sequences of attB1 (5'-GGGGACAAGTTTGTACAAAAAAGCAGGCT-3') and attB2 (5'-GGGGACCACTTTGTACAAGAAAGCTGGGT-3') were added in the secondary round PCR, where the first round PCR product was used as a template, to generate the full-length attB1 and attB2 sites flanking the ORFs. The PCR cycles were used as recommended by the KOD DNA polymerase manufacturer (Novagen, Cat. No.710853).
The PCR-amplified ORFs with attB1 and attB2 sites were recombined into the entry vector pDONR™/Zeo (Invitrogen) by using the BP Clonase™ II Enzyme Mix (Invitrogen). The products resulting from site-specific recombination were transformed into chemically competent E. coli (DH5-α) and plated onto solid LB medium containing Zeocin. Two isolated colonies were selected for each reaction and the clones were verified by colony-PCR with pDONR™/Zeo-specific primers. The clones that had an insert of the expected size were picked for plasmid isolation and the plasmid preparations were sequenced with pDONR™/Zeo-specific forward and reverse primers to verify the insert from both the N-terminal and C-terminal ends of the ORFs. All the sequencing reads were analyzed using NCBI standalone BLAST against the phage lambda genome to confirm the identity of each ORF. We obtained 68 entry clones out of 73 targeted lambda ORFs (see Additional file 1: Table S1).

Yeast two-hybrid screening
We carried out comprehensive Y2H interaction screening with the following Y2H vector pairs: pDEST32-pDEST22, pGBKT7g-pGADT7g, pGBKT7g-pGADCg, pGBKCg-pGADCg and pGBKCg-pGADT7g (listed as bait-prey vector pair). In the array screening we tested each protein both as activation (prey) and DNA-binding domain fusion (bait), including C-terminal fusions in pGBKCg and pGADCg. This way, we tested each protein pair in ten different configurations (Figure 2). The yeast two-hybrid assays were conducted as described in detail by Rajagopala et al. [10,30].
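The count of ten configurations can be reproduced by enumerating the five bait-prey vector pairs listed above in both bait/prey orientations. This reading of "ten configurations" is our interpretation, and the function name and dictionary keys are illustrative; the protein pair A and Nu1 is used only as an example.

```python
from itertools import product

# Bait-prey vector pairs as listed in the text.
VECTOR_PAIRS = [
    ("pDEST32", "pDEST22"),
    ("pGBKT7g", "pGADT7g"),
    ("pGBKT7g", "pGADCg"),
    ("pGBKCg", "pGADCg"),
    ("pGBKCg", "pGADT7g"),
]

def configurations(protein_x, protein_y):
    """All bait/prey assignments of one protein pair across the vector pairs."""
    orientations = [(protein_x, protein_y), (protein_y, protein_x)]
    return [
        {"bait": bait, "bait_vector": bait_vec,
         "prey": prey, "prey_vector": prey_vec}
        for (bait_vec, prey_vec), (bait, prey) in product(VECTOR_PAIRS, orientations)
    ]

configs = configurations("A", "Nu1")
print(len(configs))  # five vector pairs times two orientations
for cfg in configs[:2]:
    print(cfg)
```

For a homomeric pair the two orientations coincide, so only five distinct configurations exist in that case.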

Data availability
The protein interactions from this publication have been submitted to the IMEx consortium (http://www.imexconsortium.org) through IntAct (http://www.ebi.ac.uk/intact/) and assigned the identifier IM-15871.