Classification of Myoviridae bacteriophages using protein sequence similarity

Background We advocate unifying classical and genomic classification of bacteriophages by integration of proteomic data and physicochemical parameters. Our previous application of this approach to the entirely sequenced members of the Podoviridae fully supported the current phage classification of the International Committee on Taxonomy of Viruses (ICTV). It appears that horizontal gene transfer generally does not totally obliterate evolutionary relationships between phages. Results CoreGenes/CoreExtractor proteome comparison techniques applied to 102 Myoviridae suggest the establishment of three subfamilies (Peduovirinae, Teequatrovirinae, and Spounavirinae) and eight new independent genera (Bcep781, BcepMu, FelixO1, HAP1, Bzx1, PB1, phiCD119, and phiKZ-like viruses). The Peduovirinae subfamily, derived from the P2-related phages, is composed of two distinct genera: the "P2-like viruses" and the "HP1-like viruses". At present, the more complex Teequatrovirinae subfamily has two genera, the "T4-like" and "KVP40-like viruses". In the genus "T4-like viruses" proper, four groups sharing >70% proteins are distinguished: T4-type, 44RR-type, RB43-type, and RB49-type viruses. The Spounavirinae contain the "SPO1-like" and "Twort-like viruses". Conclusion The hierarchical clustering of these groupings provides biologically significant subdivisions, which are consistent with our previous analysis of the Podoviridae.


Background
We recently described methods aimed at unifying classical and genomic classification of bacteriophages by integration of protein sequence data and physicochemical parameters. We developed two protein sequence similarity-based tools, CoreExtractor and CoreGenes [1], to parse out and quantify relationships between pairs of phages, resulting in a single correlation score [2]. This analysis is followed by a deconstruction and literature analysis of the known morphological and physicochemical characteristics of these phages. The biological interpretation of molecular correlations between 55 fully sequenced Podoviridae shows that this approach agrees with the current phage classification of the International Committee on Taxonomy of Viruses (ICTV) and suggests that, generally, horizontal gene transfer only partially masks evolutionary relationships between phages. Using a cut-off value of 40% homologous proteins, we verified relationships between phages known to be similar and identified several new bacteriophage genera. At the 20-30% homology level, we identified relationships of a higher order, justifying the introduction of the subfamily taxonomical category.
The Myoviridae in the VIIIth ICTV Report comprise five genera of bacteriophages (Mu, P1, P2, SPO1, and T4-like viruses) and one genus of archaeal viruses, phiH. I3 and phiKZ-like phages have been recently proposed as additional genera (http://www.ncbi.nlm.nih.gov/ICTVdb/Ictv/fs_myovi.htm). These genera include only a small fraction of presently known myoviruses with fully sequenced genomes [3]. We analyze and interpret here the correlations between 102 Myoviridae genomes found in the National Center for Biotechnology Information (NCBI) and the Tulane University T4 Genome databases. Figure 1 shows the correlation, based on the CoreExtractor distance measure, among all available Myoviridae genomes in the NCBI databases. To verify and more subtly compare individual correlations, the CoreGenes approach was applied to subsets of related phages, including several genomes not currently available in public databases (Table 1). As in previous analyses of the Podoviridae [2], threshold values of 40% and 20% of homologous proteins (0.6 and 0.8 relative dissimilarity, respectively) strongly suggest genus and subfamily boundaries, respectively (Additional file 1). These boundaries are corroborated by morphological, molecular or physiological data and are discussed in the paragraphs below.
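The threshold rule stated above can be sketched in a few lines of Python. This is only an illustration, not the CoreGenes or CoreExtractor implementation: the function names are hypothetical, and treating the cut-offs as inclusive (a pair sharing exactly 40% falls within a genus) is an assumption, since the text does not say how boundary values are handled.

```python
def relative_dissimilarity(shared_fraction):
    """Convert a fraction of homologous proteins (0.0-1.0) into the
    relative dissimilarity used in the text (40% shared -> 0.6)."""
    if not 0.0 <= shared_fraction <= 1.0:
        raise ValueError("shared_fraction must lie between 0.0 and 1.0")
    return 1.0 - shared_fraction


def suggested_rank(shared_fraction, genus_cutoff=0.40, subfamily_cutoff=0.20):
    """Return the lowest shared rank suggested by the 40% / 20% thresholds.

    Inclusive comparisons are an assumption made for this sketch.
    """
    if not 0.0 <= shared_fraction <= 1.0:
        raise ValueError("shared_fraction must lie between 0.0 and 1.0")
    if shared_fraction >= genus_cutoff:
        return "genus"
    if shared_fraction >= subfamily_cutoff:
        return "subfamily"
    return "none"


# Hypothetical pair sharing 45% of its proteins
rank = suggested_rank(0.45)
```

Such a rule only proposes a boundary; as the text notes, each proposal is then checked against morphological, molecular or physiological data.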

A. Myoviridae Subfamilies
I. Teequatrovirinae
1. T4-like viruses nova comb
The ICTV currently lists only six sequenced viruses as members of the T4 phage genus, namely enterobacterial phage T4, Acinetobacter phage 133, Aeromonas phages Aeh1, 65 and 44RR2.8t, and Vibrio phage nt-1. However, the scientific literature and public databases abound with descriptions of "T4-like" phages, and the analysis of complete genome sequences indicates that the T4-related phages constitute one of the largest groups of bacterial viruses. This corroborates ecogenomic studies on the diversity of these viruses as apparent in the heterogeneity of capsid (gp23) genes in isolates from Japanese rice fields [4], marine systems [5,6], and from Lithuania [7], Bangladesh and Switzerland [8]. These studies suggest that the fully sequenced T4 phages are but a small fraction of the T4-related genomes in nature. Nevertheless, there are clear commonalities among all sequenced "T4-like" genomes from different host groups, including the cyanophages, namely a set of 33-35 genes that have persisted during the evolution of genomes with sizes from 160 to 250 kb [9]. This core of genes seems to have resisted divergence throughout evolution. Nevertheless, these horizontal substitutions do not erase the evidence of the global relationship between phages, and clear hybrid phages within this group have not been identified to date [10,11]. Work done at Tulane University [10,11] led to the tentative conclusion that it takes about 33 T4 genes to determine a genetic program that controls lytic phage development in the host cell.
Based on the Myoviridae cluster dendrogram (Figure 1), the current ICTV genus "T4-like viruses" can be subdivided into two genera and several subgroups. By analogy to the T7-related podoviruses, now named the Autographivirinae, the former ICTV genus was raised to the rank of a subfamily, the Teequatrovirinae, named after the best-studied of these phages, coliphage T4. The first genus, the "T4-like viruses", includes what were previously termed the T-even and "pseudo-T-even" phages [12,13]. Our name perpetuates the old ICTV nomenclature, but is now limited to enterobacterial and Aeromonas phages. The KVP40 phages, consisting of two former members of the "schizo-T-evens" [14], form the other genus.

The subtypes within this genus (T4-type, 44RR2.8t-type, RB43-type and RB49-type viruses) are subdivided by the presence of specific encoded proteins as outlined in Table 2. In the subtype T4 phages, three specific proteins with defined functions (Pin, MotB, ModA) were found. Pin is an inhibitor of the host's Lon protease [15,16], while the other two proteins function to modulate transcription [17,18].
Heteroduplex analyses indicate that coliphages T2, T4 and T6 share >85% sequence similarity [19], warranting their inclusion in the T4-type subgroup in spite of the lack of detailed sequence data for T2 and T6. The DNA of the T-even phages contains 5-hydroxymethylcytosine (5-HMC). While this modified nucleotide is common in T4-related phages [20], its presence has not been ascertained biochemically in the other phages (JS98, RB14, RB32, RB69) included in this subgroup. T4 gp42 dCMP hydroxymethylase and Alc, which blocks transcription from cytosine-containing DNA, are required for the incorporation of 5-HMC rather than cytosine into T-even DNA. Genes specifying homologs of the T4 gp42 and Alc proteins are also present in the 44RR2.8t-type phages.

KVP40-like viruses
The KVP40 viruses comprise two marine vibriophages, KVP40 and nt-1, with genomes of approximately 246 kb. KVP40 infects Vibrio parahaemolyticus and was isolated from seawater. Phage nt-1 infects Vibrio natriegens and originates from a coastal marsh. The phages differ from T4 in head length (137 nm vs. 111 nm), but are identical to phage T4 in tail morphology. KVP40 has a feather of decoration proteins on its head [21,22].
Three other T4 phages do not fit into these groups: Acinetobacter phage 133, Aeromonas hydrophila phage Aeh1 and Aeromonas salmonicida phage 65. Morphologically, phage 133 is identical to T4, whereas Aeh1 and 65 have the same heads of 133 nm in length as Vibrio phages KVP40 and nt-1. They were considered to be part of the "schizo-T-even" group [23] and have a T4-type tail structure [20]. CoreGenes and our supplementary phylogenetic analyses indicate that these phages are too dissimilar, by our criteria, to be included in any of the genera listed above.
The four marine cyanophages (P-SSM2, P-SSM4, S-PM2 and Syn9) infect Synechococcus or Prochlorococcus strains and harbor T4 genes, which led to this group being named the "exo-T-evens" [24,25]. These phages have isometric heads and much longer tails than T4. CoreGenes analysis indicates that they form a group sharing >40% proteins in common. While P-SSM2, P-SSM4 and Syn9 share 93 proteins, they show considerable dissimilarity in appearance, size, and DNA content (Table 3). Phylogenetic analysis based upon sequence alignments of gp20 (portal vertex protein [26]) and photosystem II protein D1 [27,28] indicates that considerable diversity exists among cultured and environmental cyanophages. This is also confirmed by an analysis of data from the marine virome from the Sorcerer II Global Ocean Sampling expedition [29]. Based upon these observations, we feel that the creation of genera within cyanophage myoviruses is premature at the present time.
Rhodothermus marinus phage RM378 (NC_004735) is a virus said to have a head of 95 × 85 nm and a tail of 150 nm in length [30]. It was called a "ThermoT-even phage" by Filée et al. [6], but our CoreGenes analysis reveals that its proteins show minimal sequence similarity to those of any T4-related virus.
Figure 1 Hierarchical cluster dendrogram of the analyzed Myoviridae. The relative dissimilarity between the phage proteomes (between 0.0 and 1.0) forms the basis for the proposed groupings. The dotted line reflects the cut-off value used for the establishment of genera, used consistently for all Myoviridae and the previously defined Podoviridae [107]. Subfamily and tentative subfamily groupings are indicated in the grey and dotted boxes, respectively.
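To illustrate how a cut-off on relative dissimilarity turns a set of pairwise values into groups, the sketch below joins any two phages whose dissimilarity is at or below the cut-off. This corresponds to single-linkage clustering, which is an assumption made for simplicity; the linkage method behind Figure 1 is not stated in this section. The phage names and dissimilarity values are hypothetical.

```python
def groups_at_cutoff(names, dissimilarity, cutoff=0.6):
    """Group phages by cutting at a relative-dissimilarity cutoff.

    names: list of phage names.
    dissimilarity: dict mapping (name_a, name_b) pairs to values in 0.0-1.0.
    Phages are merged transitively whenever a pair is at or below the
    cutoff (single linkage, an assumption of this sketch).
    """
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative, compressing the path
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for (a, b), value in dissimilarity.items():
        if value <= cutoff:
            parent[find(a)] = find(b)

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical toy data, not taken from the study
toy_names = ["phage_A", "phage_B", "phage_C", "phage_D"]
toy_dissimilarity = {
    ("phage_A", "phage_B"): 0.30,
    ("phage_A", "phage_C"): 0.90,
    ("phage_A", "phage_D"): 0.95,
    ("phage_B", "phage_C"): 0.85,
    ("phage_B", "phage_D"): 0.90,
    ("phage_C", "phage_D"): 0.50,
}
genus_level_groups = groups_at_cutoff(toy_names, toy_dissimilarity, cutoff=0.6)
```

With the toy values, a cut-off of 0.6 yields two groups of two phages each, whereas a much stricter cut-off leaves every phage on its own.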

II. Peduovirinae
This subfamily is a large phage group derived from the ICTV genus "P2-like phages" and is named the Peduovirinae. Virions have heads of 60 nm in diameter and tails of 135 × 18 nm. Phages are easily identified because contracted sheaths tend to slide off the tail core. The subfamily falls into three different groups. As shown by CoreExtractor and CoreGenes analyses, phage HP1 has only 9 genes in common with P2. Even if other P2 phages are considered, HP1 shares only 17 genes with any phage of the "P2-like" genus. Using the 40% similarity criterion for inclusion into the same genus, it is therefore justified to consider P2 and HP1 as members of different genera and to upgrade the present genus "P2 phages" to a subfamily.

P2-like viruses nova comb
This genus includes P2 itself and its extensively studied relative, coliphage 186. Both originate from the Pasteur Institute in Paris, France. Phage P2 is one of three phages (P1, P2, P3) isolated by G. Bertani in the early 1950s from the "Li" (Lisbonne and Carrère) strain of E. coli [31]. Later on, F. Jacob and E. Wollman isolated phage 186 and many other viruses from enterobacteria collected by L. Le Minor [32]. The reason for the early interest in these phages was that P2 and 186 are temperate, i.e., able to follow either a lytic or a lysogenic mode of development. The analysis of the genetic control of these two modes was the starting point for ongoing fertile research on phage biology and molecular biology in general.
The genomes of phage P2 and 186 were the first P2 genomes to be fully sequenced and analyzed. Almost all P2 and 186 genes have been assigned a function [33][34][35].
Coliphages W and L-413C are very similar to P2 in both gene content and gene order. They are closely related to each other, sharing all but one protein. The only genes of these phages that differ from P2 are the lysogeny-related genes, which may have been horizontally acquired and are totally different, but have been inserted at the same locations into all genomes. The only exception to this is that phage P2 has a 786 bp ORF (orf30) with unknown function inserted between the S and V genes. There is no such insertion in W and L-413C, but Pseudomonas phage CTX (see below) has another uncharacterized ORF located at this position. Enterobacterial phages 186, PSP3, Fels-2, and SopE also share their overall gene order and many genes with P2, but the genes are more diverged. Unlike P2, these phages are UV-inducible due to the presence of the tum gene. In addition, they have a different lysis-lysogeny switch region. P2 phages seem to have either of two different proteins for repression of the lytic cycle. P2, W and L-413C have the repressor gene C whereas 186, PSP3, Fels-2, SopE, HP1, HP2, and K139 (below) instead have the sequence-unrelated genes CI and CII, both of which are equally needed for establishing lysogeny.
Mannheimia phage -MhaA1-PHL101, Pseudomonas phageCTX, and Ralstonia phage RSA1 have many P2 genes and an overall order of structural genes that is P2like, although interspersed with some uncharacterized genes. Their presumed regulatory gene regions include additional putative and uncharacterized ORFs. Phage CTX has only the P2 regulatory gene ogr (transcriptional activator of the late genes) and the recombination enzyme int (integrase), -MhaA1-PHL101 has repressor (CI) and antirepressor (Cro) equivalents which are most closely related to the regulatory proteins of the P22-like enterobacteria phage ST104 than to P2.  Phage RSA1 seems to have only one P2-related regulatory gene, the ogr gene, although it is more related to the Ogr/ Delta-like gene in CTX. The RSA1 integrase is more similar to the integrases of the P2-like Burkholderia phages (E202, 52237, and E12-2 and P22-like viruses.

HP1-like viruses
The genome architecture of HP1 [36] and its close relative, HP2, resembles that of P2 although their cos sites, as with Pseudomonas CTX [37], are located next to attP rather than downstream of the portal protein-encoding gene as it is in P2. The P2 gene order is also conserved in Vibrio phages K139 [38] and  and the Pasteurella phage F108 [39]. As in P2, the genomes can be divided into blocks of structural and regulatory genes. The structural genes are more similar in HP1 and HP2 than the regulatory genes.
The six genes coding for capsid proteins are arranged in the same order in HP1 phages and many P2 phages. The other structural genes, coding mainly for tail components, show generally no similarity to those of P2 phages. Only some of the regulatory genes are similar in both HP1 and P2 phages, e.g., int, CI, and repA. Regulatory genes in general are more conserved within the HP1 group.
Aeromonas phage O18P [40] is included into the HP1 phages. It contains slightly more genes related to HP1 than to P2, although, when looking at individual proteins, it sometimes appears to have an intermediate position. Its Rep protein is very similar to the DNA replication protein of Salmonella phage PSP3 and the A protein of phages K139, F108, W, and P2 homologs. The O18P major capsid protein is similar to the capsid proteins of phages K139, CTX, 186, and the Burkholderia phages.

III. The Spounavirinae
This proposed subfamily contains the ICTV-recognized genus "SPO1-like viruses" and, on the basis of our results, a proposed new genus (the "Twort-like viruses") and two peripherally related viruses, Lactobacillus plantarum phage LP65 [41] and Enterococcus faecalis phage EF24C [42,43]. All of these are virulent, broad-host-range phages which infect members of the Firmicutes. They possess isometric heads of 87-94 nm in diameter and conspicuous capsomers, striated 140-219 nm long tails, a double base plate, and globular structures at the tail tip. The latter have been resolved as base plate spikes and short kinked tail fibers with six-fold symmetry [44]. Members of this group usually possess large (127-142 kb) nonpermuted genomes with 3.1-20 kb terminal redundancies [45,46]. The proposed name for this subfamily is derived from SPO plus una (Latin for "one").
While the head diameter of Bacillus phage SPO1, of 87 nm [47], is consistent with membership in the group, its tail is significantly shorter than that of most members (140-150 nm) [3,48], and its DNA contains 5-hydroxymethyluracil (HMU) rather than thymine. The outliers of this group comprise phages LP65 [41] and EF24C [42,43]. At 193 nm, the tail of phage LP65 is similar in length to that of other members of this group, but its genome is not terminally redundant [41]. Lastly, the genome size (142 kb), proteome and morphology of Enterococcus phage EF24C are clearly consistent with membership in this group (head diameter 93 nm; tail length 204 nm), but its genome is circularly permuted. Their close relationship was discussed in a recent paper [44].
Using a BLASTP raw threshold score of 100 and CoreGenes 3.0 (http://binf.gmu.edu:8080/CoreGenes3.0/) to compare the proteomes of Twort, A511, LP65, and EF24C against SPO1, we identified two clusters of genes which are conserved. These corresponded to packaging and morphogenesis genes (SPO1 gp2.11 to gp16.2) and the cluster of replication genes, including helicase, exonuclease, primase, and resolvase (SPO1 gp19.5 to gp24.1). The DNA polymerases (SPO1 gp31 and homologs) of these phages are related more closely to bacterial-type I DNA polymerases than to other phage deoxynucleotide polymerizing enzymes. The presence of host-related proteins in viruses has been observed by Dinsdale et al. [49] and elegantly explained by Serwer [50]. Metagenomic studies by the former group indicate the presence of numerous host-related proteins, including those related to motility and chemotaxis, in the virome fractions. While the functional significance of photosynthetic protein psbA in cyanophage genomes has been conclusively shown [51,52], the presence of host-related sequences should still be considered with healthy skepticism if the only evidence is the presence of homologs.

SPO1-like viruses
The current ICTV genus "SPO1 viruses" comprises some 10 Bacillus phages and Lactobacillus phage 222a; only the genome of SPO1 has been sequenced [53]. All SPO1-like Bacillus phage genomes that have been studied contain 5-hydroxymethyluracil (HMU) instead of thymine and encode dUMP hydroxymethylase activity (SPO1 gp29). This phage also contains the unique 171-amino acid head decoration protein gp29.2. Whether this is unique to members of this genus will require the sequencing of additional genomes. Using cryo-electron microscopy, Duda and coworkers [54] confirmed the earlier observation [47] that the icosahedral head of SPO1 has the triangulation number T = 16 rather than the more common T = 25. This feature is also shared with eukaryotic herpesviruses.

Twort-like viruses
The phages form a fairly homogeneous group of virulent phages infecting staphylococci (Twort, G1, K) [55] and Listeria (A511, P100) [56]. The group is named after phage "Twort," which may be a descendant of the original bacteriophage described by F.W. Twort in 1915 [57]. Head genes are located in the first cluster and tail genes are located in the third cluster. The virion major capsid and decoration proteins, Bcep781 gp12 and gp13, were identified by protein sequencing and show some similarity to head proteins from the "PB1-like viruses" group. Several tail morphogenesis proteins, corresponding to Bcep781 gp29 through gp52, can be linked to P2 tail genes by PSI-BLAST. In contrast to structural genes, genes for DNA replication and lysis are scattered throughout the genome. The lysis genes of these phages are not organized into a cassette but instead overlapping Rz and Rz1 genes are separated from the endolysin and holin genes [70]. A distinctive feature of these phages is the presence of highly, maybe completely, circularly permuted genomes. The terminases of these phages are strongly related to other pac-type phages that also have highly permuted genomes [71].

BcepMu-like viruses
This group was named "BcepMu-like viruses" because, like Mu and unlike most other phages, its members utilize transposition for replication. The distinctive genomic feature implicating the use of replicative transposition is the presence of random host DNA sequences at either end of the packaged virion DNA [58]. These host sequences are derived from excision of prophage DNA from random sites scattered over the host genome. This requires fundamental differences in terminase function as compared to more typical terminases that utilize concatemers of phage genomic DNA as a substrate. This is reflected by the homology between BcepMu TerL and Mu TerL. Another genome feature shared by BcepMu and Mu is the presence of genomic terminal CA dinucleotide repeats, a feature common in many transposons. Furthermore, BcepMu and Mu seem to be morphologically identical.
Despite these similarities, BcepMu and its close relative E255 have marked differences in genome organization and minimal overall protein sequence similarity to Mu, explaining why they have not been grouped together.

Felix O1-like viruses
Salmonella phage Felix O1 has a relatively large head (70 nm in diameter) and a tail of 138 × 18 nm characterized by subunits overlapping each other like roof tiles and showing a criss-cross pattern like phages PB-1 and F8. Notably, it exhibits small collars and eight straight tail fibers. Upon contraction, the base plate separates from the sheath. The type virus Felix O1 is widely known as a diagnostic Salmonella-specific phage [21]. Until recently, the genomic sequence (86.1 kb) of phage Felix O1 was unique and was considered, as such, a "genomic orphan", but two related genomes have been recently characterized, though their sequences have yet to be deposited in the public databases. They are coliphage wV8 and Erwinia amylovora phage Ea21-4 (DNA sizes 88.5 and 84.6 kb, respectively) [73,74].

HAP1-like viruses
This genus contains two marine phages, Vibrio parahaemolyticus phage VP882 (NC_009016) and Halomonas aquamarina phage HAP-1 [75]. Both are temperate viruses possessing 38-43 kb genomes which lack integrase genes. While our proteomic analysis and the literature suggest that Vibrio harveyi phage VHML [76,77] should be included in this genus, there is no evidence that this phage can be propagated: it is only produced after induction, does not plaque, and must be considered a defective prophage. The data presented by Mobberley et al. [78] show that HAP-1 exists as a linear prophage in lysogens and possesses a protelomerase (ORF34, YP_001686770.1) and a partitioning protein (ParA homolog, ORF33, YP_001686769.1) which are homologous to proteins encoded by VHML and VP882. While these viruses share some homology with coliphage P2, this is largely restricted to the genes associated with the tail morphogenesis V (gpV, W, J, I, H, G) and F operons (gpFI, FII, E, T, U, D). Based upon their radically different life cycle from the other P2 phages, we have chosen not to include them in the Peduovirinae.

Bzx1-like or I3-like viruses
Myoviruses are exquisitely rare in the Actinobacteria (only an estimated 1% of all attempts to isolate phages from cultures was successful [79]). Phages I3, Bxz1 and Catera are characterized by heads of 80 nm in diameter and unusually short tails of 80 nm in length with a cup-shaped base plate. They do not resemble any other mycobacteriophages nor any other myovirus. We propose that this genus contains the following eight Mycobacterium smegmatis bacteriophages: I3, Bxz1, Cali, Catera, Myrna, Rizal, ScottMcG and Spud. Phage I3, which was the first to be described, is the type virus of the newly proposed myovirus genus although it has not yet been fully sequenced. Within this assemblage, we identified a distinct subtype whose members show >90% protein similarity to Bxz1 (Cali, Catera, Rizal, ScottMcG and Spud) and have genomes of 154-156 kb [80,81]. Mycobacteriophage Myrna, with a genome of 164 kb, shares approximately 45% of proteins with the Bxz1 subgroup phages. Interesting features include the presence of adenylosuccinate synthase homologs among the Bxz1 subgroup (gp250) and their absence from the genome of Myrna. The latter possesses several proteins not present in the Bxz1 group, including the large hypothetical proteins gp187 (YP_002225066.1) and gp243 (YP_002225120.1), a putative nicotinate phosphoribosyltransferase (gp263, YP_002225140.1) and an ATP-dependent protease (gp262, YP_002225139.1).

phiCD119-like viruses
These are all integrative temperate phages of Clostridium difficile with genomes ranging from 51-60 kb in size and a mol%G+C of 28.7-29.4 [82][83][84]. The genus is named after its first fully sequenced member. In each case, the electron micrographs are of poor quality [84,85] or the measurements are very variable with large standard deviations [85]. Virus head diameters are given as 50-65 nm and tail lengths are said to range from 110 to 210 nm [82][83][84]. In certain cases, their annotation is also questionable: the multiple repressors/antirepressors annotated in the genomes of CD27 and C2 do not appear to contain helix-turn-helix or other DNA-binding motifs [86], and the annotation, in the latter phage, of ParA/ParB homologs is similarly questionable. What unites these viruses, in addition to similar proteomes, is the presence in each of a cytosine-C5 specific DNA methylase (pfam00145, DNA_methylase, C-5 cytosine-specific DNA methylase; CD119 protein YP_529611.1) and a DNA replication cassette composed of three proteins: a DnaD (primosome recruiting protein, presumably analogous to lambda gpO and P22 gp18; CD119 protein YP_529603.1), a hypothetical protein (misidentified in CD27 as a putative resolvase/integrase and missed entirely in the annotation of CD119) and a single-stranded DNA binding protein.

phiKZ-like viruses
Phages KZ and EL are members of a group of giant phages isolated, to date, only in Pseudomonas species. Their heads are isometric, 120 nm in diameter, and they possess 190 nm-long tails. The phage heads contain an inner body. The DNA of KZ is over 280 kb in size and has 306 ORFs, most of which are unrelated to ORFS of any known protein [87], while EL contains 201 ORFs within its 211 kb genome [88]. These two phages and Pseudomonas phage Lin68 have recently been proposed as part of a genus "phiKZ viruses" [89]. We now consider that the differences (number of ORFs, mol%G+C, protein homologs) between KZ and EL exclude EL from membership in the same genus. Indeed, the recent analysis of novel Pseudomonas phage 2012-1 [90] showed this phage to have a strong correlation to KZ (167 similar proteins), suggesting that it is a true member of the phiKZ virus genus.
Phage F8 is one of the Pseudomonas typing phages from the Lindberg set which includes six more similar phages [93,94]. It possesses a 70-nm wide head with visible capsomers and a 138 nm-long tail, four short straight tail fibers and a base plate that separates from the sheath upon contraction. The tail exhibits no transverse striations, but presents a criss-cross pattern [95]. This criss-cross pattern is a rare feature that has only been observed in phage Felix O1.
BcepF1 was isolated from soil by enrichment culture [96] using a Burkholderia ambifaria strain as its host (E.J. Summer and C.F. Gonzalez, unpublished). The BcepF1 genome is 72 kb in size and encodes 127 proteins while the genome of F8 is 66 kb and encodes 91 proteins [97]. Both genomes are organized into four alternating, unequal gene clusters on the top and bottom strands. The phages share 43 recognizable homologous proteins. The shared proteins specify virion morphogenesis, DNA metabolism and packaging and include a number of hypothetical proteins of unknown function.
A striking feature of both F8 and BcepF1 is the large number of small genes, all encoding hypothetical proteins and clustered together. In BcepF1, the first 20 kb of the genome, encoding 62 proteins, is devoted almost exclusively to these. In F8, there are two clusters of 8 kb (encoding gp1 through gp16, except gp4, TerL) and 4 kb (encoding proteins gp77 through gp91) of primarily small hypothetical novel genes. These heterogeneous regions are largely responsible for the difference in genome size and protein content between the two phages. It has generally been assumed that these small proteins are involved in host take-over (E. Kutter, personal communications) which appears to be substantiated by the results of Liu and coworkers [98].
Phages F8 and BcepF1 have some similarity to myophage BcepB1A, which is itself related in a mosaic fashion to the Bcep781 group of phages [68]; however, these similarities are essentially limited to morphogenetic proteins. As in the Bcep781 phages, several putative tail assembly proteins of F8 and BcepF1 can be linked to those of P2 by PSI-BLAST.

General summary
The comparison of proteomes by the CoreGenes/CoreExtractor BLASTP programs appears to represent decisive progress in classifying tailed bacteriophages, i.e., our results corroborate the existing ICTV classification of the Myoviridae and are generally well compatible with other informatics-based studies (Table 4), like the reticulate clustering based on gene families [99] (Lima-Mendez, personal communication). Our studies also refine certain relationships and suggest new ones. Specifically, we propose three new subfamilies (Peduovirinae, Teequatrovirinae, Spounavirinae) and eight new genera (Bcep781, BcepMu, Bzx1, FelixO1, HAP1, PB1, phiCD119 and phiKZ-like viruses). The individualization of genera containing two or three members as well as of genomic orphans, e.g. coliphage P1 without apparent homologs, is taxonomically as valuable and important as the confirmation of the large T4 and P2 groups and in total agreement with previous informatics-based classifications (Table 4). Our studies once again prove the utility of the dual CoreGenes/CoreExtractor approach to defining relationships between large numbers of virus genomes. These relationships carry evolutionary relevance, since our proteomic analyses, combined with the phylogenetic studies [100], suggest that the Myoviridae are mainly influenced by vertical evolution rather than by horizontal gene transfer. As observed in the cluster dendrogram, the clusters are populated unevenly: several include only one phage while the two largest include dozens of phages. This reflects the fact that past phage research has focused on coliphages, and suggests that we should broaden our research to include phages from a broader range of bacteria.
Among the 102 analyzed Myoviridae, phage Mu displayed the most significant evidence of horizontal gene exchange. This virus is related to three members of pilus-specific Siphoviridae infecting Pseudomonas aeruginosa (DMS3, D3112, B3 [59,60,101]), sharing 20 to 40% of its genes with each of them. These phages can be viewed as true hybrids, produced by recombination of different ancestors and, like the couple lambda/P22 (to be described in a future paper), cross family boundaries based on tail morphology. Nonetheless, the majority of Myoviridae, when forced to cluster, do so in a logical manner: upgrading of the ICTV genus "P2 phages" to the Peduovirinae with two genera ("P2 viruses" and "HP1 viruses") is a straightforward proposal and the same is true for the Spounavirinae (SPO1 viruses and Twort viruses).
Relationships among T4-like phages are more complicated. We reject the postulated inclusion of the cyanophages since their overall similarity to T4 is too low for consideration, at least according to our criteria. Comeau and Krisch [29] have recently recognized three groups of T4-related phages: the "Near T4" group containing the T-evens, Pseudo T-evens, and Schizo T-evens; the "Far T4" clade including Exo-T4 phage RM378; and the "Cyano T4" assemblage. We believe that the latter are sufficiently different from the other T4 viruses to be excluded from the Teequatrovirinae at this time. This implies that this subfamily currently contains two distinct genera: T4 and KVP40 viruses. Within our restricted "T4 phages" genus, four subtypes were identified (T4-type, 44RR2.8t-type, RB43-type and RB49-type viruses). This is confirmed by the phylogenetic studies of Filée et al. [5] and our unpublished results. Since these subtypes include different species, no equivalent taxonomic level is currently available in the official ICTV classification. Perhaps the introduction of a "subgenus" level should be considered in order to account for the complexity of T4-related phages. Alternatively, a general elevation of all taxonomic levels (from the subfamily level) may be envisioned. This study illustrates the great diversity and biological richness of tailed phages. The number of independent genera is not surprising in view of the antiquity of tailed bacteriophages, which are found in archaea and bacteria and may predate the separation of these domains. It can be expected that many more phage groups will be found or individualized in the future. For example, this study does not include giant Bacillus phage G, the largest bacterial virus with a genome of 497,513 bp and 684 genes [102], whose sequence is not yet available for comparison.
We reiterate the statement from our publication on the taxonomy of the Podoviridae: "We highly recommend that the entire genome of any newly sequenced phage be thoroughly screened (BLASTX) against the Entrez Query "Viruses [ORGN]" databases to reveal all similarities for quick identification of potential relationships. A validation step using CoreGenes is essential and more precise for individual comparisons [2]."

Conclusion
Myoviridae can be classified by their proteomes into subfamilies and genera. This classification is in close agreement with the ICTV classification and with other informatics-based classifications.

Phages and bioinformatic tools
This study is limited to the genomes of completely sequenced, viable Myoviridae from the databases of NCBI (http://www.ncbi.nlm.nih.gov/) and Tulane University at New Orleans, LA (GT4P, "Genomes of the T4 Phages"; http://phage.bioc.tulane.edu/), excluding prophages without a virion stage. Here we follow the ICTV, which classifies viable viruses only. Prophages and proviruses, prophage fragments, defective viruses, phage-like "bacteriocins", virus-like or phage-like particles from sections or the environment, viroids, satellite viruses, plasmids, transposons, and artificial virus hybrids are not considered. CoreExtractor and CoreGenes software were used as described previously [2]. In the case of CoreExtractor, the BLASTX analysis of phage gene products was performed using the NCBI Batch BLAST server (http://greengene.uml.edu/programs/NCBI_Blast.html) hosted by the University of Massachusetts at Lowell, MA. Searches were performed against the NCBI nonredundant database (BLOSUM45 matrix, with an expectation cut-off value of 0.05) (Additional Figure 2). Several versions of CoreGenes are available at http://www.binf.gmu.edu/genometools.html, with each upgrade incorporating previous functions. For the current study, a version, CoreGenes3.0beta, was developed specifically to tally the total number of genes contained in the genomes. It also displays the percentage of genes shared with a specific genome and finds the genes that are unique when two genomes are compared. The BLASTP stringency setting was left at its default value (75). Proteins containing at least 132 amino acid residues were subjected to BLASTP analysis at NCBI or Tulane University.
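The percentage of shared genes reported for a pair of genomes depends on which genome is taken as the reference. The following minimal Python sketch illustrates that arithmetic only; it is not the CoreGenes implementation. The function names, the choice of the reference genome's total gene count as denominator, and the example gene counts are assumptions made for illustration. The default 40% cut-off is the genus-level value quoted in the Background.

```python
def percent_shared(shared_genes, total_genes):
    """Percentage of a reference genome's genes that have a homolog in the other genome.

    Assumption: the denominator is the total gene count of the reference genome.
    """
    if total_genes <= 0:
        raise ValueError("total_genes must be positive")
    if not 0 <= shared_genes <= total_genes:
        raise ValueError("shared_genes must lie between 0 and total_genes")
    return 100.0 * shared_genes / total_genes


def meets_cutoff(percent, cutoff=40.0):
    """True if the shared-gene percentage reaches the chosen cut-off (hypothetical helper)."""
    return percent >= cutoff


# Hypothetical pair: 30 shared genes, one genome with 60 genes, the other with 100.
relative_to_small = percent_shared(30, 60)
relative_to_large = percent_shared(30, 100)
print(relative_to_small, meets_cutoff(relative_to_small))
print(relative_to_large, meets_cutoff(relative_to_large))
```

Because the two percentages can fall on opposite sides of a cut-off, a comparison is best reported with the reference genome stated explicitly.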

Hierarchical cluster dendrogram
Cluster analysis was used to visualize the structure of the proteomic data. We constructed a dissimilarity matrix from the CoreExtractor matrix. The dissimilarity between two phage genomes was taken as one (1) minus the average of the two reciprocal correlation scores in the CoreExtractor matrix (Figure S1B). Subsequently, single linkage hierarchical clustering was performed using "The R Project for Statistical Computing" software (http://www.r-project.org/).
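The two steps described here, building the dissimilarity matrix and clustering by single linkage, can be sketched as follows. The authors used R; this Python version is a stand-in written for illustration, not their script. It assumes that correlation scores are expressed as fractions between 0 and 1, and the phage names and score values are hypothetical placeholders rather than data from this study.

```python
from itertools import combinations


def dissimilarity_matrix(scores):
    """Build pairwise dissimilarities from reciprocal correlation scores.

    scores[a][b] is the score of genome a against genome b (assumed 0..1).
    Dissimilarity = 1 - mean of the two reciprocal scores.
    """
    names = sorted(scores)
    d = {}
    for a, b in combinations(names, 2):
        mean_score = (scores[a][b] + scores[b][a]) / 2.0
        d[frozenset((a, b))] = 1.0 - mean_score
    return names, d


def single_linkage(names, d):
    """Agglomerative clustering; returns merges as (members_1, members_2, height)."""
    clusters = [frozenset([n]) for n in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for c1, c2 in combinations(clusters, 2):
            # Single linkage: cluster distance is the smallest pairwise dissimilarity.
            dist = min(d[frozenset((x, y))] for x in c1 for y in c2)
            if best is None or dist < best[0]:
                best = (dist, c1, c2)
        dist, c1, c2 = best
        clusters = [c for c in clusters if c not in (c1, c2)]
        clusters.append(c1 | c2)
        merges.append((sorted(c1), sorted(c2), dist))
    return merges


# Hypothetical reciprocal scores for three placeholder genomes.
toy_scores = {
    "phageA": {"phageB": 0.80, "phageC": 0.10},
    "phageB": {"phageA": 0.70, "phageC": 0.20},
    "phageC": {"phageA": 0.10, "phageB": 0.30},
}
toy_names, toy_d = dissimilarity_matrix(toy_scores)
for left, right, height in single_linkage(toy_names, toy_d):
    print(left, right, round(height, 3))
```

Each merge height is the dissimilarity at which two clusters join, which corresponds to the branch height in a dendrogram. Single linkage joins clusters on their closest pair of members, so a single high-scoring pair is enough to link two groups.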