- Research article
- Open Access
The prevalence of gene duplications and their ancient origin in Rhodobacter sphaeroides 2.4.1
© Bavishi et al; licensee BioMed Central Ltd. 2010
- Received: 18 May 2010
- Accepted: 30 December 2010
- Published: 30 December 2010
Rhodobacter sphaeroides 2.4.1 is a metabolically versatile organism that belongs to the α-3 subdivision of Proteobacteria. The present study aimed to identify the extent, history, and role of gene duplications in R. sphaeroides 2.4.1, an organism that possesses two chromosomes.
A protein similarity search (BLASTP) identified 1247 ORFs (~29.4% of the total protein-coding ORFs) that are present in two or more copies, 37.5% of which (234 gene pairs) exist as duplicate copies. The distribution of the duplicate gene pairs across the Clusters of Orthologous Groups (COGs) differed significantly from the COG distribution across the whole genome. Location plots revealed clusters of gene duplications that possessed the same COG classification. Phylogenetic analyses were performed to determine whether the tree topology of each pair predicted a Type-A or a Type-B phylogenetic relationship. In a Type-A relationship, a copy of the protein pair matches more closely with an ortholog from a species closely related to R. sphaeroides than with the other copy, whereas in a Type-B relationship the highest match is between the two copies of the R. sphaeroides protein pair. The results revealed that ~77% of the proteins exhibited a Type-A phylogenetic relationship, demonstrating the ancient origin of these gene duplications. Additional analyses on three other strains of R. sphaeroides revealed varying levels of gene loss and retention in these strains. Analyses of the gene pairs common to the four strains also revealed that these genes experience similar functional constraints and undergo purifying selection.
Although the results suggest that the level of gene duplication in organisms with complex genome structuring (more than one chromosome) is not markedly different from that in organisms with only a single chromosome, these duplications may have aided genome reorganization in this group of eubacteria prior to the formation of R. sphaeroides, as gene duplications involved in specialized functions might have contributed to complex genomic development.
- Gene Duplication
- Horizontal Gene Transfer
- Duplicate Gene
- Gene Duplication Event
- Amino Acid Divergence
Rhodobacter sphaeroides 2.4.1, a purple nonsulfur photosynthetic eubacterium, belongs to the α-3 subgroup of Proteobacteria [1, 2], members of which display an array of metabolic capabilities in the assembly and regulation of metabolic functions [3], electron transport [4–6], bioremediation [7], and tetrapyrrole biosynthesis [8, 9]. In addition, many members of this subgroup establish different types of eukaryotic associations [10–14]. The genome of R. sphaeroides 2.4.1 has been completely sequenced and annotated [15] and comprises two circular chromosomes and five plasmids.
Bacterial species continue to encounter different ecological niches, and their genome size increases through the acquisition of habitat-relevant genes by horizontal gene transfer [16–18] and gene duplication [19, 20], which together play a major role in the evolution of both genome size and complexity. Duplicated genes are ubiquitous among eukaryotes and prokaryotes [21–24]. Analyses of over 100 fully sequenced eubacterial and archaeal genomes have revealed extensive DNA sequence duplication [25]; however, it remains unclear whether the expansions of genome size and complexity were essential for adaptive phenotypic diversification.
The present study aimed to systematically identify the extent and history of gene duplication in the genome of R. sphaeroides. The hypothesis that a complex genome structure (large genome size and the presence of multiple chromosomes) requires an extensive amount of gene duplication was examined by determining the distribution of duplicated genes on both chromosomes and the plasmids and by comparing the level of gene duplication in R. sphaeroides to that in other bacterial species that possess a single chromosome. After determining the extent of these gene duplications, two additional hypotheses were devised. First, a hypothesis was formulated to test whether gene duplications were selectively preserved in specific Clusters of Orthologous Groups (COGs) necessary to accommodate the diverse growth modes of this organism. Second, a hypothesis was tested to ascertain whether this large-scale gene duplication occurred after the diversification of members of the α-3 subgroup of Proteobacteria. The role of gene duplications in the evolution of new metabolic functions is discussed along with the age and functional constraints of these gene pairs across four strains of R. sphaeroides. Thus, this study investigates the nature of gene duplications in an organism with complex genome structuring in order to determine the role of such duplications in the evolution of new metabolic functions and complex genome development.
Protein homology and duplication search analysis
A protein homology search was performed using gapped BLASTP [26], which includes gap penalties and is therefore more conservative in database searches. The search was conducted in two steps. First, each protein sequence of the R. sphaeroides genome was searched against a database of the organism's own proteins to identify homologs. Then, each of the homologous protein sequences identified in the first step was reciprocally paired, based on a threshold E-value of ≤ 10⁻²⁰. The cut-off value for the percent amino acid identity was set at ≥ 30%, which defines the level above which gene duplication can be reliably identified in many bacterial species [15, 27, 28]. However, certain duplicated genes in R. sphaeroides that did not meet the specified search criteria (i.e. possessed less than 30% identity) have been identified or reported in the past [15, 28]. These previously identified or reported duplications were included in the subsequent analysis. Also, to approximately determine the prevalence and arrangement of selected gene duplications in three other completely sequenced R. sphaeroides strains (ATCC 17025, ATCC 17029, KD131), one gene from each duplicated pair in R. sphaeroides 2.4.1 (the gene designated as "Orf 1") was subjected to BLASTP analysis against the three R. sphaeroides strains, using the same cutoff criteria as before.
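The two-step filter described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: it assumes the self-against-self BLASTP results are available in the standard 12-column tabular format, it interprets "reciprocally paired" as requiring that both directions of a hit pass the thresholds, and the protein identifiers (orfA, orfB, ...) and hit values are hypothetical.

```python
from collections import namedtuple

Hit = namedtuple("Hit", "query subject identity evalue")

def parse_tabular(lines):
    """Parse 12-column tabular BLAST rows (query, subject, %identity, ..., evalue, bitscore)."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        hits.append(Hit(fields[0], fields[1], float(fields[2]), float(fields[10])))
    return hits

def reciprocal_pairs(hits, max_evalue=1e-20, min_identity=30.0):
    """Return sorted (a, b) pairs whose hits pass both thresholds in both directions."""
    passing = set()
    for h in hits:
        if h.query == h.subject:
            continue  # skip the trivial self-hit
        if h.evalue <= max_evalue and h.identity >= min_identity:
            passing.add((h.query, h.subject))
    return sorted({tuple(sorted(p)) for p in passing if (p[1], p[0]) in passing})

# Hypothetical rows for illustration only.
ROWS = [
    "orfA\torfA\t100.0\t300\t0\t0\t1\t300\t1\t300\t0.0\t600",
    "orfA\torfB\t45.2\t290\t150\t3\t5\t294\t8\t297\t1e-60\t210",
    "orfB\torfA\t45.2\t290\t150\t3\t8\t297\t5\t294\t2e-60\t209",
    "orfA\torfC\t28.0\t250\t170\t5\t1\t250\t1\t250\t1e-25\t95",   # identity below 30%
    "orfC\torfA\t28.0\t250\t170\t5\t1\t250\t1\t250\t1e-25\t95",
    "orfB\torfD\t35.0\t200\t120\t4\t1\t200\t1\t200\t1e-10\t60",   # E-value above 1e-20
]

pairs = reciprocal_pairs(parse_tabular(ROWS))
print(pairs)
```

Only the orfA/orfB pair survives in this toy input, because the other candidate hits fail either the identity or the E-value threshold.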
Analysis of the Cluster of Orthologous Groups (COGs)
Gene homologs are families of genes that encode similar protein functions within a genome and between genomes; if such genes are derived from different species, they are called orthologs, and if they are derived from the same species, they are referred to as paralogs [29]. The Cluster of Orthologous Groups [30, 31] classifications provide a tool for examining gene roles. There are four major COG functional groups: (1) information storage and processing, (2) cellular processes, (3) metabolism, and (4) poorly characterized functions. These major groupings are further classified into 25 sub-groups. However, a number of Orfs have been classified into more than one COG as they encode overlapping gene functions, while other Orfs have poorly characterized functions. The percentage of each COG function among the duplicated genes, both in the general groups and in the sub-groups, was compared with the percentage of the respective COG function among all genes present in the complete genome. A chi-square (χ²) test was performed for both distribution comparisons, with the null hypothesis that the gene duplications have the same COG distribution as all the genes in the full genome. In addition, all 234 pairs were subsequently mapped onto chromosome I (CI) and chromosome II (CII). In these location plots, the level of divergence was indicated on the y-axis by the height of each gene's pin, and each gene's major COG group classification was color-coded.
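As an illustration of the distribution comparison described above, the sketch below computes a chi-square goodness-of-fit statistic in which the expected counts for the duplicated genes are derived from the genome-wide COG proportions. The counts are hypothetical placeholders rather than the values from this study, and the function name is introduced only for this example.

```python
def chi_square_vs_genome(dup_counts, genome_counts):
    """Chi-square statistic and degrees of freedom for duplicate-gene COG
    counts against expectations from genome-wide COG proportions.

    Assumes every category has a non-zero genome-wide count.
    """
    total_dup = sum(dup_counts.values())
    total_genome = sum(genome_counts.values())
    stat = 0.0
    for category, genome_n in genome_counts.items():
        expected = total_dup * genome_n / total_genome
        observed = dup_counts.get(category, 0)
        stat += (observed - expected) ** 2 / expected
    return stat, len(genome_counts) - 1

# Hypothetical counts for the four major COG groups (illustration only).
genome = {"information": 1000, "cellular": 1000, "metabolism": 1500, "poorly_characterized": 500}
duplicates = {"information": 50, "cellular": 100, "metabolism": 200, "poorly_characterized": 50}

stat, df = chi_square_vs_genome(duplicates, genome)
CRITICAL_0_05_DF3 = 7.815  # chi-square critical value for df = 3 at alpha = 0.05
print(f"chi-square = {stat:.2f}, df = {df}, reject null: {stat > CRITICAL_0_05_DF3}")
```

With these placeholder counts the statistic exceeds the critical value, so the null hypothesis of identical COG distributions would be rejected; the real analysis would use the observed counts for all groups and sub-groups.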
To determine the origin and history of the gene duplications in R. sphaeroides, each protein in the protein pairs was first searched against the microbial database at NCBI using BLASTP. Geneious v4.6, a versatile bioinformatics suite, was used to organize and perform the protein similarity searches, generate alignments, and construct phylogenetic trees. Only organisms with completely sequenced genomes were chosen to avoid poor or incomplete sequence data from shotgun or partial genome sequencing projects.
For each set of homologous matches, there were four proteins: the two duplicated genes and one ortholog match for each copy, as only the best and most complete hit to each gene in a pair was selected. For these duplicate pairs, two alternative phylogenetic relationships were predicted. A Type-A relationship was predicted when a protein sequence branched with a homolog (ortholog) from a closely related species rather than with its counterpart protein (paralog) within the R. sphaeroides genome, whereas a Type-B relationship was predicted when the duplicate protein copies within R. sphaeroides branched with each other [28, 33]. Additionally, four example phylogenetic analyses, two exhibiting Type-A phylogeny and two exhibiting Type-B phylogeny, were carried out with gene duplications common among the four R. sphaeroides strains.
Protein sequence alignments were carried out using MUSCLE, a program known for its accuracy and speed. Phylogenetic analysis was performed using PhyML with the WAG model to generate unrooted, maximum likelihood trees. Bootstrap values were calculated using 100 replications for the trees where topology was being determined. Maximum likelihood trees were constructed for all protein pairs to ascertain the tree topology (Type-A or Type-B). If both genes of a duplicated pair had their highest match to the same ortholog, then the next highest ortholog match, if available, for one of the genes was used in the tree construction to ascertain the duplication topology accurately.
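The Type-A versus Type-B distinction can be illustrated with a simple distance-based stand-in. The sketch below is not the maximum likelihood procedure used in this study; it applies the four-point condition to pairwise distances among the four proteins, where the pairing with the smallest summed distance identifies the supported unrooted topology. The labels P1 and P2 (the two paralogs), O1 and O2 (their best ortholog hits), and all distance values are hypothetical.

```python
def classify_pair(dist):
    """Classify a duplicate pair from pairwise distances among four proteins.

    dist maps frozenset({x, y}) to a distance for the labels
    P1, P2 (R. sphaeroides paralogs) and O1, O2 (ortholog hits).
    """
    def d(x, y):
        return dist[frozenset((x, y))]

    # Each unrooted four-taxon topology corresponds to one way of pairing taxa;
    # the pairing with the smallest summed distance is the supported split.
    paralogs_together = d("P1", "P2") + d("O1", "O2")
    with_orthologs = min(d("P1", "O1") + d("P2", "O2"),
                         d("P1", "O2") + d("P2", "O1"))

    if paralogs_together < with_orthologs:
        return "Type-B"   # the two copies branch with each other
    if with_orthologs < paralogs_together:
        return "Type-A"   # each copy branches with an ortholog
    return "unresolved"

# Hypothetical distances (substitutions per site), for illustration only.
ancient_duplication = {
    frozenset(("P1", "P2")): 1.0, frozenset(("O1", "O2")): 0.9,
    frozenset(("P1", "O1")): 0.2, frozenset(("P2", "O2")): 0.3,
    frozenset(("P1", "O2")): 1.0, frozenset(("P2", "O1")): 1.1,
}
recent_duplication = {
    frozenset(("P1", "P2")): 0.1, frozenset(("O1", "O2")): 0.8,
    frozenset(("P1", "O1")): 0.9, frozenset(("P2", "O2")): 0.9,
    frozenset(("P1", "O2")): 0.9, frozenset(("P2", "O1")): 0.9,
}

print(classify_pair(ancient_duplication), classify_pair(recent_duplication))
```

A distance-based call of this kind gives no measure of support; the bootstrap values reported for the maximum likelihood trees serve that purpose in the actual analysis.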
Functional Constraints Analysis
For the functional constraints analysis, comparisons were conducted within all four R. sphaeroides strains. More specifically, the 28 gene pairs common to the four strains were used, and the two genes in a given pair were compared against one another. The synonymous and nonsynonymous substitution rates, along with the nonsynonymous-synonymous substitution rate ratio, were calculated using the modified Yang-Nielsen (MYN) algorithm [37, 38]. MUSCLE was used to align the amino acid sequences. These aligned sequences were then converted back into the original DNA sequences, after which the KaKs_Calculator was used with each pair of DNA sequences to calculate the synonymous substitution rate (Ks), the nonsynonymous substitution rate (Ka), and the nonsynonymous/synonymous rate ratio (ω = Ka/Ks). Under the MYN model, ω = 0.3, 1, and 3 were used for negative (purifying), neutral, and positive selection, respectively [37, 38]. A one-way ANOVA was used to test whether the distributions of ω among the four strains were dissimilar.
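The interpretation of ω can be sketched as below. The example assumes that Ka and Ks have already been estimated (for instance by KaKs_Calculator) and applies the conventional reading of the ratio: ω below 1 indicates purifying selection, ω near 1 neutrality, and ω above 1 positive selection. The pair names, rate values, and the tolerance around 1 are hypothetical choices for illustration.

```python
def omega(ka, ks):
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined."""
    if ks == 0:
        return None
    return ka / ks

def selection_regime(w, tolerance=0.05):
    """Map an omega value onto the conventional selection categories."""
    if w is None:
        return "undefined"
    if abs(w - 1.0) <= tolerance:
        return "neutral"
    return "purifying" if w < 1.0 else "positive"

# Hypothetical (Ka, Ks) estimates for three gene pairs, for illustration only.
estimates = {"pair_1": (0.05, 0.50), "pair_2": (0.40, 0.40), "pair_3": (0.90, 0.30)}

for name, (ka, ks) in estimates.items():
    w = omega(ka, ks)
    print(name, round(w, 2), selection_regime(w))
```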
Horizontal Gene Transfer
Horizontal gene transfer (HGT) features were estimated using Alien-Hunter, which predicts HGT events using interpolated variable order motifs. This method exploits compositional biases to locate potential HGT regions: atypical (putative HGT) regions are those whose score exceeds a threshold value, which is calculated from the sequence structure of the input genome among other factors. This software was used to determine the regions of possible HGT and the levels of HGT on CI and CII independently. The genes present within these regions were also identified. Artemis was used to view the Alien-Hunter output.
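The general idea of detecting compositionally atypical regions can be sketched with a much simpler stand-in. The example below is not Alien-Hunter's interpolated variable order motif method; it scores fixed-length windows by the Kullback-Leibler divergence of their fixed-order k-mer frequencies from the whole-sequence background. The toy sequence, window size, and function names are assumptions made for illustration.

```python
from collections import Counter
from math import log

def kmer_freqs(seq, k):
    """Relative frequencies of overlapping k-mers in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}

def window_scores(genome, window, step, k=2):
    """Score each window by its KL divergence from the genome-wide k-mer background."""
    background = kmer_freqs(genome, k)
    scores = []
    for start in range(0, len(genome) - window + 1, step):
        local = kmer_freqs(genome[start:start + window], k)
        # Every k-mer in a window also occurs in the full sequence,
        # so the background frequency is never zero here.
        kl = sum(p * log(p / background[kmer]) for kmer, p in local.items())
        scores.append((start, kl))
    return scores

# Toy sequence: an AT-rich background with a GC-rich insert at position 400.
toy_genome = "AT" * 200 + "GC" * 20 + "AT" * 200
scores = window_scores(toy_genome, window=40, step=40)
most_atypical = max(scores, key=lambda item: item[1])
print(most_atypical[0])
```

In this toy case the window covering the GC-rich insert receives the highest score. As with any composition-based approach, transfers between species of similar composition would not stand out.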
Extent of gene duplications in R. sphaeroides
Of the 4242 protein-coding genes in the R. sphaeroides genome, 1247 genes (29.4%) exist in multiple copies. Gene homologs are present in different copy numbers, reflecting the diversity of gene multiplication. The numbers of genes with 2, 3, 4, and ≥ 5 copies were 468, 183, 152, and 444, respectively. Approximately 73% of the total gene homologs fall into two classes: genes with two copies (37.5%; 234 protein pairs) and genes with ≥ 5 copies (35.6%). Genes with ≥ 5 copies represent various types of functions, for example, ABC-type transporters, families of transcriptional factors, and cell-signaling response regulators (data not shown). If genes present in more than two copies were included, determining the lineage of such genes would become more complex, especially as many such genes are also present within multiple gene families. Moreover, the genes in these families can be analogous instead of homologous, meaning that they are similar due to function rather than origin. As such, further analysis was carried out only on genes that were identified as duplicate protein pairs, as listed in Additional file 1.
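The percentages quoted above follow directly from the reported counts, as the short check below shows. Only the counts given in the text are used; the variable names are introduced for this example.

```python
# Counts reported in the text: genes present in 2, 3, 4, and >= 5 copies.
copy_classes = {"2": 468, "3": 183, "4": 152, ">=5": 444}
total_protein_coding = 4242

multi_copy = sum(copy_classes.values())
pct_of_genome = 100 * multi_copy / total_protein_coding
pct_two_copies = 100 * copy_classes["2"] / multi_copy
pct_five_plus = 100 * copy_classes[">=5"] / multi_copy
duplicate_pairs = copy_classes["2"] // 2  # two genes per pair

print(multi_copy, round(pct_of_genome, 1), round(pct_two_copies, 1),
      round(pct_five_plus, 1), duplicate_pairs)
```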
Gene duplication and diverse COGs functions
Origin of gene duplications and relationship among R. sphaeroides strains
The strength of each tree topology was analyzed using bootstrap values, which are also shown in Additional file 2. Bootstrap values for 8 trees could not be determined due to the lack of one or more orthologs. Bootstrap values not only indicate the statistical support for a tree topology (Type-A or Type-B), but also provide insight into the relative origin of a given gene duplication. Gene duplication events that occurred well before organism speciation would display Type-A relationships with high bootstrap values. Gene duplication events that occurred well after organism speciation would display Type-B relationships with similarly high bootstrap values. Of the 226 trees for which bootstrap values were obtained, 209 (92.5%) had bootstrap values ≥ 95. The bootstrap values remained high within both Type-A and Type-B phylogenetic trees. Of the 180 Type-A trees, 172 (95.56%) exhibited bootstrap values ≥ 95, while of the 46 Type-B trees, 37 (80.43%) exhibited bootstrap values ≥ 95. Thus, the majority of these trees had well-supported topologies, which support the relative timings of the origins of these gene duplications.
Table: Distribution of Tree Types and Bootstrap Values in R. sphaeroides (bootstrap value v binned as v ≥ 90, 70 ≤ v < 90, and v < 70)
Horizontal Gene Transfer
For R. sphaeroides 2.4.1, putative HGT regions were found on both CI and CII. The non-optimized coordinates for these regions are not shown. The CI HGT regions sum to 65,005 nucleotides, span 60 genes, and comprise 2.04% of the total CI replicon. The CII HGT regions sum to 110,009 nucleotides, contain 99 genes, and comprise 11.66% of the total CII replicon. Of the 60 HGT genes on CI, 5 are among the duplicate gene pairs, while of the 99 HGT genes on CII, 8 are among the duplicate gene pairs. The distribution of HGT regions on both chromosomes revealed that most of the duplicated genes lie outside these HGT regions.
Extent of gene duplication and horizontal gene transfer in R. sphaeroides
A systematic genome analysis of R. sphaeroides, which possesses multiple chromosomes, has shown approximately the same level of gene duplication (~29%) as reported in many other bacterial genomes that possess only one chromosome [22, 42–44] and in eukaryotes [22, 45–47]. Thus, similar levels of gene duplication in the genomes of eubacteria, archaea, and eukarya suggest that genome size or genome complexity is not correlated with the level of gene duplication. Gene duplication can occur on two different scales: large-scale duplication (whole-genome duplication, WGD) and smaller-scale duplications, which consist of tandem duplication of a short DNA sequence within a gene, duplication of an entire gene, or duplication of large genomic segments [48–50].
The majority of gene duplications in R. sphaeroides exist in the form of small DNA segments (one or a few genes), but a few duplications span large genomic segments. For example, chemotaxis-related genes are located at four major loci: chemotaxis operon I (RSP2432-RSP2444), chemotaxis operon II (RSP1582-RSP1589), chemotaxis operon III (RSP0042-RSP0049), and chemotaxis operon IV, which is part of a 56-kb flagellar biosynthesis gene cluster (RSP0032-RSP0088). Three copies are present on CI and one copy is present on CII. Although bacteria have acquired a reasonable proportion of their genetic diversity through horizontal gene transfer (HGT) from related microorganisms, this proportion varies from 1.5% to 14.47%. The results for R. sphaeroides HGT fell within this range, but the proportion of HGT in CII (11.66%) was markedly higher than that in CI (2.04%). Such distinct levels of HGT for CI and CII may suggest that the two chromosomes play different roles in R. sphaeroides. This observation is also consistent with CII having been more flexible in acquiring genes from other species. However, it must be noted that this method of analyzing HGT may not detect genes that are horizontally transferred between species of similar composition. In addition, although the role of duplicated genes in the majority of bacterial species still remains unclear, the role of gene duplication in the resident genome should not be underestimated, especially since the majority of these gene duplications are not located within putative HGT regions, as seen in R. sphaeroides.
Protein divergence and the evolution of different COG functions in R. sphaeroides
Gene duplications in R. sphaeroides are involved in a wide variety of metabolic functions, and these duplications revealed considerable variation in amino acid divergence within each metabolic function category. For example, protein pairs involved in flagellar assembly and energy production diverged 60-70%, while protein pairs involved in photosynthesis and carbon metabolism diverged only 10-30%. These conserved gene homologs may either protect against deleterious changes in either copy and consequently result in functional redundancy, or may not have been cleared out simply because they are not harmful to the organism. Two sets of flagellar operons and neu operons were located on CI, and most homologous protein pairs had diverged in approximately 60-70% of their amino acid sequences. One complete set of flagellar genes (RSP0032-RSP0084) is functional, as these genes were expressed in all growth conditions, while expression of the incomplete flagellar operon (RSP1302-RSP1330) was not detected by microarray analysis; the second set of flagellar genes could therefore be required for surface translocation during biofilm production or in an alternative lifestyle that has not yet been identified, as seen in other organisms [53, 54]. Besides the genes with known functions, the genome of R. sphaeroides contains about 40 duplicate genes encoding hypothetical proteins. About one-half of the hypothetical protein pairs diverged ~10-20%, and the other half diverged ~50-70%. The analyses further revealed that genes in groups L (DNA synthesis), N (Cell motility and secretion), U (Intracellular transport), C (Energy production), G (Carbohydrate metabolism), and H (Coenzyme metabolism) were overrepresented among genes that evolved by gene duplication, while the number of genes representing other COGs remained low or was roughly equal, percentage-wise, to the number of genes representing those COGs in the overall genome of R. sphaeroides.
Therefore, genes involved in transport and metabolism appear to have been preferentially retained after gene duplication. In addition, the distribution of the gene duplications (Figure 4) revealed that clusters of gene duplications with the same COG function exist on both CI and CII and that most of the gene duplications in a cluster possessed roughly similar levels of sequence conservation. As such, it is possible that these chromosomal segments are locally selected for, especially as these gene duplications possess similar functions.
The sequence similarity and evolutionary constraints of a duplicate gene pair are indicative of the essential or nonessential nature of gene function. Previous studies have shown that the type II topoisomerases gyrase and topoisomerase IV share 40 to 60% amino acid sequence identity, but each protein has a distinct function essential for cell survival [55, 56], highlighting the limitations of bioinformatics approaches. Similarly, duplicate protein pairs with very little amino acid identity can share similar functions. In Bacillus subtilis, the peptide deformylases (Def and YkrB) show similarity only across short sequences (motifs), but both independently carry out a deformylase reaction essential for cell viability. Therefore, gene disruption analysis is further required to determine the definitive function of isologous gene pairs.
In the specific analysis involving the carbon metabolism genes, it is likely that the cluster in CI containing cbbA, cbbF, cbbM, and cbbP duplicated first, and that the cbbG and cbbT duplications then arose from CI and were inserted between the duplicated cbbA and cbbP genes on CII. In addition, the two genes that code for hypothetical proteins found between cbbT and cbbG on CI may have arisen through an additional insertion or transposition event. Although these duplicated genes exhibit varying levels of protein divergence, these protein pairs are under negative selection, as evidenced by the functional constraints analysis in Figure 10. Additionally, the identity between the cbbM genes was low (31%). This is most probably due to the high degree of difference between cbbM I and cbbM II. More specifically, it has been shown that cbbM, which performs the first critical step in carbon fixation, has two forms (cbbM I and cbbM II). The form I enzymes possess large and small subunits, while the form II enzymes possess only large subunits that are different from the form I large subunits. Discrimination between CO2 and O2 is primarily accomplished by loop 6 of the large subunit, which contains a conserved element of 11 amino acid residues. Form II enzymes are primarily anaerobic and unable to function in aerobic environments, whereas form I enzymes can function in aerobic environments [59, 60]. As form II enzymes are not widely distributed among different species, it is most likely that form I enzymes duplicated to make form II enzymes in certain species and then diverged from their original function to operate in aerobic environments. In contrast, the cbbA genes may actually encode two different enzymes (cbbA I and cbbA II), although there is high identity between the two genes (79%). cbbA II genes are usually confined to simple organisms such as bacteria and fungi, while cbbA I is present in only some bacteria, such as R. sphaeroides, and is mostly confined to higher-level organisms, including plants and animals. It could therefore be that these two cbbA genes in R. sphaeroides are different although they share high homology, as these two enzymes are thought to have arisen through convergent evolution [62, 63]. However, in many instances, there is no marked homology between cbbA I and cbbA II. Therefore, the physiological significance of these duplications, including those involving cbbA and cbbM, needs to be further studied biochemically and molecularly to better understand their relationships.
Ancient gene duplications predated the existence of two chromosomes in R. sphaeroides
Since the overwhelming majority of gene duplications in the present-day R. sphaeroides genome are more closely related to orthologs in other species than to each other (Type-A) and originated prior to or at the time of lineage formation, these findings also support previous results that a large-scale gene duplication event might have occurred prior to the speciation of R. sphaeroides, and possibly even before the diversification of the α-3 Proteobacteria. The HGT analysis suggests that the contribution of laterally transferred genes to the duplicated genes is small. It must also be noted that with the sequencing of new organisms and strains, it is possible that new ortholog matches to these gene duplications could be found. However, even so, such new sequences could only change Type-B trees to Type-A trees. This consideration supports the finding that an overwhelming majority of the gene duplications are Type-A. Another issue that must be noted is that genes in relatively recent duplications in separate R. sphaeroides strains could have evolved to look more like functional homologs in other species. However, 61.54% of the 234 R. sphaeroides 2.4.1 gene pairs were found in at least one other R. sphaeroides strain. Moreover, the functional constraints data for the 28 common gene pairs show that these pairs are under negative selection and are therefore strongly conserved in function. It is likely then that the majority of gene duplications in R. sphaeroides are undergoing negative selection as well.
In addition, the identification of homologous gene pairs among the other three strains of R. sphaeroides reveals that although a gene duplication event may have occurred prior to the formation of the R. sphaeroides lineage, significant gene loss or retention has occurred among all R. sphaeroides strains. The distribution of matches on R. sphaeroides ATCC 17029 also suggests a greater amount of gene loss or divergence compared to that of the other strains, so this strain may have originated earlier from the lineage than the others, as it has had more time to undergo selection and deletion processes. However, the genome of R. sphaeroides ATCC 17029 revealed high nucleotide identity (~95%) with R. sphaeroides 2.4.1 in regions of common homology, so it may rather be that several duplicate gene pairs have diverged to a level where no protein sequence similarity can be detected.
Since many gene homologues of R. sphaeroides share high genetic identity with homologues (orthologs) from a diverse group of α-Proteobacteria species, a massive gene duplication event may have occurred before the diversification of species in α-Proteobacteria. The overwhelming presence of Type-A gene duplications on CI and CII unambiguously demonstrates that both chromosomes (CI and CII) were present at the time of species formation, and therefore these two chromosomes have been essential partners within the R. sphaeroides genome since its formation.
The analyses reveal the abundance of gene duplications in R. sphaeroides 2.4.1 performing a wide range of functions. Moreover, although the majority of gene duplications originated prior to speciation of the R. sphaeroides lineage, there are varying amounts of gene loss or conservation among the four R. sphaeroides strains. The functional constraints analysis shows that all of the common duplications among the four R. sphaeroides strains are under purifying selection, suggesting the conservation of the functions of these gene pairs. Finally, the results suggest that the level of gene duplication in organisms with complex genome structuring (more than one chromosome) is not markedly different from that in organisms with only a single chromosome.
We thank the Research and Special Programs Department of Sam Houston State University for the funding of this work through the award of an Enhancement Grant for Research (EGR) to Madhusudan Choudhary.
- Woese CR: Bacterial evolution. Microbiol Rev. 1987, 51 (2): 221-271.PubMed CentralPubMedGoogle Scholar
- Woese CR, Stackebrandt E, Weisburg WG, Paster BJ, Madigan MT, Fowler VJ, Hahn CM, Blanz P, Gupta R, Nealson KH: The phylogeny of purple bacteria: the alpha subdivision. Syst Appl Microbiol. 1984, 5: 315-326.View ArticlePubMedGoogle Scholar
- Zeilstra-Ryalls J, Gomelsky M, Eraso JM, Yeliseev A, O'Gara J, Kaplan S: Control of photosystem formation in Rhodobacter sphaeroides. J Bacteriol. 1998, 180 (11): 2801-2809.PubMed CentralPubMedGoogle Scholar
- Jenney FE, Daldal F: A novel membrane-associated c-type cytochrome, cyt cy, can mediate the photosynthetic growth of Rhodobacter capsulatus and Rhodobacter sphaeroides. EMBO J. 1993, 12 (4): 1283-1292.PubMed CentralPubMedGoogle Scholar
- Grishanin RN, Gauden DE, Armitage JP: Photoresponses in Rhodobacter sphaeroides: role of photosynthetic electron transport. J Bacteriol. 1997, 179 (1): 24-30.PubMed CentralPubMedGoogle Scholar
- Brandner JP, McEwan AG, Kaplan S, Donohue TJ: Expression of the Rhodobacter sphaeroides cytochrome c2 structural gene. J Bacteriol. 1989, 171 (1): 360-368.PubMed CentralPubMedGoogle Scholar
- Moore MD, Kaplan S: Identification of intrinsic high-level resistance to rare-earth oxides and oxyanions in members of the class Proteobacteria: characterization of tellurite, selenite, and rhodium sesquioxide reduction in Rhodobacter sphaeroides. J Bacteriol. 1992, 174 (5): 1505-1514.PubMed CentralPubMedGoogle Scholar
- Neidle EL, Kaplan S: Expression of the Rhodobacter sphaeroides hemA and hemT genes, encoding two 5-aminolevulinic acid synthase isozymes. J Bacteriol. 1993, 175 (8): 2292-2303.PubMed CentralPubMedGoogle Scholar
- Zeilstra-Ryalls JH, Kaplan S: Control of hemA expression in Rhodobacter sphaeroides 2.4.1: regulation through alterations in the cellular redox state. J Bacteriol. 1996, 178 (4): 985-993.PubMed CentralPubMedGoogle Scholar
- Galibert F, Finan TM, Long SR, Puhler A, Abola P, Ampe F, Barloy-Hubler F, Barnett MJ, Becker A, Boistard P: The composite genome of the legume symbiont Sinorhizobium meliloti. Science. 2001, 293 (5530): 668-672. 10.1126/science.1060966.View ArticlePubMedGoogle Scholar
- Lerouge P, Roche P, Faucher C, Maillet F, Truchet G, Prome JC, Denarie J: Symbiotic host-specificity of Rhizobium meliloti is determined by a sulphated and acylated glucosamine oligosaccharide signal. Nature. 1990, 344 (6268): 781-784. 10.1038/344781a0.View ArticlePubMedGoogle Scholar
- Goodner B, Hinkle G, Gattung S, Miller N, Blanchard M, Qurollo B, Goldman BS, Cao Y, Askenazi M, Halling C: Genome sequence of the plant pathogen and biotechnology agent Agrobacterium tumefaciens C58. Science. 2001, 294 (5550): 2323-2328. 10.1126/science.1066803.View ArticlePubMedGoogle Scholar
- DelVecchio VG, Kapatral V, Redkar RJ, Patra G, Mujer C, Los T, Ivanova N, Anderson I, Bhattacharyya A, Lykidis A: The genome sequence of the facultative intracellular pathogen Brucella melitensis. Proc Natl Acad Sci USA. 2002, 99 (1): 443-448. 10.1073/pnas.221575398.PubMed CentralView ArticlePubMedGoogle Scholar
- Qin A, Tucker AM, Hines A, Wood DO: Transposon mutagenesis of the obligate intracellular pathogen Rickettsia prowazekii. Appl Environ Microbiol. 2004, 70 (5): 2816-2822. 10.1128/AEM.70.5.2816-2822.2004.PubMed CentralView ArticlePubMedGoogle Scholar
- Mackenzie C, Choudhary M, Larimer FW, Predki PF, Stilwagen S, Armitage JP, Barber RD, Donohue TJ, Hosler JP, Newman JE: The home stretch, a first analysis of the nearly completed genome of Rhodobacter sphaeroides 2.4.1. Photosynth Res. 2001, 70 (1): 19-41. 10.1023/A:1013831823701.View ArticlePubMedGoogle Scholar
- Garcia-Vallve S, Romeu A, Palau J: Horizontal gene transfer in bacterial and archaeal complete genomes. Genome Res. 2000, 10 (11): 1719-1725. 10.1101/gr.130000.PubMed CentralView ArticlePubMedGoogle Scholar
- Ochman H, Lawrence JG, Groisman EA: Lateral gene transfer and the nature of bacterial innovation. Nature. 2000, 405 (6784): 299-304. 10.1038/35012500.View ArticlePubMedGoogle Scholar
- Thomas CM, Nielsen KM: Mechanisms of, and barriers to, horizontal gene transfer between bacteria. Nat Rev Microbiol. 2005, 3 (9): 711-721. 10.1038/nrmicro1234.View ArticlePubMedGoogle Scholar
- Ohno S: Evolution by Gene Duplication. 1970, New York: Springer-VerlagView ArticleGoogle Scholar
- Taylor JS, Raes J: Duplication and divergence: the evolution of new genes and old ideas. Annu Rev Genet. 2004, 38: 615-643. 10.1146/annurev.genet.38.072902.092831.View ArticlePubMedGoogle Scholar
- Koonin EV, Galperin MY: Prokaryotic genomes: the emerging paradigm of genome-based microbiology. Curr Opin Genet Dev. 1997, 7 (6): 757-763. 10.1016/S0959-437X(97)80037-8.
- Rubin GM, Yandell MD, Wortman JR, Gabor Miklos GL, Nelson CR, Hariharan IK, Fortini ME, Li PW, Apweiler R, Fleischmann W: Comparative genomics of the eukaryotes. Science. 2000, 287 (5461): 2204-2215. 10.1126/science.287.5461.2204.
- Kondrashov FA, Rogozin IB, Wolf YI, Koonin EV: Selection in the evolution of gene duplications. Genome Biol. 2002, 3 (2): research0008.0001-0008.0009. 10.1186/gb-2002-3-2-research0008.
- Zhang J: Evolution by gene duplication: an update. Trends Ecol Evol. 2003, 18 (6): 292-298. 10.1016/S0169-5347(03)00033-8.
- Gevers D, Vandepoele K, Simillon C, Van de Peer Y: Gene duplication and biased functional retention of paralogs in bacterial genomes. Trends Microbiol. 2004, 12 (4): 148-154. 10.1016/j.tim.2004.02.007.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25 (17): 3389-3402. 10.1093/nar/25.17.3389.
- Lynch M: Genomics. Gene duplication and evolution. Science. 2002, 297 (5583): 945-947. 10.1126/science.1075472.
- Choudhary M, Fu YX, Mackenzie C, Kaplan S: DNA sequence duplication in Rhodobacter sphaeroides 2.4.1: evidence of an ancient partnership between chromosomes I and II. J Bacteriol. 2004, 186 (7): 2019-2027. 10.1128/JB.186.7.2019-2027.2004.
- Koonin EV: Orthologs, paralogs, and evolutionary genomics. Annu Rev Genet. 2005, 39: 309-338. 10.1146/annurev.genet.39.073003.114725.
- Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN: The COG database: an updated version includes eukaryotes. BMC Bioinformatics. 2003, 4: 41. 10.1186/1471-2105-4-41.
- Tatusov RL, Koonin EV, Lipman DJ: A genomic perspective on protein families. Science. 1997, 278 (5338): 631-637. 10.1126/science.278.5338.631.
- Drummond A, Ashton B, Cheung M, Heled J, Kearse M, Moir R, Stones-Havas S, Thierer T, Wilson A: Geneious v4.6. 2007.
- Langkjaer RB, Cliften PF, Johnston M, Piskur J: Yeast genome duplication was followed by asynchronous differentiation of duplicated genes. Nature. 2003, 421 (6925): 848-852. 10.1038/nature01419.
- Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32 (5): 1792-1797. 10.1093/nar/gkh340.
- Guindon S, Gascuel O: A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst Biol. 2003, 52 (5): 696-704. 10.1080/10635150390235520.
- Whelan S, Goldman N: A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach. Mol Biol Evol. 2001, 18 (5): 691-699.
- Yang Z, Nielsen R: Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol. 2000, 17 (1): 32-43.
- Zhang Z, Li J, Yu J: Computing Ka and Ks with a consideration of unequal transitional substitutions. BMC Evol Biol. 2006, 6: 44. 10.1186/1471-2148-6-44.
- Zhang Z, Li J, Zhao X-Q, Wang J, Wong GK-S, Yu J: KaKs_Calculator: Calculating Ka and Ks Through Model Selection and Model Averaging. Genomics, Proteomics & Bioinformatics. 2006, 4 (4): 259-263.
- Vernikos GS, Parkhill J: Interpolated variable order motifs for identification of horizontally acquired DNA: revisiting the Salmonella pathogenicity islands. Bioinformatics. 2006, 22 (18): 2196-2203. 10.1093/bioinformatics/btl369.
- Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA, Barrell B: Artemis: sequence visualization and annotation. Bioinformatics. 2000, 16 (10): 944-945. 10.1093/bioinformatics/16.10.944.
- Tomb JF, White O, Kerlavage AR, Clayton RA, Sutton GG, Fleischmann RD, Ketchum KA, Klenk HP, Gill S, Dougherty BA: The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature. 1997, 388 (6642): 539-547. 10.1038/41483.
- Himmelreich R, Hilbert H, Plagens H, Pirkl E, Li BC, Herrmann R: Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae. Nucleic Acids Res. 1996, 24 (22): 4420-4449. 10.1093/nar/24.22.4420.
- Klenk HP, Clayton RA, Tomb JF, White O, Nelson KE, Ketchum KA, Dodson RJ, Gwinn M, Hickey EK, Peterson JD: The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus. Nature. 1997, 390 (6658): 364-370. 10.1038/37052.
- Katju V, Lynch M: On the formation of novel genes by duplication in the Caenorhabditis elegans genome. Mol Biol Evol. 2006, 23 (5): 1056-1067. 10.1093/molbev/msj114.
- Li WH, Gu Z, Wang H, Nekrutenko A: Evolutionary analyses of the human genome. Nature. 2001, 409 (6822): 847-849. 10.1038/35057039.
- The Arabidopsis Genome Initiative: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000, 408 (6814): 796-815. 10.1038/35048692.
- Roth C, Rastogi S, Arvestad L, Dittmar K, Light S, Ekman D, Liberles DA: Evolution after gene duplication: models, mechanisms, sequences, systems, and organisms. J Exp Zoolog B Mol Dev Evol. 2007, 308 (1): 58-73. 10.1002/jez.b.21124.
- Wolfe KH, Shields DC: Molecular evidence for an ancient duplication of the entire yeast genome. Nature. 1997, 387 (6634): 708-713. 10.1038/42711.
- Ziolkowski PA, Blanc G, Sadowski J: Structural divergence of chromosomal segments that arose from successive duplication events in the Arabidopsis genome. Nucleic Acids Res. 2003, 31 (4): 1339-1350. 10.1093/nar/gkg201.
- Choudhary M, Zanhua X, Fu YX, Kaplan S: Genome analyses of three strains of Rhodobacter sphaeroides: evidence of rapid evolution of chromosome II. J Bacteriol. 2007, 189 (5): 1914-1921. 10.1128/JB.01498-06.
- Choudhary M, Mackenzie C, Donohue T, Kaplan S: Purple Bacterial Genomics. The Purple Phototrophic Bacteria. Edited by: Hunter CN, Daldal F, Thurnauer MC, Beatty JT. 2008, Dordrecht, Netherlands: Springer, 28: 691-706.
- Capdevila S, Martinez-Granero FM, Sanchez-Contreras M, Rivilla R, Martin M: Analysis of Pseudomonas fluorescens F113 genes implicated in flagellar filament synthesis and their role in competitive root colonization. Microbiology. 2004, 150 (Pt 11): 3889-3897. 10.1099/mic.0.27362-0.
- Kanbe M, Yagasaki J, Zehner S, Gottfert M, Aizawa S: Characterization of two sets of subpolar flagella in Bradyrhizobium japonicum. J Bacteriol. 2007, 189 (3): 1083-1089. 10.1128/JB.01405-06.
- Corbett KD, Schoeffler AJ, Thomsen ND, Berger JM: The structural basis for substrate specificity in DNA topoisomerase IV. J Mol Biol. 2005, 351 (3): 545-561. 10.1016/j.jmb.2005.06.029.
- Jacoby GA: Mechanisms of resistance to quinolones. Clin Infect Dis. 2005, 41 (Suppl 2): S120-126. 10.1086/428052.
- Haas M, Beyer D, Gahlmann R, Freiberg C: YkrB is the main peptide deformylase in Bacillus subtilis, a eubacterium containing two functional peptide deformylases. Microbiology. 2001, 147 (Pt 7): 1783-1791.
- Tabita FR: The biochemistry and metabolic regulation of carbon metabolism and CO2 fixation in purple bacteria. Anoxygenic Photosynthetic Bacteria. Edited by: Blankenship RE, Madigan MT, Bauer CE. 1995, Dordrecht, the Netherlands: Kluwer Academic, 2: 885-914.
- Lorimer GH, Chen YR, Hartman FC: A role for the epsilon-amino group of lysine-334 of ribulose-1,5-bisphosphate carboxylase in the addition of carbon dioxide to the 2,3-enediol(ate) of ribulose 1,5-bisphosphate. Biochemistry. 1993, 32 (35): 9018-9024. 10.1021/bi00086a006.
- Read BA, Tabita FR: High substrate specificity factor ribulose bisphosphate carboxylase/oxygenase from eukaryotic marine algae and properties of recombinant cyanobacterial RubiSCO containing "algal" residue modifications. Arch Biochem Biophys. 1994, 312 (1): 210-218. 10.1006/abbi.1994.1301.
- Watson GM, Tabita FR: Microbial ribulose 1,5-bisphosphate carboxylase/oxygenase: a molecule for phylogenetic and enzymological investigation. FEMS Microbiol Lett. 1997, 146 (1): 13-22. 10.1111/j.1574-6968.1997.tb10165.x.
- Plaumann M, Pelzer-Reith B, Martin WF, Schnarrenberger C: Multiple recruitment of class-I aldolase to chloroplasts and eubacterial origin of eukaryotic class-II aldolases revealed by cDNAs from Euglena gracilis. Curr Genet. 1997, 31 (5): 430-438. 10.1007/s002940050226.
- Siebers B, Brinkmann H, Dorr C, Tjaden B, Lilie H, van der Oost J, Verhees CH: Archaeal fructose-1,6-bisphosphate aldolases constitute a new family of archaeal type class I aldolase. J Biol Chem. 2001, 276 (31): 28710-28718. 10.1074/jbc.M103447200.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.