  • Research article
  • Open access

Phylogenetic analysis of erythritol catabolic loci within the Rhizobiales and Proteobacteria



The ability to use erythritol as a sole carbon source is not universal among the Rhizobiaceae. Based on the relatedness to the catabolic genes in Brucella, it has been suggested that the eryABCD operon may have been horizontally transferred into Rhizobium. While characterizing a locus necessary for the transport and catabolism of erythritol, adonitol and L-arabitol in Sinorhizobium meliloti, we became interested in the differences between the erythritol loci of S. meliloti and R. leguminosarum. Analysis with the Ortholog Neighborhood Viewer from the DOE Joint Genome Institute database indicated that loci for erythritol and polyol utilization had distinct arrangements, suggesting that these loci may have undergone genetic rearrangements.


A data set of genetic loci containing erythritol/polyol orthologs was established for 19 different proteobacterial species. These loci were analyzed for the genetic content and arrangement of genes associated with erythritol, adonitol and L-arabitol catabolism. Phylogenetic trees were constructed for core erythritol catabolic genes and contrasted with the species phylogeny. Additionally, phylogenetic trees were constructed for genes that showed differences in arrangement among the putative erythritol loci in these species.


Three distinct erythritol/polyol loci arrangements have been identified that reflect metabolic need or specialization. Comparison of the phylogenetic trees of core erythritol catabolic genes with the species phylogeny provides evidence consistent with these loci having been horizontally transferred from the alpha-proteobacteria into both the beta- and gamma-proteobacteria. ABC transporters within these loci adopt two unique genetic arrangements, and although biological data suggest they are functional erythritol transporters, phylogenetic analysis suggests they may not be orthologs and should probably be considered analogs. Finally, evidence is provided for the presence of paralogs and xenologs of erythritol catabolic genes in some of the genomes included in the analysis.


Operons are multigene arrangements transcribed as a single mRNA and are one of the defining features of bacterial and archaeal genomes. This arrangement allows genes to be co-regulated, and members of an operon are usually involved in the same functional pathway [1, 2]. Although operons are prominent features in the genomes of bacteria and archaea, the evolution of operons and the mechanisms that promote their formation are still not resolved, and a number of mechanisms have been proposed [3–8]. These mechanisms involve dynamic genetic events that include gene transfer, deletions, duplications, and recombination [2, 5, 8]. Since operons are prominent features in bacterial genomes and often encode genes with metabolic potential, it may be assumed that their evolution is under some selection pressure, allowing prokaryotic cells to rapidly adapt, compete and grow under changing environmental conditions.

The metabolic capability of an organism can be a function of its genome size and gene complement, and these greatly affect its ability to live in diverse environments. The alpha subdivision of the proteobacteria includes some organisms that are very similar phylogenetically but inhabit many diverse ecological niches, including a number of bacteria that can interact with eukaryotic hosts [9]. The genome sizes of these organisms vary from about 1 Mb for members of the genus Rickettsia to approximately 9 Mb for members of the bradyrhizobia [10]. Comparative genomic studies of this group have led to the supposition that there have been two independent reductions in genome size, one which gave rise to Brucella and Bartonella, the other which gave rise to Rickettsia [11]. These studies also suggest a major genomic expansion that roughly correlates with the soil microbes within the order Rhizobiales [11]. The genomes of rhizobia are dynamic. Phylogenetic analysis of 26 different Sinorhizobium and Bradyrhizobium genomes recently showed that recombination has dominated the evolution of the core genome in these organisms, and that vertically transmitted genes were rare compared with genes with a history of recombination and lateral gene transfer [12]. In this manuscript we have used comparative genomics in a focused manner to investigate the evolution of genes and loci involved in the catabolism of the sugar alcohols erythritol, adonitol and L-arabitol, primarily within the alpha-proteobacteria.

The number of bacterial species capable of utilizing the common four-carbon polyol erythritol as a carbon source is restricted [13]. Catabolism of erythritol has been shown to be important for competition for nodule occupancy in Rhizobium leguminosarum as well as for virulence in the animal pathogen Brucella suis [14]. Genetic characterization of erythritol catabolic loci has only been performed in R. leguminosarum, B. abortus and Sinorhizobium meliloti. In these organisms erythritol is broken down to dihydroxyacetone phosphate by the products of the core erythritol catabolic genes eryABC-tpiB [15]. During characterization of the erythritol locus of S. meliloti, it was observed that despite the close homology of the core erythritol genes, the genetic content and arrangement of the locus were drastically different from those of the previously characterized loci of B. abortus and R. leguminosarum [16]. In particular, the locus encodes the catabolism of two pentitols, adonitol and L-arabitol, in addition to erythritol. It was shown that the ABC transporter encoded by mptABCDE and the erythritol kinase encoded by eryA can also be used for adonitol and L-arabitol, and that several genes in the locus, including lalA-rbtABC, are involved in adonitol and L-arabitol catabolism but not in erythritol catabolism [15].

The differences between the erythritol loci in the sequenced S. meliloti strain Rm1021 [17] and R. leguminosarum led us to ask how these erythritol catabolic loci are related to other putative erythritol catabolic loci in bacterial species. In this work we address this question by analyzing the content and synteny of loci containing homologs of the erythritol genes in other sequenced organisms. The results of the analysis lend support to several hypotheses regarding operon evolution; in addition, the data predict loci that may be involved in polyol transport and metabolism in other proteobacteria.


Identification of erythritol loci

The data set of erythritol loci used in this work was constructed in a two-step process. First, BLASTN was used to identify sequenced genomes containing homologs of the core erythritol catabolic genes of R. leguminosarum and S. meliloti [18]. Using BLASTN rather than BLASTP at this stage allowed us to restrict the search to bacteria with sequenced genomes. Furthermore, because all of these genes encode proteins belonging to families that are ubiquitous in bacterial genomes, limiting the search to highly similar nucleotide sequences with BLASTN restricted it to genes likely to be involved in erythritol catabolism. Initially, BLASTN searches were performed using all the core erythritol genes shared between R. leguminosarum and S. meliloti (eryA, eryB, eryC and eryD). However, the search using eryA provided the most diverse data set, which also showed a sharp drop in E-value and query coverage. Using either eryA from R. leguminosarum or eryA from S. meliloti for the BLASTN search resulted in an identical data set. Genomes containing homologs of eryA were selected on the basis of E-values less than 1.00E-5. Where multiple strains of the same bacterial species had highly homologous putative erythritol genes (>99% identity), only a single representative of the species was used to avoid redundancy. Additionally, because the erythritol catabolic genes of the many Brucella species identified in our search were highly similar, B. melitensis 16M and B. suis 1330 were chosen as representatives of the Brucella lineage.
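The genome-selection step (an E-value threshold, then one representative per species) can be sketched as below. This is a minimal illustration, not the authors' pipeline: it assumes BLAST+ tabular output (`-outfmt 6` with its default 12 columns) and a hypothetical `species_of` lookup from subject IDs to species names, and it keeps the highest-bitscore strain per species as a stand-in for the manual >99% identity redundancy check described above.

```python
from dataclasses import dataclass

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


@dataclass
class Hit:
    subject: str
    pident: float
    evalue: float
    bitscore: float


def parse_outfmt6(lines):
    """Parse tab-separated -outfmt 6 lines into Hit records, skipping blanks/comments."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        hits.append(Hit(row["sseqid"], float(row["pident"]),
                        float(row["evalue"]), float(row["bitscore"])))
    return hits


def select_genomes(hits, species_of, max_evalue=1e-5):
    """Keep hits with E-value below max_evalue, then one best-scoring strain per species."""
    best = {}
    for hit in hits:
        if hit.evalue >= max_evalue:
            continue  # not significant at the 1E-5 threshold
        species = species_of[hit.subject]  # hypothetical ID -> species lookup
        if species not in best or hit.bitscore > best[species].bitscore:
            best[species] = hit
    return best


# Hypothetical eryA hits: two strains of one species, one distant hit, one weak hit.
example = [
    "eryA\tstrainA_1\t99.5\t1500\t7\t0\t1\t1500\t10\t1509\t0.0\t2700",
    "eryA\tstrainA_2\t99.8\t1500\t3\t0\t1\t1500\t10\t1509\t0.0\t2750",
    "eryA\tstrainB\t72.0\t900\t250\t5\t1\t900\t1\t900\t3e-40\t170",
    "eryA\tstrainC\t65.0\t120\t40\t2\t1\t120\t1\t120\t0.01\t35",
]
species_of = {"strainA_1": "Species A", "strainA_2": "Species A",
              "strainB": "Species B", "strainC": "Species C"}
selected = select_genomes(parse_outfmt6(example), species_of)
for species, hit in sorted(selected.items()):
    print(species, hit.subject, hit.evalue)
```

In this toy input, "Species C" is dropped because its E-value (0.01) exceeds the threshold, and only one strain is retained for "Species A".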

Second, the genetic region containing eryA in these organisms was identified and analyzed using the IMG Ortholog Neighborhood Viewer [19] in order to construct the gene maps (loci). The amino acid sequence of EryA from S. meliloti was used as the query for the IMG Ortholog Neighborhood Viewer search.

To analyze the genetic content of the organisms in our data set, the amino acid sequence encoded by each gene involved in erythritol catabolism in R. leguminosarum, or in erythritol, adonitol or L-arabitol catabolism in S. meliloti, was individually used as a BLASTP query against the 19 genomes in the data set. The sugar-binding proteins of the S. meliloti and R. leguminosarum transporters were used as representatives of the entire ABC transporters. Identity cut-off values used to delineate potential homologs of the erythritol proteins were specific to each query sequence, as follows: MptA: 56%, EryD: 44%, EryA: 46%, RbtA: 50%, EryB: 65%, LalA: 49%, RbtB: 51%, RbtC: 40%, EryC: 68%, TpiB: 69%, EryR: 61%, EryG: 73%. These values were determined manually and generally corresponded to a large drop in percentage identity within the BLASTP hits.
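The per-query cut-offs can be applied as a simple lookup, and the "large drop in percentage identity" criterion can be approximated by locating the largest gap in the sorted identities of the hits. The sketch below is illustrative only: the cut-off values are those listed above, but treating them as inclusive and picking the single largest gap automatically are assumptions (the authors chose the values manually), and `largest_drop_cutoff` is a hypothetical helper.

```python
# Per-query percent-identity cut-offs, as listed in the text above.
IDENTITY_CUTOFFS = {
    "MptA": 56, "EryD": 44, "EryA": 46, "RbtA": 50, "EryB": 65, "LalA": 49,
    "RbtB": 51, "RbtC": 40, "EryC": 68, "TpiB": 69, "EryR": 61, "EryG": 73,
}


def is_putative_homolog(query, pident):
    """True if a BLASTP hit meets the query-specific identity cut-off (assumed inclusive)."""
    return pident >= IDENTITY_CUTOFFS[query]


def largest_drop_cutoff(identities):
    """Return the lowest identity value above the largest gap in the sorted hit list.

    A crude automatic stand-in for the manual choice of a cut-off at a
    'large drop in percentage identity'.
    """
    values = sorted(identities, reverse=True)
    if len(values) < 2:
        raise ValueError("need at least two hits to find a drop")
    # index k of the biggest step between values[k] and values[k + 1]
    k = max(range(len(values) - 1), key=lambda i: values[i] - values[i + 1])
    return values[k]


# Hypothetical percent identities of the BLASTP hits for one query.
hits = [92.0, 88.5, 85.0, 71.2, 70.4, 38.0, 35.5, 33.1]
print(largest_drop_cutoff(hits))
print(is_putative_homolog("EryA", 47.0), is_putative_homolog("EryA", 45.9))
```

A gap-based rule like this is only a starting point; when several gaps are of similar size, a manual decision of the kind described in the text is still needed.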

Homologs identified outside the primary eryA-containing loci were used as queries in the IMG Ortholog Neighborhood Viewer to analyze the surrounding regions. Secondary loci containing homologs of some of these genes were identified in Mesorhizobium sp. and Sinorhizobium fredii. These loci are putative erythritol loci based on homology to known loci involved in erythritol catabolism in Sinorhizobium meliloti [15, 16], Rhizobium leguminosarum [20] and Brucella abortus [21]. Although they have not been experimentally verified, we refer to all loci in our data set as erythritol loci for the purposes of this manuscript.

Phylogenetic analysis

Amino acid sequences of homologs of proteins previously shown to play a role in erythritol, adonitol or L-arabitol catabolism were collected from each of the organisms in the data set and used for phylogenetic analysis. The 16S rDNA and RpoD sequences of the species examined in this study were also extracted from the NCBI database to obtain a candidate species tree that could be compared with the phylogenetic trees of the individual genes located within the polyol (i.e. erythritol, arabitol and adonitol) utilization loci. Amino acid sequences were aligned using Clustal-X [22] and PRALINE [23]; the resulting alignments were refined manually with GeneDoc v2.5.010 [24].

Phylogenies were generated by maximum likelihood (ML) analysis, as implemented in the Molecular Evolutionary Genetics Analysis package (MEGA5) [25], and with MrBayes [26]. MEGA5 was used to identify the most suitable substitution models for the aligned data sets. To evaluate support for the nodes in the ML phylogenetic trees, bootstrap analysis [27] was performed with 1000 pseudoreplicates.
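The nonparametric bootstrap resamples alignment columns with replacement to build pseudoreplicate alignments; a tree is inferred from each one, and the support for a node is the percentage of replicate trees containing that clade. The sketch below shows only the resampling and the support calculation. Tree inference itself (done with MEGA5 in this study) is not reproduced, and the toy alignment and function names are hypothetical.

```python
import random


def bootstrap_replicates(alignment, n_replicates=1000, seed=None):
    """Yield pseudoreplicate alignments by resampling alignment columns with replacement.

    alignment maps sequence names to aligned sequences of equal length.
    """
    rng = random.Random(seed)
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("sequences must be aligned to the same length")
    for _ in range(n_replicates):
        # the same sampled column indices are used for every sequence
        columns = [rng.randrange(length) for _ in range(length)]
        yield {name: "".join(alignment[name][c] for c in columns) for name in names}


def clade_support(replicate_clades, clade):
    """Percentage of replicate trees (each given as a set of clades) that contain `clade`."""
    target = frozenset(clade)
    found = sum(1 for clades in replicate_clades if target in clades)
    return 100.0 * found / len(replicate_clades)


# Toy alignment; in practice a tree would be inferred from each replicate.
toy = {"taxon1": "MKVLA", "taxon2": "MKILA", "taxon3": "MRVLG"}
replicates = list(bootstrap_replicates(toy, n_replicates=1000, seed=42))
print(len(replicates), replicates[0])
```

Resampling whole columns, rather than individual residues, keeps homologous positions aligned across taxa, which is what makes each pseudoreplicate a valid input for tree inference.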

The MrBayes program (v3.1) was used for Bayesian analysis [26, 28]. A mixed model was used for the amino acid alignments, and a gamma distribution with four rate categories was used for the 16S rDNA. The models used for generating the final 50% majority-rule trees (mixed-model setting) were estimated by the program itself. Bayesian inference was initiated from a random starting tree, and four chains were run simultaneously for 1,000,000 generations, with trees sampled every 100 generations. The first 25% of the trees generated were discarded as burn-in, and the remaining trees were used to compute posterior probability values.
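With a tree sampled every 100 generations over 1,000,000 generations, about 10,000 trees are recorded, so a 25% burn-in discards 2,500 of them. The sketch below writes a MrBayes command block that reflects the settings described above. It is an illustration under assumptions, not the authors' actual input: the nucleotide substitution scheme (`nst`) is not stated in the text and is left at the program default, the random starting tree is assumed to be MrBayes' default rather than set explicitly, and burn-in is given as a number of samples, as the v3.1 `sump`/`sumt` `burnin` option expects.

```python
def burnin_samples(ngen, samplefreq, fraction=0.25):
    """Number of sampled trees to discard as burn-in (the generation-0 sample is ignored)."""
    return int(fraction * (ngen // samplefreq))


def mrbayes_block(datatype, ngen=1_000_000, samplefreq=100, nchains=4,
                  burnin_fraction=0.25):
    """Build a MrBayes 3.1-style command block for the settings described above."""
    if datatype == "protein":
        model = "prset aamodelpr=mixed;"          # mixed amino acid models
    elif datatype == "dna":
        model = "lset rates=gamma ngammacat=4;"   # gamma, four rate categories
    else:
        raise ValueError("datatype must be 'protein' or 'dna'")
    burnin = burnin_samples(ngen, samplefreq, burnin_fraction)
    return "\n".join([
        "begin mrbayes;",
        "  " + model,
        f"  mcmc ngen={ngen} samplefreq={samplefreq} nchains={nchains};",
        f"  sump burnin={burnin};",
        f"  sumt burnin={burnin} contype=halfcompat;",  # 50% majority-rule tree
        "end;",
    ])


print(mrbayes_block("protein"))
```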

Phylogenetic trees were constructed for RpoD, 16S rDNA and all the key genes associated with the eryA loci. Phylogenetic trees were plotted with TreeView [29] from MEGA5 and/or MrBayes tree output files. Final trees were annotated using Adobe Illustrator.


Phylogenetic distribution of putative erythritol loci

Based on homology to eryA from Sinorhizobium meliloti and Rhizobium leguminosarum, we compiled a data set of 19 putative erythritol loci from 19 different proteobacteria (Table 1). Previous studies suggested that erythritol loci may be restricted to the alpha-proteobacteria [20]. While the majority of the erythritol loci we identified followed this pattern, we surprisingly identified putative erythritol catabolic loci in Verminephrobacter eiseniae (a beta-proteobacterium) and Escherichia fergusonii (a gamma-proteobacterium). Erythritol loci are not widely distributed throughout the alpha-proteobacteria; most of the loci we identified were within the order Rhizobiales. Outside the Rhizobiales, we also identified erythritol loci in Acidiphilium and Roseobacter species. Within the Rhizobiales, erythritol loci were notably absent from a number of species, such as Rhizobium etli, Agrobacterium tumefaciens and Bradyrhizobium japonicum, that are closely related to species we identified as containing erythritol loci. We also note that erythritol loci appear to be plasmid-localized only in S. fredii and R. leguminosarum; in all other cases the loci appear to be chromosomal.

Table 1 Bacterial genomes used in this study containing erythritol loci

Genetic content of loci

The genetic content of each organism's ery locus was analyzed by using the amino acid sequence of each gene associated with erythritol catabolism in R. leguminosarum, or with erythritol, adonitol or L-arabitol catabolism in S. meliloti, as a BLASTP query against the 19 genomes in our data set. The results of the BLAST search are presented in Table 2, which shows the presence or absence of homologs of erythritol, adonitol or L-arabitol catabolic genes in each of the genomes investigated. Gene maps of the erythritol loci were constructed from the output of our IMG Ortholog Neighborhood Viewer searches and are depicted in Figure 1.

Figure 1

The genetic arrangement of putative erythritol loci in the proteobacteria. Genes are represented by coloured boxes, and identical colours identify genes that are believed to be homologous. Gene names are given below the boxes for Sinorhizobium meliloti and Rhizobium leguminosarum. Locus arrangements are depicted based on the output of the IMG Ortholog Neighborhood Viewer, primarily using the amino acid sequence of EryA from Sinorhizobium meliloti and Rhizobium leguminosarum. Gene names in the legend generally correspond to the annotations in R. leguminosarum and S. meliloti.

Table 2 Content of putative erythritol loci

Genes encoding homologs of the core erythritol proteins EryA, EryB and EryD were ubiquitous throughout our data set (Table 2). With respect to the remaining genes, the species can be grouped into three broad categories based on genetic content. (1) Species that contain genes encoding homologs associated with erythritol, adonitol and L-arabitol catabolism. This includes S. meliloti, S. medicae, S. fredii, M. loti, M. opportunistum, M. ciceri, R. denitrificans and R. litoralis. These genomes contain homologs of genes encoding enzymes specifically involved in erythritol catabolism, such as EryC and TpiB, as well as those specifically involved in adonitol and L-arabitol catabolism, including LalA and RbtBC. They also contain genes encoding an ABC transporter homologous to the S. meliloti erythritol, adonitol and L-arabitol transporter (MptABCDE) and do not encode homologs of the R. leguminosarum erythritol transporter (EryEFG). One notable exception is M. ciceri, which encodes EryEFG homologs rather than MptABCDE (Table 2). (2) Species that contain all the genes associated with erythritol catabolism but lack the genes associated with adonitol or L-arabitol catabolism. These species include R. leguminosarum bvs. viciae and trifolii, A. radiobacter, O. anthropi, B. suis, B. melitensis and E. fergusonii. These loci encode EryABCDR-TpiB as well as homologs of the R. leguminosarum ABC transporter EryEFG, but lack genes encoding homologs of enzymes associated specifically with adonitol and L-arabitol catabolism or of the S. meliloti transport protein MptABCDE. E. fergusonii contains the smallest set of homologs of erythritol genes of all the genomes investigated and does not encode EryR or TpiB. (3) Species that do not encode the erythritol-specific EryC, EryR and TpiB, but encode the adonitol/L-arabitol catabolic complement LalA-RbtABC and homologs of the S. meliloti polyol transporter MptABCDE. These include Bradyrhizobium spp. BTAi1 and ORS278, A. multivorum, A. cryptum and V. eiseniae.
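The three categories amount to a rule on gene presence and absence, which can be sketched as a small classifier. This is illustrative only. Using EryC as the erythritol-specific marker for categories 1 and 2 is an assumption made here because E. fergusonii, placed in category 2, lacks EryR and TpiB. The example gene sets are hypothetical and paraphrase the category descriptions above rather than reproducing Table 2.

```python
# Marker genes, following the category descriptions above.
ERY_SPECIFIC = {"EryC", "EryR", "TpiB"}
PENTITOL = {"LalA", "RbtB", "RbtC"}      # adonitol/L-arabitol catabolism


def locus_category(genes):
    """Assign a locus (set of protein names with homologs) to category 1, 2 or 3.

    Returns None when the gene content fits none of the three categories.
    """
    has_eryC = "EryC" in genes
    has_pentitol = PENTITOL <= genes          # all three pentitol genes present
    no_pentitol = not (PENTITOL & genes)      # none of them present
    if has_eryC and has_pentitol:
        return 1   # erythritol + adonitol/L-arabitol (S. meliloti-like)
    if has_eryC and no_pentitol:
        return 2   # erythritol only (R. leguminosarum-like)
    if not (ERY_SPECIFIC & genes) and has_pentitol:
        return 3   # no EryC/EryR/TpiB, but LalA-Rbt (Bradyrhizobium-like)
    return None


# Hypothetical gene sets paraphrasing the descriptions above.
core = {"EryA", "EryB", "EryD"}
smeliloti_like = core | {"EryC", "EryR", "TpiB", "LalA", "RbtA", "RbtB", "RbtC", "MptA"}
rleg_like = core | {"EryC", "EryR", "TpiB", "EryG"}
efergusonii_like = core | {"EryC", "EryG"}   # lacks EryR and TpiB
brady_like = core | {"LalA", "RbtA", "RbtB", "RbtC", "MptA"}
print([locus_category(g) for g in (smeliloti_like, rleg_like, efergusonii_like, brady_like)])
```

The transporter type is deliberately left out of the rule, because M. ciceri (category 1 by enzyme content) carries an EryEFG-type rather than an MptABCDE-type transporter.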

The genetic structure of erythritol loci

The genetic context of eryA in each of the genomes in our data set supported the presence of an erythritol locus in each of these organisms. A physical map of the loci in each organism is depicted in Figure 1. Of note, a number of putative erythritol loci were identified in organisms whose genome sequences were incomplete at the time of analysis and are therefore not discussed here, including those of Octadecabacter antarcticus, Pelagibaca bermudensis, Enterobacter hormaechei, Fulvimarina pelagi, Aurantimonas sp. SI85-9A1, Roseibium sp. TrichSKD4, Burkholderia thailandensis and Stappia aggregata.

The putative erythritol loci of bacteria in our data set ranged in genetic complexity from the loci of S. meliloti and S. medicae, which contain 17 different genes, to the simplest, the locus of E. fergusonii, which contains only two divergently transcribed operons homologous to the eryEFG and eryABCD loci of R. leguminosarum. A number of species, including members of Brucella, Ochrobactrum and Agrobacterium, contained loci identical in content and arrangement to the R. leguminosarum erythritol locus. The only species with a locus identical in content and arrangement to that of S. meliloti is the closely related Sinorhizobium medicae. The locus of Sinorhizobium fredii NGR234 contains all but one of the genes (fucA1) found in the other Sinorhizobium loci (Figure 2).

Figure 2

The phylogenetic tree of erythritol proteins does not correlate with species phylogeny; evidence for horizontal gene transfer. The EryA phylogenetic tree (left) and the RpoD species tree (right) were constructed using ML and Bayesian analysis. Support for each clade is expressed as a percentage (Bayesian/ML, i.e. posterior probability and bootstrap values, respectively) adjacent to the nodes supporting the monophyly of the various clades. V. eiseniae was used as an outgroup for both trees since it was the most phylogenetically distant organism. A tree including branch lengths for EryA is included as Additional file 1: Figure S1.

The loci of the Mesorhizobium species were varied; however, all three Mesorhizobium species contained an independent locus with homologs of lalA and rbtBC elsewhere in the genome (Figure 1). Interestingly, while Mesorhizobium loti and Mesorhizobium opportunistum both contain transporters homologous to mptABCDE, Mesorhizobium ciceri bv. biserrulae contains a transporter homologous to eryEFG. This operon also contains the same hypothetical gene found at the beginning of the R. leguminosarum eryEFG transcript. The transporter genes, however, are arranged in a manner similar to that seen in S. meliloti, and the gene encoding the regulator, eryD, precedes the transporter genes, whereas in R. leguminosarum and Brucella, eryD follows eryC (Figure 1). We also note that while M. loti and M. opportunistum both contain a putative fructose-1,6-bisphosphate aldolase gene between the eryR-tpiB-rpiB operon and eryC, a homolog of this gene is also found adjacent to rpiB in Brucella.

Bradyrhizobium sp. BTAi1 and ORS278, A. cryptum and V. eiseniae all have a genetic arrangement similar to that of S. meliloti, except that they do not contain a homolog of eryC or an associated eryR-tpiB-rpiB operon. These loci also differ primarily in their arrangement of lalA-rbtBC (Figure 1).

The phylogenies of erythritol proteins do not correlate with species phylogeny
The DNA sequences of 16S rDNA (data not shown) and the amino acid sequences of RpoD were extracted from GenBank to analyze the phylogenetic relationships of the organisms examined in this study, using the most phylogenetically distant organism, Verminephrobacter eiseniae, as an outgroup. The results of the 16S rDNA and RpoD sequence analyses were concordant with each other and consistent with previously generated phylogenies [42]. Initial comparison of the operon structures with the generated phylogenies suggested that operon structure did not correlate with the species phylogeny. We therefore wished to determine whether operon structure correlated with the phylogeny of any of the erythritol genes found in the S. meliloti locus. Since homologs of EryA, EryB and EryD were ubiquitous throughout the data set, phylogenies were constructed from the EryA, EryB and EryD data sets using maximum likelihood and Bayesian analysis. The topology of the EryA phylogenetic tree is presented in Figure 2. A tree including branch lengths is included as Additional file 1: Figure S1. V. eiseniae was also the most distant member of the EryA phylogeny and was again used as an outgroup. The phylogenetic trees of EryB and EryD are not shown but were generally consistent with the EryA phylogeny. The species tree, based on RpoD, is shown as a mirror tree alongside the EryA tree to highlight possible horizontal gene transfer events (Figure 2).

The data show a high degree of correlation between locus configuration and the EryA phylogenetic tree (Figures 1 and 2). We note the similarity of the loci of A. radiobacter and R. leguminosarum to those of Brucella species and O. anthropi, but not to those of the more closely related Sinorhizobium species. This suggests that horizontal gene transfer may have occurred between these organisms, in agreement with previous reports [20]. It also seems likely that a horizontal gene transfer event occurred between Brucella and E. fergusonii, which may explain the unique presence of the locus in a member of the gamma-proteobacteria. Finally, our mirror tree suggests that horizontal transfer of the more complex erythritol locus may have occurred between M. loti and an ancestor of the Sinorhizobium species (Figure 2).
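The mirror-tree comparison looks for groupings in a gene tree that the species tree does not contain; such conflicting clades are candidate signals of horizontal transfer. A minimal sketch follows, assuming rooted trees written as nested tuples and using hypothetical five-taxon trees rather than the trees in Figure 2. It compares topology only. Support values and branch lengths are ignored, so a conflicting clade is a candidate for closer inspection, not proof of transfer.

```python
def clades(tree):
    """Leaf sets of all internal nodes of a rooted tree written as nested tuples."""
    found = set()

    def walk(node):
        if isinstance(node, str):      # a leaf
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def conflicting_clades(gene_tree, species_tree):
    """Clades in the gene tree that do not occur in the species tree."""
    return clades(gene_tree) - clades(species_tree)


# Hypothetical five-taxon trees (not the trees in Figure 2).
species_tree = ((("A", "B"), "C"), ("D", "E"))
gene_tree = ((("A", "D"), "C"), ("B", "E"))
for clade in sorted(conflicting_clades(gene_tree, species_tree), key=sorted):
    print(sorted(clade))
```

Here the gene tree pairs A with D and B with E, neither of which is a clade in the species tree, analogous to a locus grouping with distantly related species rather than with its closest relatives.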

Modes of evolution for the polyol utilization loci

Comparison of the EryA, EryB and EryD phylogenetic trees with the arrangement and content of the loci led us to investigate more thoroughly the phylogenies of a number of proteins that stood out as unique within the data set. These phylogenies have led us to postulate modes of evolution that may have occurred in these loci.

BLASTP analysis showed a clear association between the type of transporter encoded by each locus and the remaining genetic content. In general, loci that contained adonitol/L-arabitol-type genes contained a transporter homologous to S. meliloti MptABCDE (Table 2, Figure 1), whereas loci that contained only erythritol genes contained a transporter homologous to R. leguminosarum EryEFG. One exception to this correlation was M. ciceri bv. biserrulae, which contained a transporter homologous to EryEFG rather than MptABCDE. This is interesting because M. ciceri groups with the other Mesorhizobia in the EryA, EryB and EryD trees. To analyze the evolution of these transporters more clearly, phylogenetic trees were constructed for homologs of EryG and homologs of MptA (Figure 3). In general the phylogenies agree with the EryA, EryB and EryD phylogenies, with the exception of M. ciceri, which falls on a basal branch of the EryG phylogeny. The disparity between the EryG and EryA, EryB and EryD phylogenies for M. ciceri strongly suggests that parts of its erythritol locus have a different origin. This may have been the result of horizontal transfer of a second R. leguminosarum-type erythritol locus, followed by recombination between the two.

Figure 3

Phylogenetic trees of erythritol transporters. Unrooted phylogenetic tree including putative homologues of the sugar-binding proteins MptA of Sinorhizobium meliloti and EryG of Rhizobium leguminosarum (A). Support is provided for the node that clearly separates the putative homologues into two distinct and distant clades. Separate phylogenetic trees for erythritol transporters homologous to MptABCDE and EryEFG are depicted (B and C), using aligned amino acid sequences of the putative sugar-binding proteins MptA (B) and EryG (C) as representatives of the transporter phylogenies. The branch showing the anomalous placement of Mesorhizobium ciceri bv. biserrulae within the tree of EryEFG homologs is highlighted in red. Trees were constructed using ML and Bayesian analysis. Support for each node is expressed as a percentage based on posterior probabilities (Bayesian analysis) and bootstrap values (ML). Branch lengths are based on ML analysis and are proportional to the number of substitutions per site.

Apparent gene duplications were present in two organisms. In M. loti, one homolog of lalA was present in the erythritol locus, while a second copy was present elsewhere in the genome adjacent to homologues of rbtB and rbtC, consistent with its location in the other two Mesorhizobium genomes. In S. fredii, homologs of the apparent small operon eryR-tpiB-rpiB were found both in the erythritol locus, as expected, and elsewhere on the chromosome in the same arrangement. To analyze the evolutionary history of these duplications, phylogenetic trees were constructed for the LalA and TpiB homologs (Figures 4 and 5). The two copies of the lalA gene in M. loti are most likely paralogs, as they still group within the same clade among the other lalA homologs (Figure 4). The tpiB genes in S. fredii (Figure 5) are possible examples of xenologs [43], as the phylogenetic tree shows that the two versions are only distantly related: one homolog groups within the expected clade that includes S. medicae and S. meliloti, while the second homolog (not part of the main locus) is monophyletic with those in a clade containing R. leguminosarum, B. suis and others (Figure 5).
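The reasoning used to separate the M. loti lalA copies (paralogs) from the S. fredii tpiB copies (xenologs) can be sketched as a crude tree heuristic. Find the smallest clade containing both in-genome copies, then ask whether it spans one lineage or several. This is an illustrative simplification, not the authors' procedure: the trees, leaf names and `lineage_of` grouping are hypothetical, each copy is labelled with its host genome's lineage, and duplication followed by differential loss can mimic the "xenolog" pattern, so support values and reconciliation methods would be needed for a firm call.

```python
def mrca_clade(tree, a, b):
    """Smallest clade (leaf set) of a nested-tuple rooted tree containing leaves a and b."""
    best = None

    def walk(node):
        nonlocal best
        if isinstance(node, str):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        if a in leaves and b in leaves and (best is None or len(leaves) < len(best)):
            best = leaves
        return leaves

    walk(tree)
    return best


def classify_copies(tree, copy1, copy2, lineage_of):
    """Label two in-genome copies by how many lineages their MRCA clade spans.

    lineage_of maps every leaf to a lineage label (a hypothetical grouping).
    """
    clade = mrca_clade(tree, copy1, copy2)
    if clade is None:
        raise ValueError("both copies must be leaves of the tree")
    lineages = {lineage_of[leaf] for leaf in clade}
    return "paralog candidate" if len(lineages) == 1 else "xenolog candidate"


# X1 and X2 are two copies in the same genome (lineage L1).
lineage_of = {"X1": "L1", "X2": "L1", "Y": "L1", "V": "L1", "Z": "L2", "W": "L2"}
paralog_tree = ((("X1", "Y"), "X2"), ("Z", "W"))
xenolog_tree = ((("X1", "Y"), "V"), (("X2", "Z"), "W"))
print(classify_copies(paralog_tree, "X1", "X2", lineage_of))
print(classify_copies(xenolog_tree, "X1", "X2", lineage_of))
```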

Figure 4

Mesorhizobium loti contains paralogs of LalA. The phylogeny of the L-arabitol catabolic protein LalA is depicted. Mesorhizobium loti contains a copy of lalA within an independent suboperon, like the other Mesorhizobium species, as well as a second lalA homolog within the erythritol locus (Figure 1). The branch corresponding to the additional homolog within the erythritol locus is highlighted in red. The tree was constructed using ML and Bayesian analysis. Support for each node is expressed as a percentage based on posterior probabilities (Bayesian analysis) and bootstrap values (ML). Branch lengths are based on ML analysis and are proportional to the number of substitutions per site.

Figure 5

Sinorhizobium fredii encodes TpiB xenologs. Sinorhizobium fredii contains a second suboperon that appears homologous to the eryR-tpiB-rpiB suboperon in the erythritol locus (Figure 1). The TpiB amino acid sequence was used as a representative of this suboperon to construct a phylogenetic tree. The branch corresponding to the TpiB encoded outside the erythritol locus is highlighted in red. The tree was constructed using ML and Bayesian analysis. Support for each node is expressed as a percentage based on posterior probabilities (Bayesian analysis) and bootstrap values (ML). Branch lengths are based on ML analysis and are proportional to the number of substitutions per site.


A number of models, not mutually exclusive, have been proposed to account for the formation and evolution of operons. Two broad aspects need to be considered: the transfer of genes between organisms, and the gathering and distribution of genes within a genome. There is strong support for horizontal gene transfer as a driving force in the evolution of gene clusters [44]. More recently, it has been shown that genes acquired by horizontal gene transfer appear to evolve more quickly than genes that have arisen by gene duplication [45]. Within a genome, the “piece-wise” model suggests that complex operons can evolve through the independent clustering of smaller “sub-operons”, driven by selection for equimolarity and co-regulation of gene products [6]. Finally, it has been suggested that the final stage of operon building can be the loss of “ORFan” genes [4, 6].

The data presented here provide examples supporting these models of operon evolution. Components of the polyol catabolic loci we identified have been involved in at least three horizontal gene transfers within the proteobacteria (Figure 2). In addition, components such as the eryEFG transporter have been moved from the R. leguminosarum clade of loci into the M. ciceri bv. biserrulae polyol locus (see Figure 3A and 3C). Based on its phylogenetic position and the category of its polyol locus (S. meliloti type), the latter species would have been expected to contain the mptA gene. The presence of possible paralogs of lalA (Figure 4) and of tpiB xenologs (Figure 5) is further evidence of duplication and horizontal transfer events. Since S. fredii also contains a homolog of S. meliloti tpiA (data not shown), it is, to our knowledge, the only example of an organism containing three triose-phosphate isomerases (Figure 2, Figure 5).

M. ciceri provides a striking example of horizontal gene transfer and genetic rearrangement (Figure 1, Figure 2). It is likely that an exchange occurred between M. loti and a common ancestor of S. meliloti, S. medicae and S. fredii NGR234. M. loti is located in the same clade as Brucella and O. anthropi in the species tree (Figure 2). Despite this, M. loti contains many of the genes corresponding to the adonitol and L-arabitol-type loci of other species that cluster close to the base of the species tree, such as Bradyrhizobium spp. (Figure 2). These observations, together with the chimeric composition of the M. loti locus, lead us to hypothesise that an ancestor of M. loti may have contained both an erythritol locus like that of Brucella and a polyol-type locus like that seen in the Bradyrhizobia, A. cryptum and V. eiseniae.

The lalA, rbtB, rbtC suboperon appears to be the key component of the polyol locus in the Bradyrhizobium-type loci (Figure 1). Among the 19 loci identified, these three genes can be organized as a discrete suboperon, embedded within the main locus (e.g. R. litoralis), or split between two transcriptional units (see A. cryptum or V. eiseniae). Likewise, the gene module (or suboperon) eryR-tpiB-rpiB is presumably found in all erythritol-utilizing bacteria. The acquisition of this module along with the lalA, rbtB and rbtC suboperon may have allowed the evolution of the more complex S. meliloti-type locus (see Figure 2).

The absence of fucA from S. fredii NGR234 and M. loti appears to be an example of the loss of an “ORFan” gene. The gene is still present in S. meliloti; however, it has been shown not to be necessary for the catabolism of erythritol, adonitol or L-arabitol [15]. It is likely that it was lost during the divergence of M. loti and S. fredii NGR234 from their common ancestors with S. meliloti. If this is true, it may be reasonable to assume that fucA may eventually also be lost from the S. meliloti erythritol locus.

In S. meliloti, erythritol uptake has been shown to be carried out by the proteins encoded by mptABCDE [15, 16], whereas in R. leguminosarum growth on erythritol depends on eryEFG [20]. Although both transporters appear to carry out the same function, the phylogenetic analysis clearly shows that they have distinct ancestors and may be best classified as analogues rather than orthologues (Figure 3). In addition, MptABCDE has been shown to also transport adonitol and L-arabitol [15]. We note that these polyols appear to share stereochemical identity over three carbons, and that EryA of S. meliloti can also use adonitol and L-arabitol as substrates [15]. It is unknown whether EryA from R. leguminosarum can act on these substrates.

The three distinct groups of loci we have identified probably correspond to the metabolic potential of these regions to utilize polyols. The locus of S. meliloti has been shown to contain the full complement of genes required to confer growth on erythritol, adonitol and L-arabitol as sole carbon sources [15, 16]. Given that S. fredii NGR234 and M. loti each contain homologs of all of these genes except fucA, which is not necessary for the catabolism of any of these polyols [15], it follows that these two loci may also be capable of catabolising all three polyols. It has also been established that the B. abortus and R. leguminosarum type loci are used for erythritol catabolism, and given the annotation and degree of relatedness (E value = 0) of the proteins from all species in this clade, these loci are not expected to be capable of breaking down additional polyols [20, 21]. This is supported by the observation that introducing the R. leguminosarum cosmid carrying the erythritol locus into S. meliloti strains unable to utilize erythritol, adonitol and L-arabitol did not complement their growth on adonitol or L-arabitol [15]. It should be remembered, however, that some of the identified loci are correlated with polyol utilization only on the basis of our analysis, and that their basic biological function, such as the ability to utilize these polyols, has not been previously described.

With the advent of newer generations of sequencing technologies, a greater number of bacterial genomes will be sequenced, and more examples of rearrangements of catabolic loci across bacterial lineages are likely to be observed. Since the ability to catabolize erythritol is found in relatively few bacterial species, operons that encode the catabolism of erythritol and associated polyols may be ideal models for observing operon evolution.


In this work we show that there are at least three distinct erythritol/polyol loci arrangements. Two distinct ABC transporters can be found within these loci, and phylogenetic analysis suggests that they should be considered analogs. Finally, we provide evidence suggesting that these loci have been horizontally transferred from the alpha-proteobacteria into both the beta- and gamma-proteobacteria.


  1. Omata T, Price GD, Badger MR, Okamura M, Gohta S, Ogawa T: Identification of an ATP-binding cassette transporter involved in bicarbonate uptake in the cyanobacterium Synechococcus sp. strain PCC 7942. Proc Natl Acad Sci USA. 1999, 96: 13571-13576. 10.1073/pnas.96.23.13571.

  2. Osbourn AE, Field B: Operons. Cell Mol Life Sci. 2009, 66: 3755-3775. 10.1007/s00018-009-0114-3.

  3. Omelchenko MV, Makarova KS, Wolf YI, Rogozin IB, Koonin EV: Evolution of mosaic operons by horizontal gene transfer and gene displacement in situ. Genome Biol. 2003, 4: R55-10.1186/gb-2003-4-9-r55.

  4. Fani R, Brilli M, Lio P: The origin and evolution of operons: the piecewise building of the proteobacterial histidine operon. J Mol Evol. 2005, 60: 370-390.

  5. Price MN, Arkin AP, Alm EJ: The life-cycle of operons. PLoS Genet. 2006, 2: e96-10.1371/journal.pgen.0020096.

  6. Fondi M, Emiliani G, Fani R: Origin and evolution of operons and metabolic pathways. Res Microbiol. 2009, 69: 512-526.

  7. Homma K, Fukuchi S, Gojobori T, Nishikawa K: Gene cluster analysis method identifies horizontally transferred genes with high reliability and indicates that they provide the main mechanism of operon gain in 8 species of gamma proteobacteria. Mol Biol Evol. 2007, 24: 805-813.

  8. Muzzi A, Moschioni M, Covacci A, Rappuoli R, Donati C: Streptococcus pneumoniae is driven by positive selection and recombination. PLoS One. 2008, 3: e3660-10.1371/journal.pone.0003660.

  9. Kuykendall LD, Shao JY, Hartung JS: Conservation of gene order and content in the circular chromosomes of Candidatus Liberibacter asiaticus and other Rhizobiales. PLoS One. 2012, 7 (4): e34673-

  10. Batut J, Andersson SGE, O’Callaghan D: The evolution of chronic infection strategies in the alpha-proteobacteria. Nat Rev Microbiol. 2004, 2: 933-945. 10.1038/nrmicro1044.

  11. Boussau B, Karlberg EO, Frank AC, Legault B, Andersson SGE: Computational inference of scenarios for alpha-proteobacterial genome evolution. Proc Natl Acad Sci USA. 2004, 101: 9722-9727. 10.1073/pnas.0400975101.

  12. Tian CF, Zhou YJ, Zhang YM, Li QQ, Zhang YZ, Li DF, Wang S, Wang J, Gilbert LB, Li YR: Comparative genomics of rhizobia nodulating soybean suggests extensive recruitment of lineage-specific genes in adaptations. Proc Natl Acad Sci USA. 2012, 109: 8629-8634. 10.1073/pnas.1120436109.

  13. Wawskiewicz EJ, Barker HA: Erythritol metabolism by Propionibacterium pentosaceum. J Biol Chem. 1968, 243: 1948-1956.

  14. Burkhardt S, Jiménez de Bagüés MP, Liautard JP, Kohler S: Analysis of the behaviour of eryC mutants of Brucella suis attenuated in macrophages. Infect Immun. 2005, 73: 6782-6790. 10.1128/IAI.73.10.6782-6790.2005.

  15. Geddes BA, Oresnik IJ: Genetic characterization of a complex locus necessary for the transport and catabolism of erythritol, adonitol, and L-arabitol in Sinorhizobium meliloti. Microbiology. 2012, 158 (8): 2180-2191. 10.1099/mic.0.057877-0.

  16. Geddes BA, Pickering BS, Poysti NJ, Yudistira H, Collins H, Oresnik IJ: A locus necessary for the transport and catabolism of erythritol in Sinorhizobium meliloti. Microbiol. 2010, 156: 2970-2981. 10.1099/mic.0.041905-0.

  17. Galibert F, Finan TM, Long SR, Pühler A, Abola P, Ampe F, Barloy-Hubler F, Barnett MJ, Becker A, Boistard P: The composite genome of the legume symbiont Sinorhizobium meliloti. Science. 2001, 293: 668-672. 10.1126/science.1060966.

  18. Altschul SF, Madden TL, Schäffer AA, Zhang JH, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.

  19. Markowitz VM, Chen IA, Palaniappan K, Chu K, Szeto E, Grechkin Y, Ratner A, Anderson I, Lykidis A, Mavromatis K: The integrated microbial genomes system: an expanding comparative analysis resource. Nucleic Acids Res. 2010, 38 (suppl 1): D382-D390.

  20. Yost CK, Rath AM, Noel TC, Hynes MF: Characterization of genes involved in erythritol catabolism in Rhizobium leguminosarum bv. viciae. Microbiol. 2006, 152: 2061-2074. 10.1099/mic.0.28938-0.

  21. Sangari FJ, Agüero J, García-Lobo JM: The genes for erythritol catabolism are organized as an inducible operon in Brucella abortus. Microbiol. 2000, 146: 487-495.

  22. Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG: The CLUSTAL-X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res. 1997, 25: 4876-4882. 10.1093/nar/25.24.4876.

  23. Simossis VA, Kleinjung J, Heringa J: Homology-extended sequence alignment. Nucleic Acids Res. 2005, 33: 816-824. 10.1093/nar/gki233.

  24. Nicholas KB, Nicholas HB, Deerfield DWII: GeneDoc: analysis and visualization of genetic variation. EMBNEW News. 1997, 4: 14-

  25. Tamura K, Peterson D, Peterson ND, Stecher G, Nei M, Kumar S: MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. 2011, 28: 2731-2739. 10.1093/molbev/msr121.

  26. Ronquist F, Huelsenbeck JP: MrBayes 3: bayesian phylogenetic inference under mixed models. Bioinformatics. 2003, 19: 1572-1574. 10.1093/bioinformatics/btg180.

  27. Felsenstein J: Confidence limits on phylogenies: an approach using the bootstrap. Evolution. 1985, 39: 783-789. 10.2307/2408678.

  28. Ronquist F: Bayesian inference of character evolution. Trends Ecol Evol. 2004, 19: 475-481. 10.1016/j.tree.2004.07.002.

  29. Page RDM: TREEVIEW: an application to display phylogenetic trees on personal computers. Comput Appl Biosci. 1996, 12: 357-358.

  30. Reeve W, Chain P, O’Hara G, Ardley J, Nandesena K, Bräu L, Tiwari R, Malfatti S, Kiss H, Lapidus A: Complete genome sequence of the Medicago microsymbiont Ensifer (Sinorhizobium) medicae strain WSM419. Stand Genomic Sci. 2010, 2 (1): 77-86. 10.4056/sigs.43526.

  31. Schmeisser C, Liesegang H, Krysciak D, Bakkou N, Le Quéré A, Wollherr A, Heinemeyer I, Morgenstern B, Pommerening-Röser A, Flores M: Rhizobium sp. strain NGR234 possesses a remarkable number of secretion systems. Appl Environ Microbiol. 2009, 75 (12): 4035-4045. 10.1128/AEM.00515-09.

  32. Kaneko T, Nakamura Y, Sato S, Asamizu E, Kato T, Sasamoto S, Watanabe A, Idesawa K, Ishikawa A, Kawashima K: Complete genome structure of the nitrogen-fixing symbiotic bacterium Mesorhizobium loti. DNA Res. 2000, 7: 331-338. 10.1093/dnares/7.6.331.

  33. Giraud E, Moulin L, Vallenet D, Barbe V, Cytryn E, Avarre JC, Jaubert M, Simon D, Cartieaux F, Prin Y: Legumes symbioses: absence of Nod genes in photosynthetic bradyrhizobia. Science. 2007, 316 (5829): 1307-1312. 10.1126/science.1139548.

  34. Slater SC, Goldman BS, Goodner B, Setubal JC, Farrand SK, Nester EW, Burr TJ, Banta L, Dickerman AW, Paulsen I: Genome sequences of three Agrobacterium biovars help elucidate the evolution of multichromosome genomes in bacteria. J Bacteriol. 2009, 191 (8): 2501-2511. 10.1128/JB.01779-08.

  35. Chain PS, Lang DM, Comerci DJ, Malfatti SA, Vergez LM, Shin M, Ugalde RA, Garcia E, Tolmasky ME: Genome of Ochrobactrum anthropi ATCC 49188T, a versatile opportunistic pathogen and symbiont of several eukaryotic hosts. J Bacteriol. 2011, 193 (16): 4274-4275. 10.1128/JB.05335-11.

  36. Tae H, Shallom S, Settlage R, Preston D, Adams LG, Garner HR: Revised genome sequence of Brucella suis 1330. J Bacteriol. 2011, 193 (22): 6410-10.1128/JB.06181-11.

  37. DelVecchio VG, Kapatral V, Redkar RJ, Patra G, Mujer C, Los T, Ivanova N, Anderson I, Bhattacharyya A, Lykidis A: The genome sequence of the facultative intracellular pathogen Brucella melitensis. Proc Natl Acad Sci USA. 2002, 99 (1): 443-448. 10.1073/pnas.221575398.

  38. Swingley WD, Sadekar S, Mastrian SD, Matthies HJ, Hao J, Ramos H, Acharya CR, Conrad AL, Taylor HL, Dejesa LC: The complete genome sequence of Roseobacter denitrificans reveals a mixotrophic rather than photosynthetic metabolism. J Bacteriol. 2007, 189 (3): 683-690. 10.1128/JB.01390-06.

  39. Kalhoefer D, Thole S, Voget S, Lehmann R, Liesegang H, Wollher A, Daniel R, Simon M, Brinkhoff T: Comparative genome analysis and genome-guided physiological analysis of Roseobacter litoralis. BMC Genomics. 2011, 12 (1): 324-10.1186/1471-2164-12-324.

  40. Young JPW, Crossman LC, Johnston AW, Thomson NR, Ghazoui ZF, Hull KH, Wexler M, Curson ARJ, Todd JD, Poole PS: The genome of Rhizobium leguminosarum has recognizable core and accessory components. Genome Biol. 2006, 7: R34-10.1186/gb-2006-7-4-r34.

  41. Reeve W, O’Hara G, Chain P, Ardley J, Brau L, Nandesena K, Tiwari R, Copeland A, Nolan M, Han C: Complete genome sequence of Rhizobium leguminosarum bv. trifolii strain WSM1325, an effective microsymbiont of annual Mediterranean clovers. Stand Genomic Sci. 2010, 2 (3): 347-356. 10.4056/sigs.852027.

  42. Crossman LC, Castillo-Ramírez S, McAnnula C, Lozano L, Vernikos GS, Acosta JL, Ghazoui ZF, Hernández-Lucas I, Meakin G, Walker AW: A common genomic framework for a diverse assembly of plasmids in the symbiotic nitrogen fixing bacteria. PLoS One. 2008, 3: e2567-10.1371/journal.pone.0002567.

  43. Koonin EV: Orthologs, paralogs, and evolutionary genomics. Annu Rev Genet. 2005, 39: 309-338. 10.1146/annurev.genet.39.073003.114725.

  44. Lawrence JG, Roth JR: Selfish operons: horizontal transfer may drive the evolution of gene clusters. Genetics. 1996, 143: 1843-1860.

  45. Treangen TJ, Rocha EPC: Horizontal transfer, not duplication, drives the expansion of protein families in prokaryotes. PLoS Genet. 2011, 7: e1001284-10.1371/journal.pgen.1001284.


This work was funded by NSERC Discovery Grants to IJO and GH. BAG was funded by an NSERC CGS-D. The authors would like to thank the anonymous reviewer, whose suggestions greatly improved the manuscript.

Author information

Corresponding author

Correspondence to Ivan J Oresnik.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contribution

BAG collected the data set, performed the analysis and contributed to the writing of the manuscript. GH provided advice and assistance with the analysis and contributed to the writing of the manuscript. IJO provided advice for the analysis and contributed to the writing of the manuscript. All authors read and approved the final manuscript.

Electronic supplementary material


Additional file 1: Figure S1: EryA phylogenetic tree constructed using ML and Bayesian analysis. Support for each clade is expressed as a percentage (Bayesian/ML, i.e. posterior probability and bootstrap values, respectively) adjacent to the nodes that support the monophyly of various clades. Branch lengths are based on ML analysis and are proportional to the number of substitutions per site. This phylogenetic tree was used in the mirror tree in Figure 2 without branch lengths due to space restrictions. (EPS 1 MB)

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

About this article

Cite this article

Geddes, B.A., Hausner, G. & Oresnik, I.J. Phylogenetic analysis of erythritol catabolic loci within the Rhizobiales and Proteobacteria. BMC Microbiol 13, 46 (2013).
