Housekeeping genes essential for pantothenate biosynthesis are plasmid-encoded in Rhizobium etli and Rhizobium leguminosarum

Background A traditional concept in bacterial genetics states that housekeeping genes, those involved in basic metabolic functions needed for maintenance of the cell, are encoded in the chromosome, whereas genes required for dealing with challenging environmental conditions are located in plasmids. Exceptions to this rule have emerged from genomic sequence data of bacteria with multipartite genomes. The genome sequence of R. etli CFN42 predicts the presence of panC and panB genes clustered together on the 642 kb plasmid p42f and a second copy of panB on plasmid p42e. They encode putative pantothenate biosynthesis enzymes (pantoate-β-alanine ligase and 3-methyl-2-oxobutanoate hydroxymethyltransferase, respectively). Due to their ubiquitous distribution and relevance in the central metabolism of the cell, these genes are considered part of the core genome; thus, their occurrence in a plasmid is noteworthy. In this study we investigate the contribution of these genes to pantothenate biosynthesis, examine whether their presence in plasmids is a prevalent characteristic of the Rhizobiales with multipartite genomes, and assess the possibility that the panCB genes may have reached plasmids by horizontal gene transfer. Results Analysis of mutants confirmed that the panC and panB genes located on plasmid p42f are indispensable for the synthesis of pantothenate. A screening of the location of panCB genes among members of the Rhizobiales showed that only R. etli and R. leguminosarum strains carry panCB genes in plasmids. The panCB phylogeny attested a common origin for chromosomal and plasmid-borne panCB sequences, suggesting that the R. etli and R. leguminosarum panCB genes are orthologs rather than xenologs. The panCB genes could not totally restore the ability of a strain cured of plasmid p42f to grow in minimal medium. Conclusions This study shows experimental evidence that core panCB genes located in plasmids of R. etli and R. leguminosarum are indispensable for the synthesis of pantothenate. The unusual presence of panCB genes in plasmids of Rhizobiales may be due to an intragenomic transfer from chromosome to plasmid. Plasmid p42f encodes other functions required for growth in minimal medium. Our results support the hypothesis of cooperation among different replicons for basic cellular functions in multipartite rhizobia genomes.


Background
Multipartite genomes are common among members of the α-proteobacteria [1]. Most symbiotic nitrogen-fixing bacteria belonging to the genera Rhizobium, Sinorhizobium, Mesorhizobium and Bradyrhizobium possess multipartite genomes organized as a single circular chromosome and a variable number of large plasmids [2]. In some species plasmids can represent, in terms of size, up to 40% of the total genome. In Rhizobium and Sinorhizobium species one plasmid (pSym) concentrates most of the genes required for nodulation and nitrogen fixation [3]. The complete genome sequences of different rhizobia have revealed that plasmids harbor mainly accessory genes and that most encode predicted transport systems and a variety of catabolic pathways that may contribute to the adaptation of rhizobia to the heterogeneous soil and nodule environments [2,4]. These genes are absent from closely related genomes, lack synteny and their G+C composition differs from that of the core genes. The core genes are mainly located on chromosomes, have essential functions in cell maintenance and have orthologs in related species [5,6]. In spite of this evidently biased distribution of core genes in the chromosome and accessory genes in plasmids, it is important to highlight the fact that there are interesting exceptions to this genomic rule: several typical core genes have been found encoded on rhizobia plasmids. Some are copies of genes located on chromosomes, with redundant functions that are totally dispensable for normal growth. Examples of these genes are the multiple copies of chaperonin-encoding genes groEL/groES [7,8], two tpiA genes encoding putative triose phosphate isomerase, a key enzyme of central carbon metabolism [4,6,9], and two putative S. meliloti asparagine synthetases (asnB and asnO), which may have a role in asparagine synthesis from aspartate by ATP-dependent amidation [10]. In contrast to these reiterated genes, a few single copy core genes have also been localized in plasmids. The tRNA specific for the second most frequently used arginine codon, CCG, is located on pSymB in S. melioti [10]. Since this gene lies within a region of pSymB that could not be deleted [11], it is assumed to be essential for cell viability. The single copy of the minCDE genes, conceivably involved in proper cell division, have also been found in plasmids of S. meliloti, R. leguminosarum and R. etli [4,6,10]. Studies in S. meliloti have demonstrated that even though these genes are expressed in free-living cells and within nodules they are nonessential for cell division, since their deletion did not produce the small chromosomeless minicells observed in E. coli and Bacillu subtilis [12].
A recent bioinformatic study revealed that approximately ten percent of the 897 complete bacterial genomes available in 2009 carry some core genes on extrachromosomal replicons [13]. However, very few of these genes have been functionally characterized and so their real contribution to bacterial metabolism is still an open question.
The complete genome sequence of R. etli CFN42 predicts that two putative "housekeeping" genes, panC and panB, which may be involved in pantothenate biosynthesis, are clustered together on plasmid p42f. Pantothenate is an essential precursor of coenzyme A (CoA), a key molecule in many metabolic reactions including the synthesis of phospholipids, synthesis and degradation of fatty acids, and the operation of the tricarboxylic acid cycle [14]. The R. etli panC gene is predicted to encode the sole pantoate-β-alanine ligase (PBAL), also known as pantothenate synthetase (PS) (EC 6.3.2.1), present in the R. etli genome. The function of this enzyme is the ATP-dependent condensation of D-pantoate with βalanine to form pantothenate, the last step of the pantothenate biosynthesis pathway. The panB gene encodes the putative 3-methyl-2-oxobutanoate hydroxymethyltransferase (MOHMT) (EC 2.1.2.11), also known as ketopantoate hydroxymethyltransferase (KPHMT), the first enzyme of the pathway, responsible for the formation of α-ketopantoate by the transfer of a methyl group from 5,10-methylentetrahydrofolate to alphaketoisovalerate. The complete genome sequence of R. etli CFN42 predicts that a second putative MOHMT enzyme (RHE_PE00443), similar to the product of panB, is encoded on plasmid p42e.
In this work we describe the isolation and use of panC and panB mutants to analyze the involvement of these plasmid-encoded genes in pantothenate biosynthesis. A survey of the localization of panCB genes among members of the Rhizobiales with multipartite genomes allowed us to infer a panCB phylogeny and to establish the probable chromosomal origin of these plasmidborne genes. We also report that the panCB genes could not totally restore the growth in minimal medium (MM) of a strain cured of plasmid p42f, suggesting that other functions essential for growth in MM are encoded in this plasmid.

Results
Functional characterization of plasmid p42f encoded panCB genes The predicted function of the product of panC (RHE_PF00001) annotated as PBAL, is the catalysis of the last step of pantothenate synthesis. This PBAL (298 amino acids) showed 43% identity and 62% similarity over 279 amino acids with the functionally characterized PBAL of E. coli K12 (284 amino acids). A search for conserved domains (CD-search) at NCBI-CDD revealed the presence of a typical pantoate-binding site. The panB gene (RHE_PF00002) is located immediately downstream of panC. The four nucleotide overlap between the panC TGA codon and panB ATG codon suggest that these genes might be transcribed as an operon. The panB gene encodes a putative MOHMT, the first enzyme of the pantothenate pathway. A BlastP comparison between the functionally characterized MOHMT of E. coli K12 (264 amino acids) and the putative MOHMT encoded on plasmid p42f of R. etli CFN42 (273 amino acids) showed 37% identity and 56% similarity over a length of 240 amino acids. A CD-search indicated that in the putative MOHMT of R. etli CFN42 the magnesium binding and active site domains are conserved. Additionally, Paralog Search (KEGG SSDB) and pathway tools predicted a second probable MOHMT, encoded on plasmid p42e (locus tag RHE_PE00443). Both proteins are similar in length (273 and 270 aa for the products encoded by panB and RHE_PE00443, respectively). However, a BlastP comparison of these sequences showed only 36% identity and 56% similarity over a tract of 140 amino acids. A CD-search revealed that only 5 of 12 of the invariable residues present in the active site domain are conserved in RHE_PE00443. The metal binding domain could not be detected by the CD-search. To determine whether the panC and panB genes located on plasmid p42f are required for pantothenate synthesis, mutations in these genes were generated by site-directed vector integration mutagenesis via a single cross-over recombination (see details in Material and Methods and Table 1). Mutants ReTV1 (panC -) and ReTV2 (panB -) were unable to grow in minimal medium (MM) lacking calcium pantothenate (Figure 1a). Supplementation of MM with 1 μM calcium pantothenate allowed the panC and panB mutants to recover their wildtype growth rate (Figure 1b). The pantothenate auxotrophy displayed by the panB mutant ReTV2 allowed us to discard a functional role of the putative MOHMT encoded by RHE_PE00443 in pantothenate biosynthesis. Moreover, a pBBRMCS3 clone constitutively expressing RHE_PE00443 (pTV7) was unable to complement the pantothenate auxotrophy of the panB mutant (data not shown).
Plasmid pTV4, harboring the panC and panB genes, as well as plasmids pTV5 and pTV6, carrying only panC or panB respectively, were introduced into mutant strains ReTV1 and ReTV2 and the growth phenotype of each construction was evaluated in MM. The panC mutant ReTV1 complemented with the panCB genes (ReTV1-4) recovered wild type growth in MM. In contrast, when complemented only with panC (strain ReTV1-5) no growth occurred in the absence of pantothenate. These results strongly suggest that the panCB genes form a single transcriptional unit. As expected, wild type growth of panB mutant ReTV2 was recovered by complementation with the panCB genes or with the panB gene (strains ReTV2-4 and ReTV2-6 respectively).
The occurrence of panCB genes in plasmids is highly conserved among R. etli and R. leguminosarum strains but not in other members of the Rhizobiales with multipartite genomes To investigate whether the presence of the panCB genes in plasmids is a common characteristic of the Rhizobiales, we examined the location of panCB genes in 22 members of the Rhizobiales having fully sequenced multipartite genomes ( Table 2). To date, the genomes of seven R. etli strains, in addition to CFN42, have been totally sequenced [15]. However, with the exception of strain CIAT 652, the genomes were released as draft assemblies, precluding panCB localization. We experimentally determined the localization of panCB genes in the genome of four of these R. etli strains (CIAT 894, Kim5, 8C-3, and IE4771) by hybridization of their plasmid profiles with [ 32 P]dCTP-labelled panC and panB genes from CFN42 under high stringency conditions. Both probes produced intense hybridization signals on the same plasmid of each strain, indicating that the panCB genes are also plasmid-borne in these R. etli strains (Table 2). Coincidentally, in the three R. leguminosarum strains with fully sequenced genomes reported in the NCBI database, the panCB genes are assigned to plasmids. In contrast, in other species of Rhizobiales with multipartite genomes, the panCB genes are always confined to the chromosome, or to chromosome I in those species harboring two chromosomes, with exception of Agrobacterium tumefaciens C58 which carries panCB on the linear chromosome II and Methylobacterium nodulans ORS2060 that carries panC on their single chromosome and panB on plasmid pMNOD02 ( Table 2).

Phylogenetic analysis of rhizobial panCB genes indicates a common origin of chromosomal and plasmid-borne sequences
Two possible hypotheses were considered to explain the presence of panCB genes in plasmids of R. etli and R. leguminosarum strains: (1) an intragenomic rearrangement of panCB genes from chromosome to plasmid, which must have occurred in the last common ancestor of both species; (2) by xenologous gene displacement, that is, a horizontal transfer event in which a gene is displaced by a horizontally transferred ortholog acquired from another lineage [16]. In the latter hypothesis we assume that the presence of these xenolog genes in plasmids conferred a selective advantage that may have eventually led to the loss of the chromosome-located panCB genes. To test these hypotheses the phylogeny of 16 rhizobial species inferred from ten orthologous single copy housekeeping genes (fusA, guaA, ileS, infB, recA, rplB, rpoB, rpoC, secY and valS) located on primary chromosomes, was compared with the phylogeny of the same rhizobial species inferred from the panCB genes located on plasmids and chromosomes. The rationale for this comparison was that if the plasmid-borne  panCB phylogeny agrees with the current phylogeny of the Rhizobiales, inferred from the housekeeping genes, it would support the hypothesis of intragenomic transfer of the panCB genes. On the other hand, if both phylogenies are incongruent, it would favor the hypothesis of horizontal transfer of the panCB genes. Concatenated nucleic acids multiple alignments were used to infer both phylogenies with the maximum likelihood method described in materials and methods. The resulting phylogenetic trees are shown in Figure 2. The housekeeping genes inferred tree (Figure 2a) was consistent with the recently reported phylogeny of 19 Rhizobiales performed on a data set of 507 homologous proteins from the primary chromosome [17]. Both trees are in close agreement with the phylogeny inferred from the panCB genes ( Figure 2b). Thus the phylogeny of R. etli and R. leguminosarum inferred from plasmid-encoded panCB genes is consistent with the phylogeny deduced from their housekeeping genes supporting the hypothesis of a chromosomal origin for the plasmid-encoded panCB genes.
The panCB genes do not fully complement the growth deficiency of a R. etli CFN42 p42f cured derivative in MM It was reported previously that R. etli CFNX186, a p42fcured derivative of R. etli CFN42, is unable to grow in MM [18]. To assess if the growth deficiency of strain CFNX186 in MM was due to the absence of the panC and panB genes, plasmid pTV4 (panCB) was introduced into strain CFNX186. The growth of the transconjugant (CFNX186-4) after 15 hours of culture in MM was only 50% that of the WT strain grown under the same conditions (Figure 3a). The growth of CFNX186-4 did not improve even after 72 h in culture (data not shown). Interestingly, strain CFNX186-4 had the same growth rate as strain CFNX186 cultured in MM supplemented Abbreviations are as follows: Chr, chromosome of those Rhizobiales with one chromosome; Chr I and Chr II, chromosome I and chromosome II respectively in those Rhizobiales harboring two chromosomes; p, plasmid. *Rhizobium species in which localization of panCB genes was done by Southern blot hybridization of plasmid profiles. † Plasmids with very similar electrophoretic mobility gave as result ambiguous plasmid localization of panC and panB homologous sequences.
with 1 μM calcium pantothenate (Figure 3b). This shows that the growth deficiency of CFNX186 is only partly due to the absence of the panCB genes and indicates that other functions encoded in plasmid p42f are required for growth in MM.
Previous studies have demonstrated that the katG gene, which encodes the sole catalase-peroxidase expressed in free-living growth conditions, is located on plasmid p42f of R. etli CFN42. These studies also revealed that the growth rate of a katG mutant in MM was significantly reduced in comparison with that of the wild-type parental strain [19]. On plasmid p42f katG, as well as its putative transcriptional regulator protein encoded by oxyR, are located 80 bp downstream of the panCB genes. We speculated that introduction of the panCB genes together with the katG and oxyR genes might improve the growth of CFNX186 in MM. To test this hypothesis, we used pCos24, which contains a 20 kb fragment of p42f carrying panCB, katG and oxyR (see Material and Methods). pCos24 was introduced into CFNX186 and the resulting transconjugant (CFNX186-24) grown in MM. Figure 3 shows that after 15 hours of culture there was no significant difference between the growth rate of CFNX186 complemented only with panCB (CFNX183-4), and CFNX186 complemented with cosmid pCos24 (CFNX186-24). Furthermore, the   growth of CFNX186-24 did not increase even after 72 h of culture (data not shown) indicating that katG and oxyR did not improve the growth rate of panCB complemented CFNX186 in MM. We also tested the possibility that arginine might improve the growth of strain CFNX186-24 due to the presence of a putative N-acetylornithinase (EC 3.5.1.16) encoded in the plasmid p42f. In the Enterobactericeae this enzyme catalyzes the conversion of N-acetylornithine to ornithine, a key step in the arginine biosynthesis pathway [20]. However, the growth deficiency of strain CFN186-24 in MM was not corrected by the addition of 1, 5, 10 or 15 mM arginine (data not shown). Furthemore, we constructed an argE mutant strain (ReTV3, Table 1) that was able to grow in MM without exogenous arginine at the same rate as parental strain CFN42 (data not shown), confirming that this gene is not essential for arginine synthesis.

Discussion
Seminal studies on the phenotypic characterisation of plasmid-cured strains of R. leguminosarum and R. etli revealed that the absence of several plasmids cause a growth deficiency in rich and minimal medium [18,21]. These findings suggested that undefined metabolic traits are present on rhizobial plasmids. The bioinformatic analysis of 897 bacterial genomes performed by Harrison et al [13] revealed the presence of extrachromosomal core genes in 82 genomes mainly belonging to the Proteobacteria. In contrast with these in silico data, there is little experimental information on the contribution of these core genes to bacterial metabolism or cellular process. The few genes that have been functionally characterized encode redundant functions and are totally dispensable for the cell [7][8][9]12]. Our study provides experimental evidence that the enzymes MOHMT (EC 2.1.2.11) and PBAL (EC 6.3.2.1) encoded on plasmid p42f are indispensable for the synthesis of pantothenate. Moreover, our results showed that the cluster of panCB, katG and oxyR genes was insufficient to restore full growth capacity to the p42f cured derivative CFNX186, implying that in addition to pantothenate synthesis, there are more functions encoded on plasmid p42f required for growth in MM. Obvious candidates for these functions could not be identified a priori among the 567 proteins encoded in p42f even though their predicted functions were recently updated with KAAS (KEGG Automatic Annotation Server and Pathway Reconstruction Server). We discarded arginine limitation as the cause for the growth deficiency of strain CFNX186-24. The arginine prototrophy displayed by a mutation in the p42f encoded argE suggests that in R. etli the conversion of N-acetylornithine to ornithine is catalyzed by the chromosome-encoded ArgJ, an ornithine acetyltransferase (OATase, EC 2.3.1.35), which transfers the acetyl group of N-acetylornithine to glutamate to produce ornithine and N-acetylglutamate. Functional OATases have been found in the majority of bacteria [20]. Also, we have demonstrated that plasmid-localization of panCB in R. etli CFN42 is not unique to this strain. A screening of the location of panCB genes among members of the Rhizobiales, showed that the occurrence of these genes in plasmids is a highly conserved trait among R. etli and R. leguminosarum strains. Furthermore, the synteny of the panCB, oxyR, katG genes in R. etli CFN42 is conserved in R. etli CIAT652 and in R. leguminosarum strains 3841, WSM1325 and WSM2304. In contrast, genomes of Rhizobium sp., Sinorhizobium, Bradyrhizobium and Mesorhizobium species carried chromosomal panCB genes. Only in A. tumefaciens C58 the panCB genes are localized in the linear chromosome, whereas in all other Rhizobiales harboring secondary chromosomes the panCB genes were located in chromosome I. A bioinformatic analysis with MicrobesOnline operon predictions [22] indicates that panCB genes are organized as possible operons in most of the Rhizobiales examined in this work: all these predicted operons conserve the four nucleotide overlap between the panC TGA codon and the panB ATG codon observed in R. etli CFN42 (data not shown). In the genomes of Bradyrhizobium sp. BTAi1, Nitrobacter hamburgensis X14, Methylobacterium extorquens AM1, Methylobacterium radiotolerans JCM2831 and Xantobacter autotrophicus Ry2, panC and panB are encoded in separate chromosomal loci, whereas in Methylobacterium nodulans ORS2060 panC is located in the chromosome and panB in plasmid pMNOD02.
The Rhizobiales phylogeny inferred from concatenated panC and panB genes was consistent with the phylogeny deduced from 10 concatenated housekeeping genes. The low bootstrap values obtained for some nodes of the panCB phylogeny might be due to the small number of informative characters in the alignments of only two genes (1 977 nucleotides). This is consistent with previous reports that state that trees from longer alignments obtained by the concatenation of genes encoding multiple-protein families have higher bootstrap support than trees inferred from genes encoding single proteins [23]. The phylogenetic relationships among Rhizobium species carrying panCB genes in plasmids with their closest relatives, Agrobacterium and Sinorhizobium species, harboring panCB genes in the chromosome was also observed in neighbor-joining trees inferred from single panC and panB genes (data not shown). These data agree with the hypothesis that plasmid-encoded panCB genes are orthologs of the panCB genes located in chromosome. From these results, we propose that the presence of the panCB genes in plasmids in R. etli and R. leguminosarum species may be due to an intragenomic transfer event from chromosome to plasmid. The mechanism leading to the transfer of core genes from chromosome to plasmids could involve cointegration and excision events between the replicons, similar to rearrangements that have been visualized in S. meliloti [24]. The translocation of genes from chromosome to plasmids may be part of the complex evolution of multipartite genomes. A study based on the analysis of clusters of syntenic genes shared among plasmids and secondary chromosomes of bacteria with multipartite genomes suggested that secondary chromosomes may have originated from an ancestral plasmid to which genes had been transferred from a primary chromosome [17].
Our pioneering work on plasmid-encoded functions in R. etli CFN42 established that a functional relationship among different replicons is required for symbiotic and free-living functions [18,25]. More recently, a functional connectivity among most of the proteins encoded in the replicons of R. etli CFN42 was predicted in silico [6]. Our results demonstrated that the putative MOHMT encoded by RHE_PE00443 is not functional under the conditions studied and provides evidence of functional cooperation between p42f and chromosomally encoded proteins for pantothenate biosynthesis.

Conclusions
Our study shows that the presence of the core panCB genes in a plasmid is a characteristic conserved in R. etli and R. leguminosarum strains but not in other Rhizobiales. The phylogenetic approach used in this study suggests that the unusual presence of panCB in plasmids may be due to an intragenomic transfer event from chromosome to plasmid rather than a xenologous gene displacement. Using R. etli CFN42 as a model, we showed that the plasmid-encoded core panCB genes were indispensable for the synthesis of pantothenate. The panCB genes could not totally restore growth of a strain cured of plasmid p42f in minimal medium, suggesting that other functions essential for growth in this medium are encoded in this plasmid. Our results support the hypothesis of functional cooperation among different replicons for basic cellular functions in multipartite rhizobial genomes.

Bacterial strains, media and growth conditions
The bacterial strains and plasmids used are listed in Table 1. Rhizobium strains were grown at 30°C in three different media: a) PY rich medium [26], b) Minimal medium (MM) [27] and c) Minimal medium plus 1 μM calcium pantothenate (MMP). MM was prepared as follows: a solution containing 10 mM succinate as carbon source, 10 mM NH 4 Cl as nitrogen source, 1.26 mM K 2 HPO 4 , 0.83 mM MgSO 4 , was adjusted to pH 6.8 and sterilized. After sterilization the following components were added to the final concentration indicated: 0.0184 mM FeCl 3 6H 2 O (filter sterilized), 1.49 mM CaCl 2 2H 2 O (autoclaved separately), 10 μg ml -1 biotin and 10 μg ml -1 thiamine (both filter sterilized). MMP contains the same components plus 1 μM calcium pantothenate. To determine growth rates on MM or MMP, Rhizobium strains were grown to saturation in PY medium, the cells were harvested by centrifugation, washed twice with sterile deionized water and diluted to an initial optical density of 0.05 at 600 nm (OD 600 ) when added to 30 ml of MM. These cultures were grown for 24 h in 125 ml Erlenmeyer flasks to deplete any endogenous pantothenate. Cells were then harvested and washed as described above and added to fresh MM or MMP in the same manner as for the first inoculation and cultured for 15 hours. Bacterial growth was quantified by measuring optical density at 600 nm (OD 600 ) every 3 hours. Antibiotics were used at the following concentrations (in μg ml -1 ): chloramphenicol (Cm), 30; tetracycline (Tc), 10; kanamycin (Km), 30; gentamicin (Gm), 30; spectinomycin (Sp), 100; nalidixic acid (Nal), 20. E. coli transformants harboring recombinant plasmids (β-galactosidase-positive) were identified by growth on LB plates with 30 μg ml -1 5-bromo-4chloro-3-indolyl-β-D-galactoside (X-Gal).

DNA manipulations
Standard techniques described by Sambrook et al. [28] were used for plasmid and total DNA isolation, restriction, cloning, transformations, and agarose gel electrophoresis. Plasmid mobilization from E. coli to Rhizobium was done by conjugation performed on PY plates at 30°C by using overnight cultures grown to stationary phase. Donors (E. coli strain S17-1) and recipients (R. etli CFN42 wild type and mutant strains) were mixed at a 1:2 ratio, and suitable markers were used for transconjugant selection.

Mutagenesis of the panC and panB genes and genetic complementation of mutant strains
Mutants were generated by site-directed vector integration mutagenesis. Internal 400 bp DNA fragments of panC and panB were amplified by PCR with primers A and B; C and D, respectively (Table 3). PCR fragments of panC and panB were cloned in vector pBC as 400 bp BamHI-XbaI fragments, generating pBC1 and pBC2 respectively, and then subcloned as KpnI-XbaI fragments into suicide vector pK18mob [29] to form plasmids pTV1 and pTV2, respectively. These plasmids were mobilized into R. etli CFN42 by conjugation and single crossover recombinants selected on PY plates containing Km and Nal. The disruption of the panC and panB genes was confirmed by Southern blot analysis using a 400-bp PCR internal fragment of each gene as a probe (data not shown). The resultant mutants were named ReTV1 and ReTV2 respectively. To complement the phenotype of the panC and panB mutants, plasmids pTV4, pTV5, pTV6 and pTV7 were constructed as follows: a 3.1 kb EcoRI fragment from cosmid vector pCos24, isolated from a genomic library of R. etli CFN42 [30] and containing the panC and panB genes, was subcloned in broadhost-range vector pRK7813, generating plasmid pTV4. To construct plasmid pTV5, a 1.2 kb fragment containing only panC (894 bp) was amplified by PCR with primers E and F and cloned in the KpnI-XbaI sites in the broadhost-range vector pBBRMSC3 so that the gene would be constitutively expressed from the vector's lacZ promoter. Primers G and H (Table 3) were used to amplify a 1 kb PCR fragment containing only the panB gene (822 bp). This DNA fragment was cloned in plasmid pBBRMSC3 in the KpnI-XbaI restriction sites, generating plasmid pTV6. Plasmid pTV7 contains the second panB gene (RHE_PE00443), encoded on R. etli plasmid p42e, this gene was amplified with primers I and J. The resultant 1 kb PCR fragment was cloned in the KpnI-XbaI sites of plasmid pBBRMSC3. To complement the growth deficiency of strain CFNX186, a derivative of R. etli CFN42 cured of plasmid p42f, plasmid pTV4 and cosmid vector pCos24 were introduced by conjugation. The complemented strains obtained were named CFNX186-4 and CFNX186-24 respectively. The argE gene was disrupted as described above. Briefly, an internal 400 bp PCR fragment of argE amplified with primers K and L was cloned directly in pK18mob using the KpnI and XbaI sites to give pTV3 (Table 1). This recombinant suicide plasmid was mobilized into R. etli CFN42 and the resultant mutant named ReTV3 (Table 1).

Filter blots hybridization and plasmid visualization
For Southern-type hybridizations, genomic DNA was digested with appropriate restriction enzymes, electrophoresed in 1% (w/v) agarose gels, blotted onto nylon membranes, and hybridized under stringent conditions, as previously reported by [31], using Rapid-hyb buffer. To use the panC and panB genes as probes, both genes were amplified by PCR, separated on a 1% agarose and purified by a PCR purification kit (QIAquick). They were labeled with [α-32P]dCTP using a Rediprime DNA labeling system. Plasmid profiles were visualized by the Eckhardt technique as modified by [21], and hybridized in a similar manner.
Identification of orthologous proteins, multiple sequence alignments and phylogenetic analysis All genomic sequences analyzed in this study were obtained from the Integrated Microbial Genomes System of the DOE Joint Genome Institute http://img.jgi.doe. gov/). We obtained protein and gene sequences of panB, panC and 10 chromosomal housekeeping genes (fusA, guaA, ileS, infB, recA, rplB, rpoB, rpoC, secY and valS) from 16 rhizobial species. Accession numbers for these sequences and the species list are shown in Table S1 (see Additional file 1). An orthologous data set for each gene was constructed using Blast [32] and the bidirectional best hit method applying the criteria reported by Poggio et al [33]. Multiple alignments of putative orthologous proteins were performed using the MUSCLE program [34] with default settings. After removing poorly conserved regions two concatenated protein alignments were obtained, one for the 10 chromosomal housekeeping genes (8469 amino acids) and the other for panB and panC (659 amino acids). Both concatenated protein multiple alignments were used to generate nucleic acids multiple alignments of their respective genes with the Tranalign program of the EMBOSS suit http://emboss. sourceforge.net/. Nucleic acids multiple alignments were used to obtain two phylogenies with the maximum likelihood method implemented in PHYML [35] with HKY as substitution model [36]. The phylogenetic reconstruction was carried out with a nonparametric bootstrap analysis of 100 replicates for each alignment. TreeDyn program [37] was used to visualize and edit both phylogenies.

Additional material
Additional file 1: Table S1. Rhizobial species list and accession numbers of housekeeping and panCB genes used for phylogenetic analysis.