Intragenomic diversity of Rhizobium leguminosarum bv. trifolii clover nodule isolates

Background: Soil bacteria from the genus Rhizobium are characterized by a complex genomic architecture comprising chromosome and large plasmids. Genes responsible for symbiotic interactions with legumes are usually located on one of the plasmids, named the symbiotic plasmid (pSym). The plasmids have a great impact not only on the metabolic potential of rhizobia but also underlie genome rearrangements and plasticity.

Results: Here, we analyzed the distribution and sequence variability of markers located on chromosomes and extrachromosomal replicons of Rhizobium leguminosarum bv. trifolii strains originating from nodules of clover grown in the same site in cultivated soil. First, on the basis of sequence similarity of repA and repC replication genes to the respective counterparts of chromids reported in R. leguminosarum bv. viciae 3841 and R. etli CFN42, chromid-like replicons were distinguished from the pool of plasmids of the nodule isolates studied. Next, variability of the gene content was analyzed in the different genome compartments, i.e., the chromosome, chromid-like and 'other plasmids'. The stable and unstable chromosomal and plasmid genes were detected on the basis of hybridization data. Displacement of a few unstable genes between the chromosome, chromid-like and 'other plasmids', as well as loss of some markers was observed in the sampled strains. Analyses of chosen gene sequences allowed estimation of the degree of their adaptation to the three genome compartments as well as to the host.

Conclusions:
Our results showed that differences in distribution and sequence divergence of plasmid and chromosomal genes can be detected even within a small group of clover nodule isolates recovered from clovers grown at the same site. Substantial divergence of genome organization could be detected especially taking into account the content of extrachromosomal DNA. Despite the high variability concerning the number and size of plasmids among the studied strains, conservation of the location as well as dynamic distribution of the individual genes (especially replication genes) of a particular genome compartment were demonstrated. The sequence divergence of particular genes may be affected by their location in the given genome compartment. The 'other plasmid' genes are less adapted to the host genome than the chromosome and chromid-like genes.

Background
Rhizobia are widely occurring soil bacteria that are able to establish nitrogen-fixing symbioses with legumes. Bacterium-plant interaction is a complex process in which specific plant and bacterial signals are exchanged resulting in formation of nodules, where rhizobia in the form of bacteroids fix nitrogen [1][2][3].
Rhizobial genomes are large and multipartite, composed of a single circular chromosome and a set of large plasmids [4][5][6]. The genes responsible for nodulation (nod) and nitrogen fixation (nif-fix) are either carried by large plasmids (pSym) or are incorporated in the chromosome as symbiotic islands [7,8]. Large genomes of Rhizobiaceae and Bradyrhizobiaceae (above 6-9 Mb) are considered more ecologically advantageous in an environment that is scarce in nutrients but diverse as regards carbon and energy sources. These genomes are disproportionately enriched in regulation and transport genes and in genes involved in secondary metabolism in comparison with bacteria with medium- and small-size genomes [9].
"Core" and "accessory" components of Rhizobium genomes can be distinguished. Chromosomes with conserved gene content and order (synteny) are considered as core. Accordingly, plasmids constitute the accessory genome. Plasmids are more flexible than the chromosomes, as defined by more frequent gene gains and losses, even in the same species. They are heterogeneous in size and gene content and lack synteny even in closely related species, except for genes involved in plasmid replication and symbiotic properties [6,10,11]. In some species, such as Rhizobium leguminosarum, plasmids may comprise up to 35% of the total genome [6,7].
Rhizobial plasmids are maintained in the cells via repABC cassettes, comprising genes required for active segregation (repAB) and initiation of replication (repC) [12]. The presence of several repABC operons within a single genome, which are subjected to individual selection pressure and divergence, could be the key element of the existence of different plasmid incompatibility groups in cells and could drive the rearrangement of gene organization and of their functions [11,13,14,15]. It was proposed that repABC plasmids coexisting in the same strain most probably emerged by separate events of lateral transfer, which required evolution of different incompatibility groups allowing simultaneous residence of plasmids equipped with a similar replication/partition system in a single bacterial species [12]. Thus, the degree of divergence of the plasmid replication apparatus, whose sequence is subject to strong evolutionary pressure and determines the ability to evade incompatibility between plasmids [13], and horizontal gene transfers are potential forces that shaped rhizobial genomes.
Recently, some (not only rhizobial) extrachromosomal replicons that have properties distinct from both chromosome and plasmids were reported and named "chromids" [16]. Chromids are characterized by presence of some important genes essential for growth under all conditions, with nucleotide composition and codon usage similar to the chromosome of the parental strain, and, by contrast, plasmid replication and partition systems [16].
Furthermore, recent analyses of Rhizobium etli strains [11] showed that this species has a pangenomic structure. By definition, a pangenome "determines the core genome, which consists of genes shared by all the strains studied and probably encoding functions related to the basic biology and phenotypes of the species" [17]. The basis of the pangenome concept emerged from an observation that each newly sequenced genome enriched the pool of species-specific genes with new ones [17,18]. This makes it possible to detect, besides the core genomes, the dispensable genomes composed of both chromosomal and plasmid genes, present only in some of the strains, which contribute to the species diversity and allow adaptation to new ecological niches and a specific environment. Despite the overall genomic divergence, R. etli pangenome comprises a core genome composed of both chromosomal and plasmid sequences, as well as highly conserved symbiosis-related genes on the pSym plasmid. The unusual variability observed in rhizobial genomes may further result from several types of alterations, such as point mutations, deletions, amplification of DNA, and from intragenome re-assortment of sequences [19][20][21].
The aim of this study was to evaluate the divergence of genomes of a small population of R. leguminosarum bv. trifolii (Rlt) nodule isolates from clover plants grown in the same site in cultivated soil. Like the other members of the genus Rhizobium, the Rlt genomes were partitioned into the chromosome and several large plasmids, one of which carried symbiosis-related genes. The variability of the genome architecture involved not only the number and size of the plasmids, but also the location of specific genes on the particular replicons. Distribution of repABC operon markers and other genes in the three genome compartments: the chromosome, chromid-like and 'other plasmids' was assessed. We found "stable" genes that were permanently located in a specific genome compartment, as well as "unstable" ones, which were detected in different replicons of the sampled strains. Sequences of selected chromosome and plasmid genes were subjected to an assessment of adaptation to a particular genome compartment by analyses of codon usage and codon adaptation index. A potential evolutionary pathway of Rlt strains was proposed on the basis of gene sequences and their distribution.

Methods
R. leguminosarum bv. trifolii (Rlt) strains
129 R. leguminosarum isolates were obtained from nodules of red clover (Trifolium pratense L. cv. Dajana) growing in sandy loam (N:P:K 0.157:0.014:0.013%). Plants were grown on a 1 m² plot for six weeks between May and June 2008. Afterwards, ten randomly chosen clover plants growing in each other's vicinity were harvested, the nodules were collected, surface-sterilized, crushed and their content plated on 79CA medium [22]. Strains isolated from the nodules were purified by successive streaking of single colonies and pure cultures were used in further experiments.

DNA methods
Standard techniques were used for labeling of DNA, Southern hybridization and agarose gel electrophoresis [23]. DNA probes for Southern hybridizations were obtained by PCR amplification with RtTA1 genomic DNA as template and appropriate primers (Table 1). The probes were labeled with non-radioactive DIG DNA Labeling and Detection Kit (Roche). Southern blotting, gel pretreatment and capillary transfers were done using standard procedures [23]. Hybridizations were performed at high stringency at 42°C using 50% formamide in pre-hybridization and hybridization solutions. Analyses of the plasmid content of the 129 isolates were performed as described by Eckhardt [24].
Table 1 Primers and probes used in this study

Preparation of high molecular weight DNA and PFGE conditions
Plugs were prepared from 5 ml of 48 h cultures of Rlt strains; after centrifugation, the cells were resuspended in TE buffer and mixed with 2% LMP agarose (Sigma). Agarose-embedded cells were incubated with TE and lysozyme (1.5 mg/ml) for 16 h at 37°C, and then in cell lysis buffer (1% sodium lauryl sarcosine, 50 mM EDTA, 50 mM Tris-HCl pH 8.0) supplemented with proteinase K (0.5 mg/ml) at 37°C for an additional 48 h. The proteinase K was inactivated by PMSF (0.4 mg/ml) at 37°C for 1 h. Plugs were washed three times (30 min each) with TE buffer and finally stored in TE at 4°C. PFGE was performed in the contour-clamped homogeneous electric field mode with the Bio-Rad system (model CHEF-DRIII). DNA samples were separated in 1% Megabase agarose gels (Bio-Rad) in 1 × TAE buffer, refrigerated at 12-14°C, with switch time 100-300 seconds, angle 106°, voltage gradient 3 V/cm for 48 h. Estimation of plasmid size was performed with BIO-PROFIL BioGene (Vilber-Lourmat, France), using R. leguminosarum bv. viciae strain 3841 [6], R. leguminosarum bv. trifolii TA1 [25,26] and Sinorhizobium meliloti 1021 [4] as size standards.

Computer assisted analyses
Sequence data were analyzed with Lasergene analysis software (DNASTAR, Inc). Database searches were done with the BLAST and FASTA programs at the National Center for Biotechnology Information (Bethesda, Md) and European Bioinformatics Institute (Hinxton, UK). For multiple alignments of DNA sequences, the ClustalW algorithm was used [27]. Codon usage of sequenced genes was calculated using ACUA [28]. Codon adaptation index (CAI) was calculated with the cai program [29]. Discriminant analyses of codon usage were applied to the studied sequences with two grouping methods: (a) based on the localization of genes in a defined part of the rhizobial genome (three groups: chromosome, chromid-like, and other plasmids), or (b) based on the origin of the genes (13 groups, one for each strain). The results of this multivariate analysis provide information about the separation of the studied groups on the basis of discriminant functions, i.e. linear combinations of the studied variables that maximize distances between groups and are orthogonal to each other [30]. For each grouping method, the set of variables comprised the relative frequencies of alternative codons (for the same amino acids), leading to the investigation of 59 variables (omitting stop codons and the codons for methionine and tryptophan, which have no alternatives).
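The 59 codon usage variables described above can be illustrated with a minimal sketch. It is not the ACUA implementation used in this study; the function name `relative_codon_frequencies` and the short test sequence are hypothetical, and the standard genetic code is assumed.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption of this sketch).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Codons that have at least one synonymous alternative:
# stop codons, Met (ATG) and Trp (TGG) are omitted, leaving 59 variables.
VARIABLE_CODONS = [c for c, aa in CODON_TABLE.items() if aa not in "*MW"]


def relative_codon_frequencies(cds):
    """Return, for each of the 59 codons, its share among the codons of the same amino acid."""
    cds = cds.upper().replace("U", "T")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)
    # Total number of codons observed for each amino acid.
    aa_totals = Counter()
    for codon in VARIABLE_CODONS:
        aa_totals[CODON_TABLE[codon]] += counts[codon]
    freqs = {}
    for codon in VARIABLE_CODONS:
        total = aa_totals[CODON_TABLE[codon]]
        freqs[codon] = counts[codon] / total if total else 0.0
    return freqs


# Hypothetical toy sequence: three Ala codons (GCT, GCT, GCC) followed by Met (ATG).
example = relative_codon_frequencies("GCTGCTGCCATG")
```

Each sequence is thus represented by a vector of 59 values, which is the kind of input a discriminant analysis would take; amino acids absent from a sequence are given zero frequencies here, which is a simplification of this sketch.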
Complete discriminant analysis was performed, but from among the many results obtained we focused on the Chi-squared test, which provides the number of statistically significant discriminant functions; squared Mahalanobis distances between the group centroids (taking into account the correlation between variables); scatterplots of discriminant scores, i.e. cases located in the property space formed by the first two discriminant functions [31]; and the classification table, which contains information about the number and percentage of correctly classified cases in each group.
The application of discriminant analysis was preceded by a tolerance test, which enabled us to remove redundant variables from the model [32]. The tolerance tests were performed using the Classify/Discriminant unit of SPSS software (SPSS for Windows version 10.0, 1999, SPSS Inc., Chicago, IL, USA), while other results were obtained using the Discriminant Function Analysis units of the STATISTICA software system (Statistica version 6, 2001, StatSoft Inc., Tulsa, OK, USA).

Nucleotide sequence accession numbers
The following GenBank accession numbers were given to the nucleotide sequences determined in this study. For

Results
Strain selection based on variable genomic organization
A group of 23 isolates was selected from among a collection of 129 R. leguminosarum bv. trifolii (Rlt) isolates recovered from nodules of ten clover plants grown in the vicinity of each other in cultivated soil. The main criterion of strain selection, besides the ability to nodulate clover (Trifolium pratense) effectively, was their different plasmid pattern obtained by Eckhardt's lysis procedure (Figure 1A). The strains harbored from 3 to 6 plasmids whose size, as assessed by PFGE analysis of high molecular weight (HMW) genomic DNA, ranged approximately from 150 kb to 1380 kb (Table 2, Figure 1B). The plasmids will be referred to as pRlea to pRlef throughout this report. The isolates that differed in the plasmid pattern were assumed to be distinct strains. In all the strains studied, a single symbiotic plasmid (pSym), with an average size of 361 kb (ranging from 260 kb to 500 kb), was identified by Southern hybridization with nodA and nifNE probes derived from the R. leguminosarum bv. trifolii TA1 (RtTA1) laboratory strain [26]. A set of 24 strains (including RtTA1) with a highly variable number and size of plasmids was chosen for further hybridization assays. Noteworthy is the presence of very large plasmids, above 1.0 Mb in size, identified in a majority of the sampled strains (Figure 1). The total size of all the plasmids in each of the 23 isolates averaged 2.815 Mb (ranging from 1.89 to 3.25 Mb). Relative to the average genome size (~7.145 Mb) of the recently sequenced R. leguminosarum bv. trifolii WSM2304 (Rlt2304) and WSM1325 (Rlt1325) [33,34], in which extrachromosomal replicons constitute 34% and 36%, respectively, the extrachromosomal DNA content in our strains was calculated to range from 26% to 45% (on average ~39%).
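The percentages quoted above follow from dividing the summed plasmid size by the reference genome size. A minimal sketch of this arithmetic is given below; it assumes, as the text does, that every isolate has a total genome size close to the ~7.145 Mb average of Rlt2304 and Rlt1325, and the function name is hypothetical.

```python
# Average genome size of Rlt WSM2304 and WSM1325 quoted in the text (Mb).
REFERENCE_GENOME_MB = 7.145


def extrachromosomal_percent(plasmid_total_mb, genome_mb=REFERENCE_GENOME_MB):
    """Share of the genome carried on plasmids, in percent.

    Assumes the isolate's total genome size equals the reference value.
    """
    return 100.0 * plasmid_total_mb / genome_mb


# Summed plasmid sizes reported for the 23 isolates (Mb).
for label, total in [("minimum", 1.89), ("average", 2.815), ("maximum", 3.25)]:
    print(f"{label}: {extrachromosomal_percent(total):.1f}%")
```

The three values round to the 26%, 39% and 45% reported in the text.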

Similarity of replication-partition genes in the plasmid pool of selected strains
One of the methods to assess the phylogenetic relatedness among plasmids is to compare their replication systems. Thus, at the beginning of our study, similarity and/or diversity of replication regions between the plasmids of the nodule isolates were examined. Recently, the replication systems of four plasmids (pRleTA1a-pRleTA1d), each equipped with repABC genes, were analyzed in RtTA1 [35]. An experimental approach comprising a series of Southern hybridizations with repA and repC genes derived from plasmids pRleTA1a-pRleTA1d of RtTA1 as molecular probes was used (Table 1). The repA and repC genes were PCR amplified from the RtTA1 genome and probed against PFGE-separated HMW DNA of the sampled strains. The choice of two different genes from each of the replication systems identified in RtTA1 as molecular probes was justified by the lack of a single universal phylogenetic history within the repABC operon and by the evolution of RepA and RepB being partially independent from that of RepC [13].
Distribution of the given rep marker was assessed with regard to its location in one of the extrachromosomal replicons of the tested strains. repA and repC genes of the largest pRleTA1d were jointly detected on the largest plasmids in all the sampled Rlt strains (Figure 2). Similarly, repA and repC of the pRleTA1b jointly hybridized to one of the plasmids of different size in all the Rlt strains. In contrast, repA and repC of the pRleTA1c were rarely localized together (4 of 23 strains). The repA of the pRleTA1c was not similar to any of the plasmids in most of the sampled strains, but repC hybridized frequently (19 of 23 strains) to pSym plasmids. repA and repC of pRleTA1a (pSym) commonly showed sequence similarity to non-symbiotic plasmids of the sampled strains and only exceptionally hybridized to symbiotic ones ( Figure 2).
RepABC of pRleTA1d and pRleTA1b display similarity with replication systems of the extrachromosomal replicons, which were recently described as chromids [16,35]. Within the group of closely related strains RtTA1, R. leguminosarum bv. viciae 3841 (Rlv), R. etli CFN42 (Rhe), RltWSM2304 and RltWSM1325, clusters of replicons carrying the most similar replication systems can be distinguished. They comprise pRleTA1d-pRL12-p42f-pRLG201-pR132501 and pRleTA1b-pRL11-p42e-pRLG202-pR132502, respectively. Therefore, detection of positive hybridization signals with probes derived from rep genes of RtTA1 chromid-like replicons (i.e. pRleTA1b or pRleTA1d) to any of the replicons of the sampled strains allowed those replicons to be regarded as chromid-like. Based on the similarity of replication-partition genes detected in our assays, we divided the replicons of the studied strains into three genome compartments: chromosome, chromid-like and 'other plasmids' (i.e. those replicons which gave a hybridization signal with molecular probes originating from repA and repC genes of pRleTA1a or pRleTA1c, as well as those that gave no signal with any rep probes of RtTA1 replication genes). The compartment designated 'other plasmids' also comprised pSym. Such replicon division was taken into consideration in the subsequent analyses of distribution of other markers in the studied strains.
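The assignment rule described above can be written as a short sketch. The probe labels (`repA_b`, `repC_d`, etc.), the replicon letters and the example strain are hypothetical and serve only to illustrate the rule; they do not reproduce the hybridization data of any isolate.

```python
# Hypothetical labels for rep probes derived from the RtTA1 chromid-like
# replicons pRleTA1b and pRleTA1d.
CHROMID_LIKE_PROBES = {"repA_b", "repC_b", "repA_d", "repC_d"}


def assign_compartment(replicon, rep_signals):
    """Assign a replicon to one of the three genome compartments.

    replicon    -- "ch" for the chromosome, otherwise a plasmid letter
    rep_signals -- rep probes that gave a positive hybridization signal
    """
    if replicon == "ch":
        return "chromosome"
    # Any signal with a pRleTA1b- or pRleTA1d-derived rep probe marks the
    # replicon as chromid-like.
    if CHROMID_LIKE_PROBES & set(rep_signals):
        return "chromid-like"
    # Signals only with pRleTA1a/pRleTA1c probes, or no rep signal at all.
    return "other plasmids"


# Hypothetical strain with a chromosome and four plasmids.
example_strain = {
    "ch": [],
    "a": ["repC_c"],
    "b": ["repA_b", "repC_b"],
    "c": [],
    "d": ["repA_d", "repC_d"],
}
compartments = {
    replicon: assign_compartment(replicon, signals)
    for replicon, signals in example_strain.items()
}
```

Under this rule a replicon with no rep signal, and pSym itself, both fall into the 'other plasmids' compartment, in line with the division used in the text.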

Variability of chromosomal and plasmid marker location
In further studies, the extent of gene content diversity in the sampled nodule isolates was examined. We aimed to estimate whether, besides repA and repC displacement events, we could demonstrate changes in the location of the chromosomal and plasmid genes. The same experimental approach was used, i.e. a series of Southern hybridizations with different genes with a well-defined chromosomal or plasmid location in RtTA1 (Table 1) [36].
A majority of the studied genes (rpoH2, dnaK, dnaC, rrn, lpxQ, bioA, stbB, exoR and pssL) were located on the chromosome in all the sampled strains, showing considerable conservation of chromosomal markers (Figure 3). Exceptionally, the Pss-V region was identified on the chromosome of K3.6, K5.4 and RtTA1, but it was missing in the other strains (Figure 3). Moreover, fixGH symbiosis-related genes, which were chromosomal in the RtTA1, K3.6, K4.15 and K5.4 strains, were located mainly in the genome compartment designated as 'other plasmids' (pSym to be exact) in the remaining strains. The variable location of fixGH genes, which were found on the chromosome, pSyms and chromid-like replicons (K12.5), could be accounted for by the location of these genes on the putative genomic island flanked by 18 bp repeats in R. leguminosarum and R. etli [10,37].
Southern hybridizations were carried out with probes comprising markers previously identified on different RtTA1 replicons [36], such as prc and hlyD of pRleTA1d; the lpsB2, orf16-orf17-otsB, tauA and orf14 gene cluster of pRleTA1c; and nadA and pssM (surface polysaccharide synthesis region Pss-III) of pRleTA1b. These analyses demonstrated that pRleTA1d markers were almost always jointly detected in the largest chromid-like replicons (only in K3.22 and K5.4 were they separated between distinct chromid-like replicons). pRleTA1c markers in almost all (21 out of 23) of the sampled strains were located in the genome compartment designated as 'other plasmids' (Figure 3). From among the markers of pRleTA1b, nadA, minD, hutI and pcaG always had a chromid-like location, while the pssM gene was located in the chromosome of 19 strains, in chromid-like replicons of four strains including RtTA1, and was absent from the genome of the K3.22 strain (Figure 3).
Besides the symbiotic genes nodA and nifNE used for identification of pSym plasmids, stability of thiC and acdS (Table 1) of the pRleTA1a symbiotic plasmid (ipso facto described as markers of the 'other plasmids' pool) was examined (Figure 3). Only thiC was identified in all the strains, although in different genomic compartments: most frequently on the chromosome (18 of 23 strains), and otherwise in the 'other plasmids' (5 strains). The acdS gene was detected in 14 of 23 strains, in each case on pSym (Figure 3). The thiC gene, similarly to fixGHI, showed high variability in location; however, it is not known whether it is located on a mobile element [38]. thiC was reported as plasmid-located in the sequenced genomes of Rlv [6], Rlt2304 [33] and Rhe [5].
As a result, genes with a stable location in specific genome compartments in all the strains, as well as unstable genes with variable, strain-dependent distribution were distinguished (Figure 4). Stable markers for each compartment of the sampled strains were established i.e. chromosomal: rpoH2, exoR, dnaK, dnaC, bioA, rrn, lpxQ, pssL and stbB; chromid-like: prc, hlyD, nadA, minD, hutI and pcaG; 'other plasmids': otsB, lpsB2 (exceptionally chromid-like in K3.6), tauA and orf14 (exceptionally chromid-like in K3.12) including nodA and nifNE symbiosis-related genes of pSym ( Figure 4). Loss of some of the examined markers was noticed, i.e. Pss-V from the chromosome, pssM from chromid-like replicons, and acdS from the 'other plasmids' (pSym). Only two of the sampled strains, i.e. K3.6 and K5.4, contained all the studied markers, while others lacked at least one of the genes.
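The distinction between stable and unstable markers can be sketched as a simple rule applied to a table of hybridization results. The strain names (`s1`-`s3`) and the table below are hypothetical, and the rule is stricter than the one applied in the text, which tolerates single exceptions such as lpsB2 in K3.6.

```python
def classify_marker(locations):
    """Classify a marker from its location in each strain.

    locations -- dict mapping strain name to a compartment name,
                 or to None when the marker was not detected.
    A marker is called stable only if it was detected in every strain
    and always in the same compartment (a strict, simplified rule).
    """
    detected = [loc for loc in locations.values() if loc is not None]
    if len(detected) == len(locations) and len(set(detected)) == 1:
        return "stable"
    return "unstable"


# Hypothetical hybridization table for three strains.
toy_table = {
    "dnaK": {"s1": "chromosome", "s2": "chromosome", "s3": "chromosome"},
    "thiC": {"s1": "chromosome", "s2": "other plasmids", "s3": "chromosome"},
    "acdS": {"s1": "other plasmids", "s2": None, "s3": "other plasmids"},
}
status = {marker: classify_marker(locs) for marker, locs in toy_table.items()}
```

In this toy table the relocated marker and the marker lost from one strain are both reported as unstable, mirroring the two kinds of instability (displacement and loss) described above.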
A dendrogram demonstrating similarity of the strains was constructed with the UPGMA clustering method based on markers distribution among their different genome compartments. It showed one K3.6 strain apparently split from the others (Figure 5), and two groups of clustered strains: a small one, including RtTA1, K5.4 and K4.15, and a large one comprising the remaining strains, which was further subdivided into two smaller subgroups of strains with identical marker distribution ( Figure 5).
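A minimal sketch of UPGMA clustering on marker distribution profiles is given below. The distance measure (fraction of markers whose compartment differs between two strains) is an assumption, since the text does not state which measure was used, and the four strain profiles are hypothetical.

```python
from itertools import combinations


def mismatch_distance(a, b):
    """Fraction of markers whose compartment (or absence) differs between two strains."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(profiles):
    """Cluster strains by UPGMA and return the tree as nested tuples.

    profiles -- dict mapping strain name to a tuple of marker states.
    """
    names = list(profiles)
    trees = {i: name for i, name in enumerate(names)}
    sizes = {i: 1 for i in trees}
    # Pairwise distances, keyed by (smaller id, larger id).
    dist = {
        (i, j): mismatch_distance(profiles[names[i]], profiles[names[j]])
        for i, j in combinations(range(len(names)), 2)
    }
    next_id = len(names)
    while len(trees) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters
        new_dist = {}
        for k in trees:
            if k in (i, j):
                continue
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            # UPGMA update: average weighted by cluster sizes.
            new_dist[(k, next_id)] = (sizes[i] * d_ik + sizes[j] * d_jk) / (sizes[i] + sizes[j])
        dist = {pair: d for pair, d in dist.items() if i not in pair and j not in pair}
        dist.update(new_dist)
        trees[next_id] = (trees.pop(i), trees.pop(j))
        sizes[next_id] = sizes.pop(i) + sizes.pop(j)
        next_id += 1
    return next(iter(trees.values()))


# Hypothetical profiles: compartment of three markers in four strains
# ("ch" chromosome, "cl" chromid-like, "pl" other plasmids, None = not detected).
profiles = {
    "A": ("ch", "ch", "pl"),
    "B": ("ch", "ch", "pl"),
    "C": ("ch", "pl", "pl"),
    "D": ("cl", "pl", None),
}
tree = upgma(profiles)
```

Strains with identical marker distribution (A and B here) are joined first at distance zero, which corresponds to the subgroups of strains with identical marker distribution seen in the dendrogram.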

Sequence divergence of chromosomal and plasmid genes
To assess the overall phylogenetic similarity of the sampled strains, several genes from a subset of 12 different strains displaying divergent plasmid profiles (plus RtTA1) were partially sequenced and analyzed. The sequenced genes comprised exclusively chromosomal markers (dnaC, dnaK, exoR, rpoH2), markers of chromid-like replicons (hlyD, prc, nadA), and 'other plasmid' markers (nodA, nifNE), as well as those with unstable location found in different genome compartments (fixGH, thiC, lpsB2). Afterwards, phylogenetic trees were constructed based on concatenated sequences of a distinct genome compartment, allowing description of the genetic similarity of the strains using the multilocus sequence analysis (MLSA) approach (Figure 6). In general, a low number of nucleotide substitutions was found in the examined genes in most strains. Similar groups of clustered strains were obtained in dendrograms constructed both on the basis of concatenated chromosomal sequences (Figure 6A) and on the basis of concatenated chromid-like replicon genes (Figure 6B). In both cases, a smaller group containing the RtTA1, K4.15 and K3.6 strains and a larger group consisting of the remaining strains were observed. Interestingly, K3.22 chromosomal genes split off from all remaining strains, suggesting their considerable divergence (Figure 6B). Sequence similarity within the RtTA1, K4.15 and K3.6 group is also visible on a dendrogram based exclusively on plasmid gene sequences derived from pSym (Figure 6C). When all the concatenated sequences (comprising genes with stable and unstable location in the genome) were used in dendrogram construction, the grouping of the strains was very similar to that obtained on the basis of stable chromosomal markers (Figure 6A, D). In conclusion, quite a similar phylogenetic history of the studied strains was demonstrated based on both stable and unstable chromosomal, chromid-like as well as 'other plasmid' genes (despite the small number of the markers analyzed).
To further evaluate the degree of sequence differentiation between the alleles with respect to their distribution in the genome and eo ipso the rate of adaptation to the genome compartment, we performed discriminant analyses focused on alternative codon usage. Discriminant analysis was applied to 59 variables (all potential triplets except for stop codons and the non-alternative codons for Met and Trp). Genes belonging to the chromosome, chromid-like and 'other plasmids' differed substantially with respect to this parameter (Figure 7A). Apart from the well-separated sequences belonging to the three distinct genome compartments, one can observe a subgroup localized between the chromosomal and 'other plasmids' gene pools (Figure 7A). This subgroup comprised the genes thiC and fixGH, which frequently changed their location and whose codon usage was not adapted to any genome compartment. Comparison of the results of gene grouping based on hybridization data and on discriminant analysis showed very high accordance (96%).
Positive hybridization was colored regarding its location in one of the following genome compartments: chromosome (red), plasmids (blue) and pSym (green); (-) indicates that given marker was not detected within a genome under applied Southern hybridization conditions. The letters a-f below the strains name indicate respective plasmids, ch-chromosome.
Figure 3 Distribution of replicon specific genes in the tested Rlt nodule isolates. Southern hybridization assays were carried out with several chromosome and plasmid markers of RtTA1 as molecular probes. The position of a given marker in the RtTA1 genome is shown in the left column. Positive hybridization was colored regarding its location in one of the following genome compartments of Rlt isolates: chromosome (red), chromid-like (violet), plasmids (blue) and pSym (green); (-) indicates that given marker was not detected within a genome under applied Southern hybridization conditions. The letters a-f below the strains name indicate respective plasmids, ch-chromosome.
The discriminant analysis of codon usage performed on individual strains harboring the set of the tested genes (13 groups of sequences) revealed only minor differences between the resultant groups and almost no accordance (31%) with the grouping performed on the basis of hybridization. However, some level of similarity between the strains can be demonstrated. As a consequence, one more discriminant analysis of codon usage was done, and the strains were divided into three groups: (i) K3.22, (ii) RtTA1, K3.6, K4.15 and (iii) all the remaining strains (Figure 7B). This resulted in 92% accordance between codon usage-based and strain-dependent grouping of sequences (Figure 7B and Figure 6D). It was concluded that codon usage was not significantly influenced by the individual strains but may be characteristic for the group of strains.
Finally, the Codon Adaptation Index (CAI) of the sequences studied was calculated. The CAI can be used to "evaluate the extent to which selection has been effective in molding the pattern of codon usage" [29] as well as to compare the codon usage of foreign genes versus that of highly expressed native genes [13]. Here, we applied CAI analyses to assess the degree of adaptation of sequenced genes to the host by comparing the obtained CAI values with those of genes encoding ribosomal proteins in R. leguminosarum. The calculated CAI values for each sequence were arbitrarily grouped and subsequently submitted to ANOVA evaluation, which measures the significance of differences between groups. CAI values can range from 0 (reflecting use of synonymous codons) to 1 (reflecting the strongest bias where codon usage is equal to that in the ribosomal protein-encoding genes) [13].
The CAI values ranged from 0.849 (dnaC, a chromosomal gene) to 0.554 (nodA, a symbiotic gene). The fixG and thiC genes had CAI values of 0.676 and 0.673, respectively, suggesting weaker adaptation to their genome compartments and further confirming their unstable location as indicated in hybridization analyses. We did not find significant differences with respect to the CAI values calculated for the particular strains, but strains RtTA1, K4.15, K3.6, and K3.22, previously observed as most divergent, had a high average CAI of the studied sequences (from 0.722 to 0.718), possibly indicating good adaptation of the genes to the host. Finally, the CAI values were evaluated according to the location of genes in the different genome compartments (Table 3). The CAI values of genes located on the chromosome and chromid-like replicons were high and significantly differed from each other. The genes located on the 'other plasmids' (including pSym) had the lowest CAI values, significantly different from the former ones. These results demonstrated weaker adaptation of plasmid genes to the host genome in comparison to the chromosome and chromid-like genes.
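How a CAI value is obtained from a reference set of highly expressed genes can be illustrated with a minimal sketch. It is not the cai program used in this study: the function names are hypothetical, the standard genetic code is assumed, and codons absent from the reference set are given a count of 0.5, which is one common convention and an assumption here. The toy reference sequence does not represent R. leguminosarum ribosomal protein genes.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (an assumption of this sketch).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(cds):
    """Split a coding sequence into complete codons."""
    cds = cds.upper().replace("U", "T")
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def relative_adaptiveness(reference_cds_list):
    """Weight of each codon: its count divided by the count of the most used
    synonymous codon in the reference genes (stop, Met and Trp excluded)."""
    counts = Counter()
    for cds in reference_cds_list:
        counts.update(c for c in split_codons(cds) if c in CODON_TABLE)
    weights = {}
    for aa in set(CODON_TABLE.values()) - set("*MW"):
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        # Assumption: unseen codons get a count of 0.5 to avoid zero weights.
        adjusted = {c: counts[c] if counts[c] else 0.5 for c in family}
        best = max(adjusted.values())
        for c in family:
            weights[c] = adjusted[c] / best
    return weights


def cai(cds, weights):
    """Geometric mean of the codon weights over the codons of a gene."""
    logs = [math.log(weights[c]) for c in split_codons(cds) if c in weights]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


# Hypothetical reference set in which GCT is used three times as often as GCC.
weights = relative_adaptiveness(["GCTGCTGCTGCC"])
optimal = cai("GCTGCT", weights)  # only the preferred Ala codon
mixed = cai("GCTGCC", weights)    # one preferred and one less used Ala codon
```

A gene built only from the preferred codons of the reference set scores 1, and every less frequently used codon lowers the geometric mean, which is why lower CAI values are read as weaker adaptation to the host.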

Discussion
Three genome compartments that differed genetically and functionally can be distinguished in the nodule population of R. leguminosarum bv. trifolii: the chromosome, chromid-like and 'other plasmids' including pSym. Chromid-like replicons were distinguished in Southern analyses on the basis of repA and repC sequence similarity to RtTA1 and to the respective replication genes of such replicons described in the sequenced genomes of R. leguminosarum bv. viciae, R. etli and R. leguminosarum bv. trifolii [16]. The chosen name "chromid-like" (as opposed to simply "chromid") was the result of data scarcity concerning their gene content, insufficient to justify the name "chromid" [16]. Moreover, it is known that genes of the repABC operon are peculiar genetic markers because of the complex phylogeny of particular genes within the operon, whose evolutionary history could not be strictly connected with other genes of particular replicons [13].
In the study of the distribution of several chromosomal and plasmid markers within a group of 23 nodule isolates, we distinguished stable genes that were permanently located in a specific R. leguminosarum bv. trifolii genome compartment (the chromosome, chromid-like replicons or 'other plasmids' including pSym). Unstable genes (fixGH, thiC, acdS, pssM and the Pss-V region) that changed their location at various rates or were lost from the genome were also detected. Only two of the 23 sampled strains possessed all the studied markers. The majority of strains differed in gene content and gene distribution, supporting the hypothesis of the pangenomic structure of R. leguminosarum, in which each strain of a given species contains, besides the core genome, additional genetic information specific for the strain [11,17,18,39].
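The stable/unstable distinction described above can be illustrated with a small sketch, assuming that the hybridization results are summarized as a table giving, for each strain and marker, the compartment in which the marker was detected (or nothing if it was absent). The strain names, marker names, data and function names below are hypothetical placeholders and do not reproduce the results of this study.

```python
def classify_markers(presence):
    """Label each marker as stable, relocated, or lost in some strains.

    presence: dict mapping strain -> dict mapping marker -> compartment name,
              or None when the marker was not detected in that strain.
    """
    markers = sorted({m for locs in presence.values() for m in locs})
    labels = {}
    for marker in markers:
        locations = [locs.get(marker) for locs in presence.values()]
        if None in locations:
            labels[marker] = "unstable (lost in some strains)"
        elif len(set(locations)) == 1:
            labels[marker] = "stable"
        else:
            labels[marker] = "unstable (relocated)"
    return labels


def strains_with_all_markers(presence):
    """Return the strains in which every studied marker was detected."""
    markers = {m for locs in presence.values() for m in locs}
    return sorted(
        strain
        for strain, locs in presence.items()
        if all(locs.get(m) is not None for m in markers)
    )


# Hypothetical toy data: three strains, three markers.
toy = {
    "S1": {"geneA": "chromosome", "geneB": "chromid-like", "geneC": "other plasmids"},
    "S2": {"geneA": "chromosome", "geneB": "other plasmids", "geneC": None},
    "S3": {"geneA": "chromosome", "geneB": "chromid-like", "geneC": "other plasmids"},
}
labels = classify_markers(toy)
complete = strains_with_all_markers(toy)
```

In this toy table geneA would count as a stable chromosomal marker, geneB as a marker that changed compartment, and geneC as a marker lost from one strain. Only S1 and S3 carry the full marker set.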
The distribution of the plasmid replication-partition genes was even more dynamic than that of genes not connected with replication. Independent transfer events of repA and repC genes of the putative repABC operon were frequently observed, especially in the 'other plasmids' compartment, which confirmed different evolutionary pathways for various elements of the repABC operon, recently evidenced in Alphaproteobacteria [13]. Such considerable dynamics of replication/partition gene distribution in Rhizobium may account for changes in the plasmid number and, consequently, in the gene content observed in the sampled population. Besides these dynamics, some level of conservation of replication genes, especially those of chromid-like replicons, was also observed. It was reflected in positive hybridization of the rep probes derived from pRleTA1d and pRleTA1b to the respective replicons of the Rlt strains. One could speculate that the conservation of replication genes of chromid-like replicons may be related to their distinct properties, e.g. stability. However, the gene content, rather than the properties of the replication system resulting e.g. from conservation of replication genes, seems to be crucial for replicon stability [40].
Redistribution of genes between the different genome compartments could further trigger their sequence divergence under different selective pressures [13,15,41]. Examination of sequence divergence of several stable and unstable chromosomal and plasmid genes showed a low level of substitutions in genes of all the compartments. Nearly identical nucleotide sequences of nifNE markers were found in different pSym plasmids of the studied population (Figure 6C), confirming the core character of symbiotic genes and their high conservation, despite the overall genome differentiation [11].
The extent of gene adaptation to a given compartment in the host genome was assessed by analyses of alternative codon usage. Three well-separated groups of genes were obtained, corresponding to the chromosome, chromid-like and 'other plasmids' genome compartments (Figure 7A), with 96% accordance with the hybridization data. In conclusion, the sequence divergence of particular genes may be affected by their location in the given genome compartment. When all the sequences of the individual strains studied were subjected to a discrimination analysis, we obtained good separation of K3.22 and a group of strains related to RtTA1 (Figure 7B) that formed the outermost branch in the phylogenetic tree. The remaining strains were randomly mixed with each other but apparently separated from K3.22 and the TA1-related strains, which suggested no differences in codon usage within the main group.

Table footnote: Values followed by different letters are significantly different: b (P < 0.05) and cd (P < 0.001). ± Standard deviation (SD).
The CAI analyses of the evaluated sequences confirmed good adaptation of chromosomal and chromid-like genes (high CAI values) to host genomes and lower CAI values for 'other plasmids' genes. The CAI values also reflect the level of transcriptional and translational activity of particular genes [29]. While the activity of most of the chromosomal and chromid-like genes could be considered at least to some extent constitutive, the 'other plasmids' genes, and especially the symbiosis-related genes, are expressed only transiently in the symbiotic stage [42]. Therefore, in the Rhizobium model, the differences in codon usage in translation reflect the balance between the selection pressure and random mutations in the functionally differentiated genome compartments. The differences in codon usage and CAI values between the genome compartments are most likely a consequence of differential gene expression and adaptability to optimal codon usage in host genomes [42].

Conclusion
Our study showed that, even within a small rhizobial population of clover nodule isolates, substantial divergence of genome organization can be detected, especially when the content of extrachromosomal DNA is taken into account. Despite the high variability in the number and size of plasmids among the studied strains, conservation of the location as well as dynamic distribution of the individual genes (especially replication genes) of a particular genome compartment were demonstrated. The sequence divergence of particular genes may be affected by their location in the given genome compartment. The 'other plasmid' genes are less adapted to the host genome than the chromosome and chromid-like genes.