Fluorescent-BOX-PCR for resolving bacterial genetic diversity, endemism and biogeography

Background: BOX-A1R-based repetitive extragenic palindromic-PCR (BOX-PCR) is one of the most used techniques in biogeography studies of microbial isolates. However, the traditional separation of BOX-PCR patterns by agarose gel electrophoresis suffers from many limitations. The aim of this research was to set up a fluorescent BOX-PCR (F-BOX-PCR) assay in which separation of PCR products is automated in a capillary electrophoresis system. F-BOX-PCR was compared with the traditional BOX-PCR using bacterial strains with different G+C content (Bacillus cereus; Escherichia coli; isolates of the family Geodermatophilaceae). Resolution, discriminatory power and reproducibility were evaluated by assaying different electrophoretic runs, PCR reactions and independent DNA extractions. BOX-PCR and F-BOX-PCR were compared for the analysis of 29 strains of Modestobacter multiseptatus isolated from three different microsites in an altered carbonatic wall from Cagliari, Italy, and 45 strains of Streptococcus thermophilus isolated from 34 samples of the hand-made, yogurt-like product Matsoni, collected in different locations in Georgia.

Results: Fluorophore 6-FAM proved more informative than HEX and BOX-PCR both in agarose gel electrophoresis (p < 0.004 and p < 0.00003) and in capillary electrophoresis (compared only with HEX, p < 2 × 10⁻⁷). 6-FAM- and HEX-based F-BOX-PCR detected up to 12.0 and 11.3 times more fragments than BOX-PCR, respectively. Replicate separations of F-BOX-PCR showed a size-calling accuracy of ± 0.5 bp up to 500 bp, steadily decreasing to ± 10 bp at 2000 bp. Cluster analysis of F-BOX-PCR profiles grouped M. multiseptatus strains according to the microsite of isolation and S. thermophilus strains according to the geographical origin of Matsoni, whereas the strains were intermixed when a BOX-PCR dataset was used.

Conclusion: F-BOX-PCR represents an improved method for addressing bacterial biogeography studies in terms of sensitivity, reproducibility and data analysis.


Background
Typing by DNA fingerprinting is a common tool in bacterial biogeography and epidemiology studies. Several bacterial species can be differentiated into clonal lines associated with specific animal hosts [1]. Single genetic differences between clonal lineages can be useful for determining the history of an infection or for finding new possible borderline strains [2]. Similarly, fingerprinting methods are frequently used to evaluate the global dispersal of environmentally relevant microbial species or lineages within a species [3], to correlate specific genotypes with given environmental conditions [4] and to evaluate the endemicity of a given microbial type [5,6].
Different DNA-based typing methodologies are now available and BOX-PCR is the most commonly used technique due to its simplicity, efficiency and low cost. It is a particular version of repetitive extragenic palindromic-PCR (rep-PCR) [7] that uses the BOX-A1R primer [8]. BOX-PCR is a fingerprinting analysis based on the BOX dispersed-repeat motif, first identified in Streptococcus pneumoniae but common in a number of bacterial species [9][10][11]. Since the BOX repetitive sequences are interspersed throughout the genome, BOX-PCR can potentially survey many DNA regions scattered across the bacterial genome simultaneously. It has been shown to have similar or even better strain differentiation power than, and to be easier to perform than, ribosomal intergenic spacer analysis (RISA), restriction fragment length polymorphism (RFLP), amplified fragment length polymorphism (AFLP), random amplified polymorphic DNA (RAPD) and other techniques [12][13][14]. BOX-PCR is quicker, cheaper, and in many cases more discriminatory than pulsed field gel electrophoresis (PFGE) [14], although it is generally less reproducible. BOX-PCR patterns are not affected by the culture age of the strain to be analyzed [15] and fingerprinting output can be easily analyzed by computer-assisted methods [16]. These features make BOX-PCR a frequently used tool in biogeography studies in environmental microbiology [5][17][18][19][20].
The current BOX-PCR technique, in which the amplified products are separated by agarose gel electrophoresis, suffers from several limitations, such as poor band resolution and poor run standardization when profiles from different gels are compared. To overcome these limitations, fluorescently labelled products can be separated in an automated DNA sequencer [21], but this interesting improvement has rarely been applied in environmental analysis and has been limited to machines performing separation in long polyacrylamide gels [21,22].
In this study we show that fluorescent BOX-PCR (F-BOX-PCR), in which the separation of PCR products is performed in an Abi-Prism 310 capillary electrophoresis system, is capable of resolving endemicity and the biogeographical repartition of different bacterial populations. We first assessed the suitability and reproducibility of different electrophoretic runs of different F-BOX-PCR reactions prepared from independent extractions of DNA from eight bacterial strains exhibiting different G+C content. The power of F-BOX-PCR in resolving bacterial endemicity was assessed on a collection of Modestobacter multiseptatus strains isolated from three different microsites of an altered ancient carbonatic wall in the old city of Cagliari, Sardinia, Italy [23]. Biogeographic segregation of different bacterial populations was tested on a collection of 45 strains of Streptococcus thermophilus isolated from the Caucasian home-made yogurt-like product Matsoni produced in different areas of the Georgian Caucasus [24].

Reproducibility of BOX-PCR and F-BOX-PCR in agarose gels
The reproducibility of BOX-PCR and F-BOX-PCR in agarose gels was analyzed using strains belonging to different taxa with very diverse G+C content. We chose six strains of Geodermatophilaceae (G+C content, about 70%) belonging to the genera Blastococcus and Modestobacter, one strain of E. coli (50%) and one strain of B. cereus (35%). The six strains of Geodermatophilaceae were chosen so that their BOX-PCR patterns in agarose gel would cover a wide size range. Strain DS3 had a BOX-PCR pattern with a relatively small range of fragment lengths, between 300 and 800 bp. Strains CI1-23 and CO2-33 showed BOX-PCR patterns with a wider range of fragment lengths, between 300 and about 2500 bp and between 300 and about 3000 bp respectively. Strains CI2-13, CI2-17 and CI2-23 were also analyzed since they had very similar patterns.
The average number of fragments found with BOX-PCR varied between 6.3 (B. cereus) and 16.6 (E. coli) with relatively high standard deviations (SD) between 0.9 and 4.0 (average SD = 2.2). The number of bands varied among replicates from independent DNA extractions, PCR reactions or agarose gels: for example, B. cereus BOX-PCR patterns were represented by 3 to 10 fragments, the pattern of strain CI2-17 varied between 9 and 18 fragments, and the E. coli pattern showed 14 to 19 bands.
Reproducibility of BOX-PCR profiles obtained from different DNA extractions and different runs in agarose gels is shown in Table 1, expressed as Jaccard's similarity coefficient between replicates. Examples of BOX-PCR profiles are shown in Figure 1. Reproducibility of results with standard BOX-PCR was affected by DNA extraction (86.2 ± 5.7% similarity), PCR amplification (78.0 ± 16.1%) and gel separation (83.7 ± 9.0%). The overall similarity calculated for all replicates was 62.7 ± 20.5%.
Similarity values were occasionally quite low, as in the case of B. cereus, which showed only 55.4% similarity between PCR replicates and an overall similarity of 30.0%. The low reproducibility is attributable to the low number of bands for this strain. BOX-PCR patterns were seldom richer than 15-20 bands and some of these bands were weak (Figure 2). A delicate step of BOX-PCR analysis is gel staining with ethidium bromide and UV acquisition, especially for weak bands. The disappearance of one band in a profile of 10 bands decreases the similarity of two otherwise identical profiles by 10%. In the case of B. cereus, the average number of bands was only 6.8 and the absence of a weak band in one of two replicates affects the similarity by 15%.
Polyacrylamide gels with silver staining give better fragment resolution than agarose gel electrophoresis, but the approach is labour-intensive and can suffer from relatively low reproducibility due to the gel staining procedures. However, it would be worth investigating the convenience of applying polyacrylamide gel electrophoresis to the analysis of BOX-PCR patterns in comparison with F-BOX-PCR in terms of economic costs, time required for sample handling and overall reproducibility.
For all the strains that were tested, the use of BOX-A1R primer labelled with fluorescent chromophores affected the BOX-PCR profiles in agarose gel by decreasing the number of detectable bands (Figure 1 and Table 2); in the case of B. cereus, often only one band was visible for both HEX- and 6-FAM-labelled primers. Band pattern modifications were also noted in denaturing gradient gel electrophoresis (DGGE) for products labelled with 6-FAM dye in comparison with unlabelled products [25]. Computer-assisted analysis of agarose gel electrophoresis showed that patterns obtained with the 6-FAM-labelled primer were generally richer in bands than those obtained with the fluorophore HEX (p < 0.0004). A similar result was observed by Ranjard and collaborators [26], who found that 6-FAM primers gave the best total peak intensity in Automated rRNA Intergenic Spacer Analysis (ARISA).
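The effect of a single lost band on Jaccard's coefficient can be checked with a short calculation. The sketch below is only illustrative: the band sizes are hypothetical, and it assumes that bands have already been matched between replicates, so that a profile can be treated as a plain set of band sizes.

```python
def jaccard(bands_a, bands_b):
    """Jaccard similarity between two profiles given as collections of band sizes."""
    a, b = set(bands_a), set(bands_b)
    if not a and not b:
        return 1.0  # two empty profiles are treated as identical (a convention)
    return len(a & b) / len(a | b)


# Hypothetical band sizes in bp; only the number of bands matters here.
profile_10 = set(range(300, 1300, 100))  # 10 bands
profile_7 = set(range(300, 1000, 100))   # 7 bands, close to the B. cereus average

# Remove one band from a replicate and measure the drop in similarity.
loss_10 = 1 - jaccard(profile_10, profile_10 - {300})
loss_7 = 1 - jaccard(profile_7, profile_7 - {300})

print(f"10 bands, one lost: similarity drops by {loss_10:.1%}")
print(f" 7 bands, one lost: similarity drops by {loss_7:.1%}")
```

With 10 bands the drop is 10%, and with 7 bands it is about 14%, in line with the figures quoted for B. cereus. The fewer bands a profile contains, the more a single weak band weighs on the coefficient.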

Reproducibility of capillary electrophoresis
The F-BOX-PCR products labelled with HEX and 6-FAM were run twice in capillary electrophoresis. For all the strains tested, the number of peaks obtained with 6-FAM was significantly higher than the number of DNA bands in agarose gel profiles obtained by BOX-PCR (p < 3 × 10⁻⁶) and by 6-FAM-based F-BOX-PCR. 6-FAM peak profiles were confirmed to be generally richer than HEX patterns (p < 2 × 10⁻⁷). The average number of peaks varied between 31.0 (strain DS3) and 72.0 (strain CI1-23) with very low relative standard deviations (between 0.0 and 2.0; Table 2). Altogether, F-BOX-PCR gave more informative patterns (1.9 to 12.0 times) than BOX-PCR, indicating that capillary electrophoresis detects at least two times more peaks than agarose gel stained with ethidium bromide. 6-FAM was more informative than HEX when comparing these F-BOX-PCRs with standard BOX-PCR (1.3 to 11.3; p < 3 × 10⁻⁸). The richness of 6-FAM-based profiles is evident in Figure 2, in which an example run is shown for each strain tested. Profiles were well characterized by a number of high, medium and, above all, small peaks that were sometimes very important. For example, in the profile of B. cereus 360 (Figure 2G) three high peaks were visible (5100 ± 1800 units of fluorescence at 601 bp, 3000 ± 1700 at

Since run characteristics along a capillary are nonlinear and most commercial size standards can present double peaks, there are greater uncertainties in sizing large DNA fragments. The Local Southern sizing method of Genescan software may improve the fragment sizing by generating the best-fit sizing curve. The software gives size in base pairs as a function of the migration time of standard fragments. In our replicates we observed that up to 500 bp the precision of separation in all samples was close to ± 0.5 bp, but above that size the precision steadily decreased, reaching ± 1 bp at 800 bp, ± 5 bp at 1400 bp and ± 10 bp at 2000 bp.
Over 1500 bp, the separation capability of the 47 cm capillary provided with POP4 polymer is heavily reduced, appreciably affecting the sizing process, with peak size discrepancies of ± 20 bp between replicated samples. Over 1300 bp, peaks became wider and tended to merge, as can be noted in E. coli (Figure 2H). However, under perfect run conditions the precision in the region between 1300 and 2000 bp can be acceptable in comparison with agarose or polyacrylamide gels, since the seven peaks at 1700-2000 bp of the similar strains CI2-13, CI2-17 and CI2-23 can be easily aligned manually (Figure 2A, 2B, 2C). The problem of the discrepancy in sizing large fragments is reflected in the set-up of the data matrix, due to the size calling precision (± 0.01 bp). This uncertainty may cause F-BOX-PCR profiles to appear less similar to each other, because peaks are split between adjacent combination windows [27]. Sizing interpolation of large peaks beyond 1000 bp could be improved by better polymers for fragment separation or by innovative downstream matrix analysis. In spite of these uncertainties, the overall percentage of similarity of replicated analyses for 6-FAM-based F-BOX-PCR was 95.2 ± 3.5% (Table 1). As observed for agarose gels, the reproducibility of the technique is affected more by different DNA extractions and different PCR amplifications than by capillary separation (Table 1).
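One way to limit the splitting of peaks between fixed windows is to match peaks from replicate runs with a tolerance that grows with fragment size. The sketch below is not the procedure used in this study. It takes the precision values reported for the replicates (± 0.5 bp up to 500 bp, ± 1 bp at 800 bp, ± 5 bp at 1400 bp, ± 10 bp at 2000 bp) and, as an assumption, interpolates linearly between them. The function names, the greedy matching strategy and the peak sizes are hypothetical.

```python
# Size-calling precision reported in the text: (fragment size in bp, +/- bp).
ANCHORS = [(500, 0.5), (800, 1.0), (1400, 5.0), (2000, 10.0)]


def tolerance(size_bp):
    """Size-dependent tolerance; linear interpolation between anchors is an assumption."""
    if size_bp <= ANCHORS[0][0]:
        return ANCHORS[0][1]
    for (x0, y0), (x1, y1) in zip(ANCHORS, ANCHORS[1:]):
        if size_bp <= x1:
            return y0 + (y1 - y0) * (size_bp - x0) / (x1 - x0)
    return ANCHORS[-1][1]  # beyond 2000 bp, hold the last reported value


def match_peaks(run_a, run_b):
    """Greedily pair each peak of run_a with the closest unused peak of run_b."""
    unused = sorted(run_b)
    pairs = []
    for a in sorted(run_a):
        candidates = [b for b in unused if abs(a - b) <= tolerance(a)]
        if candidates:
            best = min(candidates, key=lambda b: abs(a - b))
            pairs.append((a, best))
            unused.remove(best)
    return pairs


# Hypothetical replicate runs (peak sizes in bp).
replicate_1 = [250.2, 1705.0]
replicate_2 = [250.5, 900.0, 1710.0]
print(match_peaks(replicate_1, replicate_2))
```

In this toy example the two large peaks, 5 bp apart near 1700 bp, are treated as the same fragment, while a 5 bp difference below 500 bp would not be.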

F-BOX-PCR typing of M. multiseptatus from different stone microsites
We used F-BOX-PCR to analyze bacterial endemisms on the surface of three microsites of a highly biodeteriorated calcarenite stone characterized by a relatively high rate of erosion due to rainwater, saltiness and wind. We previously found that this stone was heavily colonized by actinobacteria of the family Geodermatophilaceae, but typing with standard BOX-PCR failed to find a clear relation between isolates and the microsites they came from [23], even though these microsites were characterized by different colours and types of alteration (pittings, patinas, etc.).
We compared F-BOX-PCR with BOX-PCR by analyzing with both methods 29 strains of M. multiseptatus isolated from the surface of three different microsites on the stones (microsites A, C and D). The number of bands detected by BOX-PCR was between 5 (strains AS4 and AS10) and 17 (strain Cag100) with an average of 10.3. Band size was between 250 and 2500 bp. The UPGMA tree failed to clearly cluster strains on the basis of the microsite of isolation (Figure 3A). The similarity of groups was between 15 and 70%. The comparison between the cophenetic similarity matrix and the original similarity matrix, done with Mantel's test, gave a product-moment correlation of 0.68, a very low fit value that does not make the UPGMA output significant. F-BOX-PCR of these strains gave between 25 (AS12) and 49 (DS32) peaks with an average of 37.1. Peak size was between 150 and 1978 bp. The average height of the peaks was 310 units of fluorescence. Strains were grouped in the UPGMA tree according to the microsite of isolation (Figure 3B). Strains isolated from microsite A clustered in groups A and B, with an overall similarity of 11%. Strains from microsite C were segregated in cluster E, containing most of the strains with a similarity of 9%, and cluster F, including strains Cag92 and Cag93 with a similarity of 38%. Strains of microsite D clustered in one group (D) with 14% similarity, while strain DS32, although related to cluster D, appeared to be separated. The coefficients of similarity were lower than in the BOX-PCR tree because the number of peaks was 3.6 times higher than the number of bands in the agarose gels. A very high product-moment correlation (0.96) characterized the F-BOX-PCR tree. Since the clusters in Figure 3B were constituted by strains of homogeneous microsite origin, Pearson's correlation between sample location and the genomic clusters shown in the UPGMA tree was 1.00.
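The product-moment (cophenetic) correlation quoted for each tree measures how faithfully the UPGMA dendrogram reproduces the original pairwise matrix. The sketch below shows one way to compute it in plain Python. It is not the software used in this study, and it works on distances (for example 1 minus Jaccard similarity) rather than similarities, which leaves the correlation unchanged. The function names and the toy matrices are hypothetical.

```python
from itertools import combinations
from math import sqrt


def upgma_cophenetic(dist):
    """Cophenetic distances from average-linkage (UPGMA) clustering of a square matrix."""
    n = len(dist)
    clusters = {i: [i] for i in range(n)}  # cluster id -> member indices
    coph = [[0.0] * n for _ in range(n)]
    while len(clusters) > 1:
        # Find the two clusters with the smallest mean pairwise distance.
        best = None
        for a, b in combinations(sorted(clusters), 2):
            total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # Every pair joined at this step gets the merge height.
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = d
        clusters[a] = clusters[a] + clusters.pop(b)
    return coph


def pearson(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def cophenetic_correlation(dist):
    """Correlation between original and cophenetic distances over all pairs."""
    coph = upgma_cophenetic(dist)
    pairs = list(combinations(range(len(dist)), 2))
    return pearson([dist[i][j] for i, j in pairs], [coph[i][j] for i, j in pairs])


# Toy distance matrix for three hypothetical strains (not data from this study).
toy = [[0.0, 0.1, 0.5],
       [0.1, 0.0, 0.9],
       [0.5, 0.9, 0.0]]
print(round(cophenetic_correlation(toy), 3))
```

A value close to 1 means the tree distorts the pairwise relationships very little. Lower values, such as the 0.68 obtained for the BOX-PCR tree, mean the dendrogram is a poorer summary of the underlying matrix.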

F-BOX-PCR typing of S. thermophilus isolates from different sources in Georgia
To evaluate whether F-BOX-PCR can discriminate bacterial populations on a geographical scale, we studied the diversity of 45 strains of S. thermophilus isolated from the Georgian yoghurt Matsoni [24] produced in 34 different farms (Table 3).
The number of bands detected by BOX-PCR was between 2 (strain 3278) and 10 (strain 3238) with an average of 7.5 bands, and band size was between 250 and 2000 bp. Although some homogeneous groups were detectable in the UPGMA tree (Figure 4A), cluster analysis failed to clearly group strains on the basis of the geographical origin of Matsoni (r = 0.59, P = 2 × 10⁻⁵). The similarity of groups was between 33 and 100%. Five clusters included strains with identical BOX-PCR patterns, although the strains were isolated from different regions in Georgia. Comparison between the cophenetic similarity matrix and the original similarity matrix, done with Mantel's test, gave a product-moment correlation of 0.94, a very high fit value related to the relatively low number of bands in the agarose gel and the high number of identical patterns.
F-BOX-PCR of the same strains gave between 11 (strain 3211) and 28 (strains 3232 and 3276) peaks with an average [...] 2 and sizes between 119 and 539 bp. The average peak height was 285 fluorescence units. The similarity coefficients were lower than in the BOX-PCR tree (from 23 to 89%), as observed in the previous analysis of M. multiseptatus, which is explained by the number of peaks being three times higher than the number of bands in the agarose gel. F-BOX-PCR did not produce identical patterns among the strains, hence the diversity in the peak matrix increased. For this reason, although the product-moment correlation of the F-BOX-PCR tree was lower than that of the agarose gel-derived UPGMA tree (0.88 vs. 0.94), the fit of the analysis is still positive.

Table 2: Number of fragments detected by BOX-PCR and F-BOX-PCR analysis using agarose and capillary electrophoresis separation, and ratios between the number of fluorochrome-labeled fragments detected by capillary electrophoresis and the number of unlabeled fragments detected by agarose electrophoresis (a).
a The comparison between BOX-PCR analysis and F-BOX-PCR analysis separated in agarose gel electrophoresis or in capillary electrophoresis was done considering eight strains belonging to B. cereus (G+C content 35%), to the family Geodermatophilaceae (G+C content 60%) and to E. coli (G+C content 50%). The analysis was done in duplicate considering two independent DNA extractions (numbered 1, 2), two independent PCR amplifications (1, 2) and two independent agarose gel runs or sample injections in the sequencer (1, 2).
b The numbers express the ratio of HEX-labeled amplicons detected by capillary electrophoresis versus unlabeled amplicons detected on agarose gels.
c The numbers express the ratio of 6-FAM-labeled amplicons detected by capillary electrophoresis versus unlabeled amplicons detected on agarose gels.
n.d., not determined; n.a., not applicable.

Examples of F-BOX-PCR electropherograms after analysis with Genescan software
F-BOX-PCR clustered S. thermophilus strains in the UPGMA dendrogram according to the geographical origin of the product from which they were isolated (r = 0.81, P = 3 × 10⁻¹¹). Most of the strains isolated from western Georgia were clustered in two groups (A and B in Figure 4B) with similarities of 54% (including strains collected between 41° 48' E and 42° 42' E) and 46% (strains isolated between 42° 37' E and 42° 46' E). Group A also contains strain 3203 from Mtskaldidi, a place relatively close to the Senaki area. Cluster C includes, with a similarity of 56%, two subclusters (C1 and C2) with an average similarity of about 70%. Strains from the Black Sea coast (between 41° 43' E and 41° 48' E) were grouped into C1, together with two strains isolated from inner western Georgia (3235 and 3236, both from Senaki, 42° 03' E, close to the Black Sea coast). Subcluster C2 was mainly formed by strains isolated in the area surrounding Tbilisi (between 44° 44' E and 44° 51' E), plus strain 3252 isolated in Batumi. Cluster D was formed by strains collected from various farms located in the mountain area of Central Georgia. The remaining strains were grouped in two groups (E and F) characterized by an overall homogeneous geographical origin, although group F had a low strain similarity (26%). Detrended Principal Coordinate Analysis was used to relate the UPGMA groups defined by F-BOX-PCR to ecotypes determined from the geographical characteristics of the sampling sites (Figure 5). The resulting plot, representing 58.0% of the total inertia, confirmed the correlation found in the UPGMA tree. For example, clusters C2 and E were close to the Tbilisi area (T), as expected, since they were mostly formed by strains isolated from that region. Strong relationships were also found for groups A and B according to the western origin

Conclusion
Our study showed that a bacterial DNA fingerprinting technique that uses automated procedures for DNA fragment separation could be an advantageous methodology to type and track microbial isolates and to define endemisms in homogeneous environments where differential pressure is exerted by secondary (minor) environmental factors. In the case of M. multiseptatus, while standard BOX-PCR failed to find a clear strain clustering on the basis of sampling site [23], F-BOX-PCR gave a very well defined grouping with a clear relation to the microsite of isolation. This indicates that the technique can highlight endemisms in epilithic/endolithic environments, although the environmental factors driving this endemism remain unknown and are still to be explored. In the case of S. thermophilus, our data showed that F-BOX-PCR succeeded in defining a relation between strain types and the area of isolation, evidencing a geographical specificity. Despite its relative ease, automated fingerprinting analysis is not yet widely used in industry, although the final output can be analyzed by rapid, efficient and computerized analyses. F-BOX-PCR can prove an advantageous tool to routinely depict microbial communities, for instance for typing isolates in the food industry or in traditional and regional [28,29] dairy products. F-BOX-PCR, coupled with other well-known automated techniques, could help in labelling food products with the P.D.O. (Protected Designation of Origin) and in monitoring the related production chain. For example, F-BOX-PCR could be applied in parallel with Length Heterogeneity-PCR, which was recently proposed as a rapid and precise method to characterise the lactic acid bacteria present in natural whey starters for the P.D.O. Grana Padano cheese and in maize silage [30,31].
Similarity UPGMA trees of BOX-PCR patterns (A) and F-BOX-PCR patterns (B) of 45 bacterial strains belonging to S. thermophilus isolated from the typical yoghurt-like product Matsoni produced in 34 different farms in Georgia

Environmental samples and pure strains
Bacterial strains used in this work belong to M. multiseptatus and to S. thermophilus species on the basis of 16S rRNA gene analysis and species-specific PCR analysis targeted to lacZ gene [32] respectively. All strains isolated from Matsoni were also tested by carbohydrate fermentation profiles (API-20 Strep, Bio-Merieux). Strains belonging to M. multiseptatus were isolated from an ancient carbonatic wall of the city of Cagliari (Microsite C, Sardinia, Italy) [23]. The genotypes of these isolates attributed to the family Geodermatophilaceae have been previously studied by BOX-PCR [23]. In addition to these strains, other Geodermatophilaceae were isolated from two other microsites (A and D) of the same ancient wall, as previously described [23].
From 34 samples of home-made Matsoni, 45 strains belonging to S. thermophilus were isolated (Table 3). Matsoni samples were collected from local markets or families in different cities or villages, from the eastern to the western part of Georgia. One ml of the sample was resuspended in 9 ml sterilized 0.85% NaCl solution and mixed thoroughly. Serial dilutions (10⁻¹ to 10⁻⁸) were prepared and 0.1 ml of the appropriate dilution was spread in duplicate onto M17 agar plates. After incubation (37°C, 24-48 h, aerobic conditions) strains were purified by streak plating and cultivated in M17 broth using lactose at a final concentration of 2% (w/v). Stock cultures were stored in glycerol solution (20%) at -80°C.

ratory assistance. The manuscript was edited for language by Sachin Bachate.