Molecular characterization of a mosaic locus in the genome of 'Candidatus Liberibacter asiaticus'

Background: Huanglongbing (HLB) is a highly destructive disease affecting citrus production worldwide. 'Candidatus Liberibacter asiaticus', an unculturable alpha-proteobacterium, is a putative pathogen of HLB. Information about the biology and strain diversity of 'Ca. L. asiaticus' is currently limited, which restricts the scope of HLB research and control.
Results: A genomic region (CLIBASIA_05640 to CLIBASIA_05650) of 'Ca. L. asiaticus' showing hyper-sequence variation or locus mosaicism was identified and investigated using 262 bacterial strains (188 from China and 74 from Florida). Based on the characteristic electrophoretic profiles of PCR amplicons generated by a specific primer set, eight electrophoretic types (E-types) were identified: six E-types (A, B, C, D, E, and F) in China and four E-types (A, C, G, and H) in Florida. The 'Ca. L. asiaticus' strains from China consisted predominantly of E-type A (71.3%) and E-type B (19.7%). In contrast, the 'Ca. L. asiaticus' strains from Florida were dominated by E-type G (82.4%). Diversity of 'Ca. L. asiaticus' within China was also evident. Strains from the high-altitude Yunnan Province consisted of five E-types with E-type B being the majority (62.8%), whereas strains from the low-altitude coastal Guangdong Province consisted of only two E-types with E-type A as the majority (97.0%). Sequence analyses revealed that variation of DNA amplicons was due to insertion/deletion events at CLIBASIA_05650 and the downstream intergenic region.
Conclusions: This study demonstrated that the genomic mosaicism of 'Ca. L. asiaticus' resulted from active DNA insertion/deletion events. Analyses of strain variation revealed significant inter- and intra-continent diversity of 'Ca. L. asiaticus'.


Background
Huanglongbing (HLB) is a destructive disease affecting citrus production worldwide. All known commercial citrus cultivars are susceptible to HLB. The disease was first noted in the Chaoshan area of Guangdong Province in the People's Republic of China in the late 1800s [1] and is currently distributed in 10 citrus-producing provinces in South China. HLB is now established in São Paulo, Brazil [2] and Florida, United States [3], where it poses a great threat to the citrus industry. The disease is associated with three species of non-culturable, phloem-limited α-Proteobacteria: 'Candidatus Liberibacter asiaticus', 'Ca. L. africanus', and 'Ca. L. americanus' [4,5]. In both China and the U.S., only 'Ca. L. asiaticus' has been detected. Because no pure culture is available, 'Ca. L. asiaticus' has been poorly characterized. Little is known about its biology, genetic diversity, and epidemiology.
Sequence analyses of conserved genomic loci such as the 16S rRNA gene and the 16S/23S intergenic spacer region have been used to define 'Ca. Liberibacter' species [4,6]. However, more variable genomic loci need to be identified to better characterize the bacterium. Before the whole genome sequence became available, Bastianel et al. [7] used an outer membrane protein gene (omp) to differentiate isolates/strains of 'Ca. L. asiaticus' from different geographical origins, although each region was represented by only one to three strains. Tomimura et al. [8] analyzed the single nucleotide polymorphisms (SNPs) in a bacteriophage-type DNA polymerase gene and revealed three clusters of 'Ca. L. asiaticus' strains from Southeast Asia. All Indonesian strains clustered in one group, and the other two clusters were not correlated with geographical origins, which included Vietnam, Thailand, Taiwan, and Japan.
The complete genome sequence of 'Ca. L. asiaticus' strain psy62 is now available [9]. The annotated genome has 1,109 protein and 53 RNA coding loci and is readily accessible for genomic analyses. Based on the variation of tandem repeat number (TRN) at the locus CLIBASIA_01645, the population of 'Ca. L. asiaticus' strains in Guangdong, China was found to differ from that in Florida, U.S. [10]. This analysis of TRN also detected the possible presence of two genotypes in Florida: a TRN < 10 genotype that was distributed statewide and a TRN > 10 genotype that was limited to central Florida. In Guangdong, TRN variations were more heterogeneous and correlations to geographical origins were not established. A recent report that used four tandem repeat loci to analyze 'Ca. L. asiaticus' strains from Japan, Taiwan and Indonesia revealed various levels of population diversity, yet correlation to other genotypes or geographical origins was not known [11]. More recently, a prophage terminase gene (CLIBASIA_05610) was used to evaluate population diversity of 'Ca. L. asiaticus' in two geographically distinct citrus-growing provinces (Yunnan and Guangdong) in China [12]. The 'Ca. L. asiaticus' populations in these two locations differ significantly in their prophage terminase gene frequencies. In other bacteria, such as Escherichia coli, Haemophilus influenzae and Xylella fastidiosa, genomic loci with variable TRN or prophage genes are also known to be valuable descriptors of bacterial genetic diversity [13-17].
This study further explored the use of available genomic information for 'Ca. L. asiaticus' characterization. We report our observation of DNA mosaicism, or hyper-sequence variation, at the locus CLIBASIA_05650 and the downstream intergenic region in the genome of 'Ca. L. asiaticus'. PCR analyses using a primer set flanking this genomic locus revealed eight electrophoretic types (E-types) of 'Ca. L. asiaticus' strains from China and the U.S. Analyses of this DNA mosaicism revealed the inter- and intra-continent diversity of 'Ca. L. asiaticus'. The molecular nature of the DNA mosaicism was identified through sequence analyses.

Methods
Sample collection
HLB symptomatic citrus leaves were collected from nine provinces in China (Figure 1, Table 1) and from Florida in the U.S. between 2007 and 2010. Each sample originated from a single tree and was tentatively considered a single strain. All samples collected in China were shipped by mail to the Citrus Research Institute of Southwest University in Chongqing or to the Citrus HLB research laboratory of South China Agricultural University in Guangdong. Collection of HLB samples in Florida has been described previously [10].

DNA extraction
In Chongqing, midribs of citrus leaves were excised and DNA was extracted using the cetyltrimethylammonium bromide (CTAB) method as previously described [18]. Procedures of DNA extraction in Guangdong and Florida were described previously [10]. 'Ca. L. asiaticus' was identified by PCR with primer sets OI1/OI2c [4] and ITSAf/ITSAr [19]. DNA preparations were sent to the San Joaquin Valley Agricultural Sciences Center, United States Department of Agriculture-Agricultural Research Service, Parlier, CA, U.S.A. for further analyses.

Primers and PCR assays
The whole genome sequence of 'Ca. L. asiaticus' strain psy62 (accession number CP001677) was obtained from the NCBI GenBank database. Fifteen primer sets, which targeted genomic loci with tandem repeats and prophage genes, were designed with Primer 3 software [20] by setting the Tm at 60°C and the amplicon size at around 800 bp. Tandem repeat loci were identified using Tandem Repeats Finder (version 4.03) with default parameters [21]. Of the 45 tandem repeat loci, eight loci with 97-100% matches of each repeat were used in the study. Seven prophage loci were selected directly from the annotated genome of 'Ca. L. asiaticus' strain psy62. DNA from a set of 10 'Ca. L. asiaticus' strains (5 from China and 5 from Florida) was used to test the capacity of each primer set to detect strain diversity. Primer set Lap5640f/Lap5650r, flanking the chromosomal region of CLIBASIA_05640 to CLIBASIA_05650, was selected for further analysis because it generated different electrophoretic profiles from different strains. Primer specificity to 'Ca. L. asiaticus' was verified in silico by BLASTn search against the GenBank database. Primer set LapGP-1f/LapGP-1r, targeting a tandem repeat locus of CLIBASIA_01645 [10], was also included in this study for comparison. All primer sets used in the study are listed in Table 2 and Additional file 1.

Analyses of different 'Ca. L. asiaticus' populations
Although a single amplicon of 797 bp from primer set Lap5640f/Lap5650r was predicted based on the available genome sequence of strain psy62 [9], multiple amplicons were observed from other 'Ca. L. asiaticus' strains from China and Florida. Amplicon profiles on agarose gel were designated as electrophoretic types or E-types. E-type frequencies were summarized and a Chi-square test was used to determine the significance of E-type differences among geographical locations.
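The prediction of a single amplicon size from a reference genome can be illustrated with a minimal in silico PCR sketch. It assumes exact primer matches on a linear template. The function names and the toy template and primer sequences are hypothetical and are not the Lap5640f/Lap5650r sequences or the psy62 genome.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def predict_amplicon_size(template, forward_primer, reverse_primer):
    """Predict the amplicon length for exact primer matches, or None.

    The forward primer is searched on the given strand. The reverse primer
    anneals to that strand, so its reverse complement is searched downstream
    of the forward primer site.
    """
    template = template.upper()
    start = template.find(forward_primer.upper())
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse_primer)
    end = template.find(reverse_site, start + len(forward_primer))
    if end == -1:
        return None
    return end + len(reverse_site) - start


# Toy example (hypothetical sequences, for illustration only).
toy_template = "GGATCCATGCAAGTTCGA" + "ACGT" * 5 + "TTGACCGGTAAGCTTCC"
toy_forward = "ATGCAAGTTCGA"
toy_reverse = reverse_complement("TTGACCGGTAAG")
toy_size = predict_amplicon_size(toy_template, toy_forward, toy_reverse)
```

An insertion or deletion between the two primer sites shifts the predicted size by the length of the event, which is what produces amplicons of different sizes on a gel.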

DNA sequencing and analysis
DNA bands were excised from the gel and purified using the QIAquick Gel Extraction kit (Qiagen, Valencia, CA). Purified DNAs were cloned with the pGEM-T Easy vector (Promega Corp., Fitchburg, WI) and sequenced using the BigDye Terminator v3.1 Cycle Sequencing Kit in a 3130xl Genetic Analyzer (Applied Biosystems, Inc.). Multiple sequence alignments were performed using the ClustalW program (version 1.74) with the default parameters [22]. Manual adjustment was performed when appropriate. Protein secondary structure prediction was performed by the method of Bryson et al. [23] available on the PSIPRED server http://bioinf.cs.ucl.ac.uk/psipred/. The protein 3-D structure model was built based on a fold prediction protocol with the help of Phyre [24].
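One way to locate insertion/deletion events between two aligned sequences, such as two rows taken from a multiple sequence alignment, is to scan for runs of gap characters. The sketch below assumes the two rows are already aligned to equal length with "-" as the gap symbol. The function name and the toy sequences are hypothetical.

```python
def find_gap_runs(aligned_ref, aligned_query, gap="-"):
    """List (start, length, kind) for each gap run in a pairwise alignment.

    kind is "insertion" when the reference row holds the gaps (the query has
    extra bases) and "deletion" when the query row holds the gaps.
    start is a 0-based alignment column.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have the same length")
    runs = []
    current_kind, run_start = None, 0
    for column, (ref_base, query_base) in enumerate(zip(aligned_ref, aligned_query)):
        if ref_base == gap and query_base != gap:
            kind = "insertion"
        elif query_base == gap and ref_base != gap:
            kind = "deletion"
        else:
            kind = None
        if kind != current_kind:
            # Close the run that just ended, if any, and start a new one.
            if current_kind is not None:
                runs.append((run_start, column - run_start, current_kind))
            current_kind, run_start = kind, column
    if current_kind is not None:
        runs.append((run_start, len(aligned_ref) - run_start, current_kind))
    return runs


# Toy alignment (hypothetical sequences, for illustration only).
toy_ref = "ATGGTT------CCGATAGGA"
toy_query = "ATGGTTACGTACCCGA---GA"
toy_runs = find_gap_runs(toy_ref, toy_query)
```

Each reported run gives the alignment column where the event starts and its length in bases, which can then be compared with the size differences seen among amplicons.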

Nucleotide sequence accession numbers
Nine DNA sequences of 'Ca. L. asiaticus' representing different amplicon sizes and collection origins have been deposited in GenBank with accession numbers JF412691 to JF412699 (Additional file 2).

Results
Detection of DNA mosaicisms by primer set Lap5640f/Lap5650r
A total of 262 HLB samples that tested positive with primer sets OI1/OI2c [4] and ITSAf/ITSAr [19] were analyzed. Among them, 188 samples were from nine provinces in China and 74 samples were from Florida (Table 1). The HLB samples from China originated from both high altitude regions (HAR) and low altitude regions (LAR) (Figure 1). PCR amplification with primer set Lap5640f/Lap5650r produced eight E-types, designated as E-types A to H. Each E-type was composed of one or more of five DNA amplicons, designated as P1 to P5 (Figure 2). DNA polymorphisms were not detected with the other 14 primer sets listed in Additional file 1 (data not shown), i.e. each of the 14 primer sets generated a single amplicon.
The calculated 797 bp amplicon in the genome of 'Ca. L. asiaticus' strain psy62 placed the strain in E-type C (Figure 2, Table 1). Surprisingly, E-type C was found in only 3 of the 74 Florida HLB samples (4.1%). Other E-types detected in Florida were A, G, and H. E-type G was predominant (82.4%), followed by E-type A (9.5%) and E-type H (4.1%) (Table 1). Six E-types (A, B, C, D, E, and F) were found in the 188 samples from China (Figure 2, Table 1). E-type A was the most frequent (71.3%), followed by E-type B (19.7%). When geographical origins were considered, E-type A was mostly from LAR locations and E-type B was mostly from HAR locations. Similarly, only 11 samples (5.9%) from China belonged to E-type C (the same as strain psy62 from Florida) and they were all from HAR locations (Table 1).
To avoid small expected values in the Chi-square test, data in Table 1 were regrouped into four categories for location comparisons: E-type A, E-type B, E-type G and other E-types. The results showed that the E-type distribution of the 'Ca. L. asiaticus' population in China was significantly different from that in Florida (P = 1.12 × 10^-44). Within the samples from China, the E-type distribution in the LAR population was significantly different from that in the HAR population (P = 1.59 × 10^-22).
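The China versus Florida comparison can be sketched as a Chi-square test on a 2 × 4 contingency table. The counts below are an assumption: they were back-calculated from the sample sizes (188 and 74) and the percentages reported in the text, not copied from Table 1, and the function names are hypothetical. The P value uses the closed-form upper tail of the Chi-square distribution for integer degrees of freedom.

```python
import math


def chi_square_statistic(table):
    """Return (statistic, degrees_of_freedom) for an r x c contingency table.

    Assumes every row total and column total is greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def chi_square_p_value(statistic, dof):
    """Upper-tail probability of the Chi-square distribution (integer dof >= 1)."""
    half = statistic / 2.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum over i < dof/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, dof // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd dof: erfc(sqrt(x/2)) + exp(-x/2) * sum of (x/2)^(i - 1/2) / Gamma(i + 1/2)
    total = 0.0
    for i in range(1, (dof - 1) // 2 + 1):
        total += half ** (i - 0.5) / math.gamma(i + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


# Assumed counts, back-calculated from the reported percentages (not Table 1 itself).
# Columns: E-type A, E-type B, E-type G, other E-types.
china = [134, 37, 0, 17]
florida = [7, 0, 61, 6]
statistic, dof = chi_square_statistic([china, florida])
p_value = chi_square_p_value(statistic, dof)
```

A 2 × 4 table gives 3 degrees of freedom. In practice a library routine such as `scipy.stats.chi2_contingency` could replace the hand-written functions; the pure-Python version is used here only to keep the sketch free of dependencies.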

Correlation between E-types and TRN genotypes
To evaluate the correlation between E-types and TRN genotypes, all 74 'Ca. L. asiaticus' strains from Florida (Table 1) were also tested for TRN variation with primer set LapGP-1f/LapGP-1r [10]. All seven E-type A strains belonged to the TRN > 10 genotype, whereas strains of the other three E-types were grouped with the TRN < 10 genotype. Therefore, the Florida strains could be divided into E-type A and non-E-type A groups, matching the TRN > 10 and TRN < 10 genotypes, respectively. This supports the previous observation that there were at least two groups of 'Ca. L. asiaticus' strains in Florida. No significant correlation between E-type and TRN genotype was found after testing all 'Ca. L. asiaticus' strains from Yunnan, Guangxi, and Guangdong provinces (data not shown).
In silico analyses of CLIBASIA_05650 alleles
ORF CLIBASIA_05650 was annotated as interrupted gp229, a phage-associated protein [9]. A 72-bp (24 amino acids) insertion found in P2 and P5, which occurred in E-types F, G, and H (Figure 3), created an in-frame mutation. Close examination showed that CLIBASIA_05650 was mostly composed of imperfect tandem repeats of six amino acids (18 bp), each beginning with a valine (V) residue (Figure 4). Such hexapeptide domains are common to many bacterial transferases represented by LpxA-like enzymes. Secondary and tertiary (3-D) structure predictions were made from the translated amino acid sequences (Figure 4). The 24-amino acid insertion apparently shortened many of the beta-sheets (Figure 4A) and added a structure motif (Figure 4B), along with increases in prediction stability for both secondary and tertiary structures. Interestingly, of the 66 strains that had P2 and P5 amplicons, 64 (97.0%) were collected from Florida, U.S., and only 2 (3.0%) were from Guangdong, China (Table 1).
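The arithmetic behind the in-frame insertion and the hexapeptide repeat structure can be checked with a short sketch. A 72-bp insertion is a multiple of three, so it adds 24 amino acids, or four hexapeptide units, without shifting the reading frame. The function names and the toy protein sequence are hypothetical and do not represent the CLIBASIA_05650 sequence.

```python
def is_in_frame(indel_length_bp):
    """True when an insertion/deletion length keeps the reading frame."""
    return indel_length_bp % 3 == 0


def split_into_units(protein, unit_length=6):
    """Split a protein sequence into consecutive units of unit_length residues.

    A trailing fragment shorter than unit_length is kept as the last item.
    """
    return [protein[i:i + unit_length] for i in range(0, len(protein), unit_length)]


def count_units_starting_with(units, residue="V"):
    """Count units whose first residue matches the given residue."""
    return sum(1 for unit in units if unit.startswith(residue))


# Size of the insertion reported in the text.
insert_bp = 72
insert_aa = insert_bp // 3          # amino acids added by the insertion
insert_hexapeptides = insert_aa // 6  # hexapeptide units added

# Toy protein (hypothetical sequence, for illustration only):
# four imperfect hexapeptide units, three of which start with V.
toy_protein = "VGDNAKVGENCRIGANATVHPSAV"
toy_units = split_into_units(toy_protein)
toy_v_led = count_units_starting_with(toy_units)
```

Counting the units that do not begin with the expected residue is one simple way to express how "imperfect" a repeat array is.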

Discussion
In this study, primer set Lap5640f/Lap5650r yielded one to three amplicons for a given HLB sample. A total of five amplicons with different sizes were identified. They are related by insertion/deletion events, demonstrating the mosaicism in the population genome of 'Ca. L. asiaticus'. In other words, at the locus CLIBASIA_05640-CLIBASIA_05650, 'Ca. L. asiaticus' possesses alleles composed of sequences that are identical in some parts but polymorphic in others. The DNA mosaicism described in this study is inferred largely from size variation among PCR amplicons and was confirmed by sequencing a limited number of strains. Deng et al. [19] showed the coamplification of different amplicons from primer sets targeting the rrn locus in the chromosome of 'Ca. L. asiaticus'. However, further sequencing investigation was not reported. As shown in Figure 2, the mosaicism of E-types B, D, E, G and H is represented by multiple DNA bands from the same PCR primer set, raising the question of whether an HLB sample contains a single clone or multiple clones (or clonal strains) of 'Ca. L. asiaticus'. This is of particular interest, since the 'Ca. L. asiaticus' DNA was not obtained from a clonal pure culture. Further complicating the issue is the variation in amplicon intensity, which suggests different concentrations of PCR templates. If a single-clone scenario is considered, the bacterium should have multiple Lap5640f/Lap5650r loci, in the chromosome and/or in the form of a phage. A lytic phage possessing this genomic locus has recently been reported [25]. Alternatively, the HLB samples may contain multiple clones of 'Ca. L. asiaticus'. More evidence is, however, needed. A third scenario could be a combination of both of the above.
Since the sequenced Florida strain psy62 belongs to E-type C (Table 2, Figure 2), it is interesting that the frequency of E-type C is low in Florida (4.1%), as well as in China (5.9%). This could mean that strain psy62 is not the most representative strain. We noted that psy62 originated from a psyllid, whereas all the 'Ca. L. asiaticus' samples in this study were from citrus. Could the bacterial populations differ between psyllids and plant hosts? Zhang et al. [25] recently reported that phages behaved differently in plants and psyllids in Florida. Phages SC1 and SC2 were lytic in dodder plants but remained lysogenic in psyllids.
Among the six E-types in China, five were found in Yunnan and two in Guangdong (Table 1). The higher number of E-types suggests that the 'Ca. L. asiaticus' population in Yunnan could be more diverse than that in Guangdong. The uniqueness of P3 (E-types D and E) to Yunnan samples further supports this speculation. It should be noted that Yunnan is one of the centers of origin of citrus species [26]. It remains to be tested whether a long history of citrus presence is associated with greater diversity of the 'Ca. L. asiaticus' population. Information about the population diversity of 'Ca. L. asiaticus' in Yunnan is currently very limited.
The challenge of in vitro culture of 'Ca. L. asiaticus' has been a critical factor limiting our capacity to study the biology of the bacterium. DNA sequencing and in silico analyses provide a different avenue for collecting information on unculturable bacteria. Regarding CLIBASIA_05650, the P1/P3/P4 alleles, which encode 18 hexapeptides, occurred predominantly in 'Ca. L. asiaticus' populations in China, whereas the P2/P5 alleles, which have 22 hexapeptides, were distributed mostly in Florida populations. Hexapeptide variation has been reported in other bacteria [27]. This type of genetic heterogeneity may be associated with phenotypic variation for environmental adaptation [17,28].

Conclusions
This study described and analyzed a DNA mosaic phenomenon in the unculturable 'Ca. L. asiaticus' associated with citrus HLB. In addition to the previous studies on two different genomic loci [10,12], we identified a new genomic locus that generated single to multiple amplicons from different HLB samples. Analyses of the DNA mosaicism revealed significant inter- and intra-population variations of 'Ca. L. asiaticus' from South China and Florida. Further investigation showed that insertion/deletion events contributed to the DNA mosaicism.

Figure 4. Panel A (top): CLIBASIA_05650 allele with a 24-amino acid sequence insert. Six motifs are shown in the tertiary structure. The 24-amino acid repeat unit is underlined in red and the second 24-amino acid sequence insert is underlined in green. Panel B (bottom): CLIBASIA_05650 allele without a 24-amino acid sequence insert. Five motifs are shown in the tertiary structure. The potential 24-amino acid repeat unit is underlined in black. In both A and B, the first amino acid of a hexapeptide unit, V, is highlighted in red. Confidence of prediction is presented as a bar graph (1-9) in the secondary structure and as a P-value in the tertiary structure.

Additional material
'948' Project of China (2010-C23). We thank X. Sun, D. Jones and M. Irey for providing bacterial strain DNA. We thank E. Civerolo, C. Wallis and R. Lee for suggestions and critical review of this manuscript. Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the U.S. Department of Agriculture.
Author details