Structural analysis of the full-length gene encoding a fibronectin-binding-like protein (CadF) and its adjacent genetic loci within Campylobacter lari

Background The combined sequences encoding a partial and putative rpsI open reading frame (ORF), a non-coding (NC) region, a putative ORF for the Campylobacter adhesin to fibronectin-like protein (cadF), a putative Cla_0387 ORF, an NC region and a partial and putative Cla_0388 ORF were identified in 16 Campylobacter lari isolates, using two novel degenerate primer pairs. Probable promoter consensus sequences at the -35 and -10 regions were identified in all C. lari isolates. Results Thus, the cadF (-like) gene is highly conserved among C. lari organisms. Transcription of the cadF (-like) gene in C. lari cells in vivo was also confirmed and the transcription initiation site was determined. A peptidoglycan-associating alpha-helical motif in the C-terminal regions of some bacterial cell-surface proteins was completely conserved amongst the putative cadF (-like) ORFs from the C. lari isolates. Conclusion The putative cadF (-like) ORFs from all C. lari isolates were nine amino acids larger than those from C. jejuni, and showed the amino acid residues FALG at positions 137-140 (50% identity), instead of the FRLS residues of the maximal fibronectin-binding activity site demonstrated within C. jejuni CadF. A neighbor-joining tree constructed from cadF (-like) gene sequence information formed a major cluster consisting of C. lari isolates, separate from the other three thermophilic campylobacters.


Background
Thermophilic Campylobacter species, primarily Campylobacter jejuni and C. coli, are curved, Gram-negative organisms, belonging to the ε-Proteobacteria, and are the most commonly recognized cause of acute bacterial diarrhea in the Western world [1][2][3].
Campylobacter lari is a relatively recently discovered thermophilic Campylobacter species that was first isolated from mammalian and avian species, particularly seagulls of the genus Larus [1,4]. C. lari has also been shown to be a cause of clinical infection [5][6][7][8][9].
In addition, an atypical group of isolates of urease-positive thermophilic Campylobacter (UPTC) was isolated from the natural environment in England in 1985 [10]. Thereafter, these organisms were described as a biovar or variant of C. lari [11,12]. Subsequent reports described four human isolates in France [11,13]. Some additional isolates of UPTC have also been reported in Northern Ireland [14][15][16], in The Netherlands [17] and in Japan [18,19]. Thus, two representative taxa, namely urease-negative (UN) C. lari and UPTC, occur within the species C. lari [20].
Bacterial pathogens have the ability to bind to fibronectin (Fn; a component of the extracellular matrix) [21][22][23][24]. Konkel et al. identified and cloned a gene encoding a fibronectin-binding protein (Campylobacter adhesin to Fn; CadF) from C. jejuni [22]. In C. jejuni and C. coli, the cadF virulence gene encodes a 37 kDa outer membrane protein that promotes the binding of these pathogens to intestinal epithelial cells [15].
In relation to cadF of thermophilic Campylobacter other than C. jejuni and C. coli described above, cadF and outer membrane protein gene F (OprF) have been identified in C. coli RM2228 (DDBJ/EMBL/GenBank accession number AAFL01000010 and ZP_00368187), C. lari RM2100 (AAFK01000002 and YP_002574995) and C. upsaliensis RM3195 (AAFJ01000008 and ZP_00371707), following whole genome shotgun sequence analysis [26]. However, no detailed descriptions of the cadF (oprF) gene have yet appeared for these thermophilic Campylobacter strains. In addition, no reports on the cadF (-like) gene in C. lari organisms have yet appeared. Therefore, the aim of the present study was to clone, sequence and analyze the full-length gene encoding the Fn-binding (-like) protein (CadF) and its adjacent genetic loci from several C. lari organisms (UN C. lari and UPTC). We also aimed to confirm the expression of the gene in the C. lari cells.

TA cloning, sequencing and sequence analyses of the full-length cadF gene and its adjacent genetic loci from the 16 isolates of C. lari
The two primer pairs (f-/r-cadF1 and f-/r-cadF2; Figure 1) successfully amplified PCR products of approximately 1.4 and 1.2 kilobase pairs (kbp), respectively, with all 16 isolates of C. lari employed (data not shown). Following TA cloning and sequencing, the combined nucleotide and deduced amino acid sequence data determined from the 16 isolates of C. lari have been made accessible in DDBJ/EMBL/GenBank, with the accession numbers indicated in Table 1. Table 2, although in this limited study a small number of reference strains of C. jejuni, C. coli and C. upsaliensis were examined. Probable ribosome-binding (RB) sites, AGGA (np 404-407) [Shine-Dalgarno (SD) sequences] [27], which are complementary to a highly conserved sequence, CCUCCU, close to the 3' end of 16S rRNA, were also identified in all the C. lari isolates examined.
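The complementarity between the AGGA motif and the CCUCCU sequence of 16S rRNA can be checked by reverse-complementing the motif as RNA. The following is a minimal sketch; the function name is illustrative and not taken from the study.

```python
def reverse_complement_rna(motif: str) -> str:
    """Return the RNA strand (5'->3') that base-pairs with a DNA or RNA motif."""
    pairs = {"A": "U", "T": "A", "U": "A", "G": "C", "C": "G"}
    return "".join(pairs[base] for base in reversed(motif.upper()))


sd_motif = "AGGA"    # probable ribosome-binding site reported in the text
anti_sd = "CCUCCU"   # conserved sequence near the 3' end of 16S rRNA

partner = reverse_complement_rna(sd_motif)
is_complementary = partner in anti_sd  # True if the partner lies within the anti-SD
print(partner, is_complementary)
```

The motif pairs with an internal part of the hexamer, which is what "complementary" means in this context.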
In the region upstream of the cadF-like gene, a most probable promoter consensus sequence at the -10 region (TATAAT) (TAGAAT for UPTC isolates (271-276 for UPTC CF89-12)) was identified at the locus between np 272 and 277 bp, with all 16 C. lari isolates and the C. lari RM2100 strain. In addition, probable -35 regions (np 243-248) upstream of the -10 region were also identified, in all C. lari isolates examined.
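A scan for -10-like hexamers that tolerates one mismatch from TATAAT would pick up both the TATAAT and the TAGAAT variants described above. The sketch below is only an illustration: the two toy sequences are hypothetical, and the study does not state that promoters were located this way.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_minus10_like(seq: str, consensus: str = "TATAAT", max_mismatch: int = 1):
    """Return (1-based start, hexamer, mismatches) for windows close to the consensus."""
    seq = seq.upper()
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        d = hamming(window, consensus)
        if d <= max_mismatch:
            hits.append((i + 1, window, d))
    return hits


toy_un = "GCGCTATAATGCGC"    # hypothetical sequence carrying TATAAT
toy_uptc = "GCGCTAGAATGCGC"  # hypothetical sequence carrying TAGAAT

print(find_minus10_like(toy_un))
print(find_minus10_like(toy_uptc))
```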
A putative ORF for the Cla_0387 gene was also estimated to be 642 bp in all 16 C. lari isolates examined (np 1,404-2,045). The Cla_0387 gene commenced with TTG and terminated with TAA in all 16 C. lari isolates and the C. lari RM2100 strain. Apparent small size differences in the putative ORFs for Cla_0387 also occurred amongst the four thermophilic Campylobacter species examined (Table 2).
As shown in Table 3, the nucleotide sequences of the full-length cadF (-like) structural gene from the 17 C. lari isolates showed 89.4-100.0% similarity to each other. The nucleotide sequences of the full-length Cla_0387 structural gene from the 17 C. lari isolates showed 85.1-100.0% similarity to each other (Table 4). Thus, the nucleotide sequence similarities of the cadF (-like) gene appear to be slightly higher than those of the Cla_0387 gene amongst the 16 C. lari isolates and the C. lari RM2100 strain examined.
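The coordinates quoted for the Cla_0387 ORF can be checked with simple arithmetic on 1-based inclusive positions, and the TTG start and TAA stop codons fit a basic ORF sanity check. This is a sketch under stated assumptions: the helper names and the toy sequence are hypothetical, and the start-codon list is the usual bacterial set (ATG, TTG, GTG).

```python
def orf_length(start: int, end: int) -> int:
    """Length in bp of a feature given 1-based inclusive coordinates."""
    return end - start + 1


def looks_like_orf(seq: str,
                   starts=("ATG", "TTG", "GTG"),
                   stops=("TAA", "TAG", "TGA")) -> bool:
    """True if seq is a whole number of codons with a start and a stop codon."""
    seq = seq.upper()
    return len(seq) % 3 == 0 and seq[:3] in starts and seq[-3:] in stops


cla_0387_len = orf_length(1404, 2045)  # np 1,404-2,045 as reported in the text
print(cla_0387_len, cla_0387_len % 3 == 0)

toy_orf = "TTGAAAGCTTAA"  # hypothetical: TTG start, two codons, TAA stop
print(looks_like_orf(toy_orf))
```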
Moreover, deduced amino acid sequence alignment analyses were also performed for the putative ORFs of the full-length cadF (-like) gene of the 16 C. lari isolates, as well as those of the C. lari RM2100, C. jejuni, C. coli and C. upsaliensis strains. The putative ORFs from the 17 C. lari isolates showed 90.9-100.0% amino acid sequence similarity to each other, and 56.4-57.9% similarity to those of two C. jejuni strains (Table 3). They also showed 53.5-55.8% similarity to those of other thermophilic Campylobacter organisms (two strains of C. coli and C. upsaliensis; Table 3).
Thus, the putative ORFs of the full-length cadF (-like) gene from the 17 C. lari isolates identified in the present study are identical in size (984 bp and 328 amino acid residues) but show sequence heterogeneity at both the nucleotide and amino acid levels.
As shown in Table 4, the deduced amino acid sequence similarities were also examined for the putative ORFs of the full-length Cla_0387 gene among the 17 C. lari isolates (86.9-100.0%) and other thermophilic Campylobacter organisms (50.7-56.2%), employed as references (data not shown).
Thus, the cadF (-like) gene is highly conserved among C. lari organisms isolated from humans and natural environments in several countries of Asia, Europe and North America.
In relation to the NC regions, two NC regions, one of approximately 250 bp including the promoter -10 region and one of approximately 120 bp, occurred upstream of the cadF (-like) gene and downstream of the Cla_0387 gene, respectively, when the combined sequences from all 16 C. lari isolates were examined. The nucleotide sequences of the approximately 250 bp region from the 16 C. lari isolates and C. lari RM2100 showed 85.0-100.0% sequence similarity to each other (Table 5). The nucleotide sequences of the approximately 120 bp region also showed 85.6-100.0% sequence similarity among the 17 C. lari isolates. Thus, considerable genetic heterogeneity in the nucleotide sequences of the 250 bp NC region, the full-length cadF (-like) gene, the full-length Cla_0387 gene and the 120 bp NC region identified in the present study also occurred among the 17 C. lari isolates, including the C. lari RM2100 strain.
Figure 1. A schematic representation of the cadF gene and its adjacent genetic loci for C. lari RM2100, including locations of the novel primers designed in silico (A). Nucleotide sequences of the primers are also shown (B).

Northern blot hybridization, reverse transcription-PCR and primer extension analysis
Northern blot hybridization analysis detected cadF (-like) gene transcription in cells of the two C. lari isolates, UN C. lari JCM2530T and UPTC CF89-12 (Figure 2A). Since the positive hybridization signals appeared at around 1,600 bp (Figure 2A), the cadF (-like) gene may possibly be transcribed together with the Cla_0387 gene. Thus, cadF (-like) gene transcription was confirmed in the C. lari organisms. When RT-PCR analysis was carried out on RNA extracted from UN C. lari JCM2530T and UPTC CF89-12 cells with the primer pair f-cadF2, in the cadF (-like) gene, and r-cadF3, in the Cla_0387 gene (Figure 1), a positive RT-PCR signal was detected at around 800 bp with both isolates (Figure 2B).
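The co-transcription argument rests on a size comparison: the two ORF lengths reported in this study add up to roughly the size of the northern blot signal. The short sketch below makes that arithmetic explicit. It ignores any untranslated or intergenic sequence, whose lengths are not stated here, so it is only a rough consistency check.

```python
cadf_bp = 984              # cadF (-like) ORF length reported in the text
cla_0387_bp = 642          # Cla_0387 ORF length reported in the text
northern_signal_bp = 1600  # approximate size of the hybridization signal

combined_bp = cadf_bp + cla_0387_bp
relative_diff = abs(combined_bp - northern_signal_bp) / northern_signal_bp

print(combined_bp, f"{relative_diff:.1%}")
```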
The transcription initiation site for the cadF (-like) gene was determined by primer extension analysis (Figure 2C). The +1 transcription initiation site for the cadF (-like) gene is underlined in the following sequence: 5'-TTT-TATAATTTCAAAG-3', as shown in Figure 2C.

Deduced amino acid sequence alignment analysis and phylogenetic analyses of the cadF (-like) ORF
We carried out deduced amino acid sequence alignment analysis to elucidate the differences in CadF (-like) protein amongst the thermophilic Campylobacter. As shown in Figure 3, the C. coli RM2228 strain carried a stretch of 12 amino acids (VVTPAPAPVVSQ) from amino acid positions 190 to 201, as well as a Q at amino acid position 180. Regarding the nine additional amino acids in C. lari isolates relative to C. jejuni strains, four amino acids (THTD) at positions 80 to 83 and five [A(T for UPTC 99)KQID] at positions 193 to 197 were identified.
When amino acid sequence alignment analysis was carried out in relation to the single Fn-binding domain localized to four amino acids (FRLS; CadF amino acid positions 134-137 for C. jejuni) [28], the putative cadF (-like) ORFs from all 17 C. lari isolates examined showed the amino acid residues FALG (50% identity) at amino acid positions 137-140 instead of the FRLS residues, as shown in Figure 4. A dendrogram showing phylogenetic relationships was constructed by the NJ method [29] based on nucleotide sequence information of the full-length cadF (-like) gene from the 16 C. lari isolates, C. lari RM2100 and other thermophilic Campylobacter reference strains; the 17 C. lari isolates formed a major cluster separate from the other three thermophilic Campylobacter spp. (Figure 5). In addition, UN C. lari and UPTC organisms could not be distinguished from each other based on the nucleotide sequence data of the cadF (-like) gene, as shown in Figure 5.

Discussion
This is the first demonstration of the structural analysis of the full-length gene encoding a CadF (-like) protein and its adjacent genetic loci within C. lari.
Regarding the NC region upstream of the cadF (-like) gene, this region is approximately 250 bp in length in all 16 C. lari isolates and the C. lari RM2100 strain. However, the corresponding NC regions from the eight C. jejuni and one C. coli reference strains examined (Table 1) are shorter, at approximately 150 bp in length, for unknown reason(s).
In 1995, Koebnik described a peptidoglycan-associating alpha-helical consensus motif (NX2LSX2RAX2VX3L) in the C-terminal regions of 16 bacterial cell-surface proteins [30]. When we compared the corresponding amino acid sequences of the putative cadF (-like) ORFs from the 17 C. lari and some C. jejuni isolates with this consensus motif, the motif was completely conserved amongst the cadF (-like) ORFs from the isolates (data not shown).
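A consensus motif written in this compact notation, where Xn stands for any n residues, maps directly onto a regular expression. The sketch below shows one way such a comparison could be automated. The helper name and the toy peptides are hypothetical and are not sequences from this study.

```python
import re


def motif_to_regex(motif: str) -> re.Pattern:
    """Convert a compact motif (Xn = any n residues) into a compiled regex."""
    pattern = re.sub(r"X(\d+)", lambda m: "[A-Z]{%s}" % m.group(1), motif)
    return re.compile(pattern)


koebnik = motif_to_regex("NX2LSX2RAX2VX3L")

toy_match = "MKNAALSGGRAQQVKKKLTT"     # hypothetical peptide carrying the motif
toy_no_match = "MKNAALSGGRGQQVKKKLTT"  # same peptide with RA changed to RG

hit = koebnik.search(toy_match)
print(hit.group() if hit else None)
print(koebnik.search(toy_no_match))
```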
As shown in Table 2, the CMW of the putative cadF (-like) ORF was estimated to be 36,578 to 36,869 Da for the 16 C. lari isolates and C. lari RM2100 reference strain (data not shown). In addition, the value was also estimated to be approximately 36 kDa for the two C. jejuni reference strains (Table 2). These estimated CMW values are in agreement with the previous description of the immunodetection of the CadF protein from five C. jejuni and C. coli isolates [25].
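Molecular weight estimates of this kind can be approximated from a deduced amino acid sequence by summing average residue masses and adding one water molecule. The sketch below is an assumption about the general approach only, since the text does not name the tool used; the mass table holds standard average residue masses and the peptide is hypothetical.

```python
# Standard average residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153  # one water is added for the free N- and C-termini


def molecular_weight(protein: str) -> float:
    """Approximate average molecular weight (Da) of an unmodified protein."""
    protein = protein.upper()
    unknown = set(protein) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(RESIDUE_MASS[aa] for aa in protein) + WATER


toy_peptide = "MKFALGK"  # hypothetical peptide, not a sequence from the study
print(round(molecular_weight(toy_peptide), 1))
```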
When the nucleotide and deduced amino acid sequence alignment analyses were carried out for the putative cadF (-like) ORF, apparent size differences occurred amongst the four thermophilic Campylobacter species, as described above. Comparing the putative ORFs for the cadF (-like) gene between C. lari and C. jejuni organisms, the ORFs are nine amino acid residues shorter in C. jejuni strains than in C. lari isolates.
Recently, Krause-Gruszczynska et al. (2007) reported that the CadF protein from C. coli strains was 13 amino acids larger than those from C. jejuni strains, based on deduced amino acid sequence alignment analysis [31]. This is consistent with our present results (Table 2). They also indicated that C. coli strains carried a stretch of 13 amino acids in the middle region of the protein [31]. In addition, in the present study, the deduced CadF (-like) protein was shown to be 328 amino acids in all 17 C. lari isolates, nine amino acids larger than CadF from the two C. jejuni strains (319 amino acids) (Table 2). We then carried out deduced amino acid sequence alignment analysis to elucidate the differences in CadF (-like) protein between C. lari and C. jejuni organisms. As shown in Figure 3, the C. coli RM2228 strain carried a stretch of 12 amino acids (VVTPAPAPVVSQ) from amino acid positions 190 to 201, as well as a Q at amino acid position 180. Regarding the nine additional amino acids in C. lari isolates relative to C. jejuni strains, interestingly, four amino acids (THTD) at positions 80 to 83 and five [A(T for UPTC99)KQID] at positions 193 to 197 were identified, as shown in Figure 3.
Regarding CadF in Campylobacter, the cadF virulence gene, encoding a 37 kDa outer membrane protein that promotes the binding of the pathogens to intestinal epithelial cells, was identified and cloned [22,25]. A single Fn-binding domain within C. jejuni CadF was localized to four amino acids (CadF amino acid positions 134-137), consisting of the residues phenylalanine-arginine-leucine-serine (FRLS) [28]. However, when amino acid sequence alignment analysis was carried out, the putative cadF (-like) ORFs from all 17 C. lari isolates examined in the present study showed the amino acid residues FALG (50% identity) at amino acid positions 137-140, instead of the FRLS residues (Figure 4). Nor were FRLS residues detected within any other region of the cadF (-like) ORF from any of the 17 C. lari isolates examined. Interestingly, the FNLG residues within AdpB (Ad-adhesin in p-Prevotella, B-second identified adhesin) in Prevotella intermedia (a black-pigmented Gram-negative anaerobe) [32] were 75% identical to the FALG residues from C. lari (Figure 4). Therefore, it may be important to clarify whether or not the CadF (-like) protein from C. lari isolates can bind to fibronectin. An experiment is now in progress to resolve this.
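The identity figures quoted for these four-residue motifs follow from a position-by-position comparison. A minimal sketch, with an illustrative function name:

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions between two equal-length motifs."""
    if len(a) != len(b):
        raise ValueError("motifs must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


print(percent_identity("FALG", "FRLS"))  # C. lari vs C. jejuni motif: 50.0
print(percent_identity("FALG", "FNLG"))  # C. lari vs P. intermedia AdpB motif: 75.0
```

In both comparisons the conserved positions are the first (F) and third (L) residues; the P. intermedia motif additionally shares the terminal G.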
In the present study, for the first time, we have described the cloning, sequencing and characterization of full-length Cla_0387 from the 16 C. lari isolates. The CMW values were estimated to be 23,689-23,875 Da for the 16 C. lari isolates and the C. lari RM2100 strain, and these values were equivalent to those from the two C. jejuni and one C. coli reference strains (Table 2). In addition, the cadF (-like) gene and the Cla_0387 gene may possibly be functional within C. lari isolates, based on the present northern blot hybridization and RT-PCR observations, as shown in Figure 2A and 2B.
Thus, the cadF (-like) gene and the Cla_0387 gene could be co-transcribed within C. lari organisms, constituting an operon. Since Cla_0387 showed high deduced amino acid sequence similarity to the Escherichia coli haloacid dehalogenase-like phosphatase [33], the two genes may have an important biological relationship within the C. lari cells.
In the present study, we designed two novel primer pairs (f-/r-cadF1 and f-/r-cadF2) in silico for amplification of a region of approximately 2.3 kbp, including the full-length cadF (-like) gene and its adjacent genetic loci, based on sequence information from the C. lari RM2100, C. jejuni RM1221 and C. coli RM2228 strains. This resulted in successful amplification, TA cloning and sequencing of the region from the 16 C. lari isolates, which were isolated from different sources and in several countries. Therefore, the present novel PCR primer pairs would likely be of value for C. jejuni and C. coli organisms, as well as for other C. lari isolates.
A dendrogram showing phylogenetic relationships was constructed by the NJ method [29], based on nucleotide sequence information of the full-length cadF (-like) gene from the 16 C. lari isolates, C. lari RM2100 and other thermophilic Campylobacter reference strains. As shown in Figure 5, the 17 C. lari isolates form a major cluster separate from the other three thermophilic Campylobacter spp. In addition, the 17 C. lari isolates form several minor clusters, based on nucleotide sequence information from the cadF (-like) gene (Figure 5). Thus, nucleotide sequence information of the full-length cadF (-like) gene can be regarded as reliable for the molecular discrimination of C. lari organisms from the other three thermophilic campylobacters. In addition, Figure 5 also indicated that UN C. lari and UPTC organisms could not be distinguished from each other in the NJ dendrogram based on the nucleotide sequence data of the cadF (-like) gene.

Conclusion
The combined sequences encoding a partial and putative rpsI open reading frame (ORF), a non-coding (NC) region, a putative ORF for the Campylobacter adhesin to fibronectin-like gene, a putative Cla_0387 ORF, an NC region and a partial and putative Cla_0388 ORF were identified in 16 Campylobacter lari isolates, using two novel degenerate primer pairs. Transcription of the cadF (-like) gene in C. lari cells in vivo was also confirmed and the transcription initiation site was determined. The putative cadF (-like) ORFs from all C. lari isolates were nine amino acids larger than those from C. jejuni, and showed the amino acid residues FALG at positions 137-140 (50% identity), instead of the FRLS residues of the maximal fibronectin-binding activity site demonstrated within C. jejuni CadF.
Figure 4. Amino acid sequence alignment analysis of part (around a single Fn-binding domain within C. jejuni CadF) of the putative ORF for the cadF (-like) gene from the 17 C. lari isolates. Amino acid sequences of those from the C. jejuni and C. coli reference strains were aligned for comparison. FALG residues of C. lari and FRLS residues of C. jejuni and C. coli strains were underlined, respectively. In this Figure, the amino acid sequence of AdpB (aa 201-230) from Prevotella intermedia 17 [32] was also aligned for comparison. FNLG residues of P. intermedia 17 were also underlined. The alignment analysis data from the UN C. lari isolates RM2100, 298, 300 and 84C-1, from the UPTC isolates NCTC12892

Methods
Campylobacter isolates and culture conditions
C. lari isolates (n = 4 UN C. lari; n = 12 UPTC), which were isolated from different sources and in several countries of Asia, Europe and North America and used in the present study, are shown in Table 1.

Genomic DNA preparation, primer design and PCR amplification
Genomic DNA was prepared using sodium dodecyl sulfate and proteinase K treatment, phenol-chloroform extraction and ethanol precipitation [34].
Figure 5. A phylogenetic tree constructed based on nucleotide sequence information of the full-length cadF (-like) gene from 17 C. lari isolates and other thermophilic campylobacters. The tree was constructed by the NJ method [29]. Values, 0.02, in the figure represent evolutionary distances. Bootstrap values of 1,000 are shown at the branch points. The out-group is C. upsaliensis RM3195.