Analysis of the genome content of Lactococcus garvieae by genomic interspecies microarray hybridization

Background Lactococcus garvieae is a bacterial pathogen that affects different animal species in addition to humans. Despite the widespread distribution and emerging clinical significance of L. garvieae in both veterinary and human medicine, there is almost a complete lack of knowledge about the genetic content of this microorganism. In the present study, the genomic content of L. garvieae CECT 4531 was analysed using bioinformatics tools and microarray-based comparative genomic hybridization (CGH) experiments. Lactococcus lactis subsp. lactis IL1403 and Streptococcus pneumoniae TIGR4 were used as reference microorganisms. Results The combination and integration of in silico analyses and in vitro CGH experiments, performed in comparison with the reference microorganisms, allowed establishment of an inter-species hybridization framework with a detection threshold based on a sequence similarity of ≥ 70%. With this threshold value, 267 genes were identified as having an analogue in L. garvieae, most of which (n = 258) have been documented for the first time in this pathogen. Most of the genes are related to ribosomal, sugar metabolism or energy conversion systems. Some of the identified genes, such as als and mycA, could be involved in the pathogenesis of L. garvieae infections. Conclusions In this study, we identified 267 genes that were potentially present in L. garvieae CECT 4531. Some of the identified genes could be involved in the pathogenesis of L. garvieae infections. These results provide the first insight into the genome content of L. garvieae.


Background
Lactococcus garvieae is one of the most important bacterial pathogens that affect different farmed fish species in many countries, although its major impact is on the trout farm industry [1,2]. In addition to farmed fish, this microorganism has also been isolated from a wide range of wild fish species, from both fresh and marine water, as well as from giant fresh water prawns [3] and from wild marine mammals [4]. The host range of L. garvieae is not limited to aquatic species. This agent has also been identified in cows and water buffalos with subclinical mastitis [5,6] and from cat and dog tonsils [7]. In humans it has been isolated from the urinary tract, blood, and skin and from patients with pneumonia, endocarditis or septicaemia [8][9][10][11]. Recently, intestinal disorders in humans have been associated with the consumption of raw fish contaminated with this pathogen [12], which suggests that L. garvieae could be considered as a potentially zoonotic bacterium [3,12]. Despite the widespread distribution and emerging clinical significance of L. garvieae in both veterinary and human medicine, there is almost a complete lack of knowledge about the genetic content of this microorganism.
In the last few years, research in microbial genetics has changed fundamentally, from an approach involving the characterization of individual genes to a global analysis of microbial genomes. The availability of complete genome sequences has enabled the development of high-throughput nucleic acid hybridization technologies including macro- and microarrays. Microarrays have the capacity to monitor the genome content of bacterial strains or species very rapidly. Although whole-genome sequencing is a powerful method for genetics, it is still expensive and time consuming. As an alternative, comparative genomic hybridization (CGH) experiments based on microarrays have been used to facilitate comparisons of unsequenced bacterial genomes. Array-based CGH using genome-wide DNA microarrays is used commonly to determine the genomic content of bacterial strains [13,14], but also for inter-species comparisons [14][15][16]. In this case, microarrays of closely related microorganisms that have been fully sequenced must be available. The primary advantage of this microarray approach is that it allows the identification of a large number of genes that are potentially present in an organism without the need for sequencing genomes. The disadvantage of this approach is that it indicates only the genes that are common between the fully sequenced relative and the strain of interest; genes unique to the strain of interest remain unknown [15,17]. In the present work the genetic content of L. garvieae CECT 4531 was studied by a combination of in silico analysis and in vitro microarray CGH experiments, using open reading frame (ORF) microarrays of two bacteria closely related to L. garvieae, namely Lactococcus lactis subsp. lactis IL1403 and Streptococcus pneumoniae TIGR4 [18,19].

Methods
Bacterial strains, culture conditions and isolation of genomic DNA
Lactococcus lactis subsp. lactis IL1403 (kindly provided by M.P. Gaya, INIA, Madrid, Spain) and Streptococcus pneumoniae TIGR4 (purchased from the American Type Culture Collection) were used as the reference sequenced microorganisms. The test strain of Lactococcus garvieae used for the experiments was CECT 4531 (purchased from the Spanish Type Culture Collection). L. lactis subsp. lactis IL1403 and L. garvieae CECT 4531 were grown statically at 28°C in BHI broth (bioMérieux, Marcy l'Etoile, France). S. pneumoniae TIGR4 was grown statically at 37°C in Todd Hewitt broth (Oxoid, Basingstoke, Hampshire, England). Cells were grown until the late-exponential phase of growth (OD600~1.5-2) and harvested for isolation and purification of genomic DNA using the DNeasy Blood and Tissue kit (Qiagen, Hilden, Germany) according to the manufacturer's specifications. The DNA concentrations were determined spectrophotometrically.

DNA labelling
Aliquots (1-2 μg) of genomic DNA from the three strains were labelled fluorescently with Cy3-dUTP or Cy5-dUTP (Perkin-Elmer, Foster City, CA, USA), depending on whether the strain was used as a test or reference microorganism in the CGH experiments, respectively. Each DNA aliquot was fragmented by sonication to obtain fragments from 400 to 1000 bp. Fragmented DNA was mixed with 5 μL of 10× NEBlot labelling buffer containing random sequence octamer oligonucleotides (New England Biolabs, Ipswich, MA, USA) and water to a final volume of 43.5 μL. This mixture was denatured by heating at 95°C for 5 min and then cooled for 5 min at 4°C. After this denaturing step, the remaining components of the labelling reaction were added: 5 μL of 10× dNTP labelling mix (1.2 mM each dATP, dGTP and dCTP in 10 mM Tris pH 8.0, 1 mM EDTA) (New England Biolabs, Ipswich, MA, USA), 1.5 μL of 1 mM Cy3-dUTP or Cy5-dUTP and 1.5 μL of 10 U/μL Klenow fragment (Fermentas Life Sciences, Glen Burnie, MD, USA). The labelling reactions were incubated overnight at 37°C and then stopped by adding 2.5 μL of 0.5 M EDTA. Labelled DNA was purified from unincorporated label using a Qiaquick PCR Cleanup kit (Qiagen, Hilden, Germany) and dried under vacuum. The final DNA concentration and quality, as well as the labelling quality, were determined using a NanoDrop (NanoDrop Technologies, Wilmington, DE, USA).
The CGH experiments were performed by means of competitive hybridizations using DNA of L. lactis subsp. lactis IL1403 or S. pneumoniae TIGR4, depending on the array, as positive controls. The DNAs to be hybridized on the same array were labelled with Cy3-dUTP and Cy5-dUTP, respectively. For each microarray hybridization reaction, aliquots (1-2 μg) of labelled genomic DNAs of the reference (labelled with Cy3) and test (labelled with Cy5) strains were mixed in 45 μL EGT hybridization solution (Eurogentec, Seraing, Belgium) and denatured at 65°C for 2 min. The hybridization mixture was then loaded onto a microarray slide, covered with a coverslip and incubated at 38°C overnight. Following hybridization, the slides were washed in 2× SSC, 0.5% SDS for 5 min, followed by a second wash step in 1× SSC, 0.25% SDS for 5 min. Finally, slides were rinsed in 0.2× SSC and dried by centrifugation.

Data acquisition and analysis
The microarray was scanned after hybridization using a Scanarray HT microarray scanner (Perkin-Elmer). The signal intensity of the two fluors was determined using ImaGene software (BioDiscovery, El Segundo, CA, USA). Microarray data were analysed using ImaGene software, Microsoft Excel and an in-house designed and built Microsoft Access database [21]. Gene calling was based on a signal-to-noise ratio (SNR) >3 for each spot. After the CGH experiments, a gene was considered to show a positive result when it was present in at least three of the four CGH assays. In the case of the L. garvieae CECT 4531 hybridizations with the L. lactis subsp. lactis IL1403 arrays, it was necessary to perform a larger number of assays (n = 8), owing to the poor quality of one of the batches of arrays used. Thus, the criterion chosen to determine a positive result in this case was when the gene was present in at least five of the eight CGH assays.
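The gene-calling rule described above can be sketched as follows. This is a minimal illustration, not the authors' ImaGene/Excel/Access workflow: it assumes one SNR value per gene per assay, and the function names and SNR values are hypothetical.

```python
from typing import Dict, List

SNR_THRESHOLD = 3.0  # a spot counts as present when its SNR is strictly greater than 3

# Minimum number of positive assays for the two designs described in the text:
# 3 of 4 assays in general, 5 of 8 for L. garvieae on the L. lactis arrays.
REQUIRED_POSITIVES = {4: 3, 8: 5}


def min_positive_assays(n_assays: int) -> int:
    """Return the number of positive assays needed to call a gene."""
    if n_assays not in REQUIRED_POSITIVES:
        raise ValueError(f"no calling rule is described for {n_assays} assays")
    return REQUIRED_POSITIVES[n_assays]


def call_genes(snr_by_gene: Dict[str, List[float]]) -> Dict[str, bool]:
    """Call each gene positive or negative from its per-assay SNR values."""
    calls = {}
    for gene, snrs in snr_by_gene.items():
        positives = sum(1 for snr in snrs if snr > SNR_THRESHOLD)
        calls[gene] = positives >= min_positive_assays(len(snrs))
    return calls


# Hypothetical SNR values, for illustration only.
example = {
    "geneA": [5.2, 4.1, 2.0, 6.3],                      # 3 of 4 above threshold
    "geneB": [3.0, 3.5, 1.2, 4.0],                      # 3.0 is not > 3, so 2 of 4
    "geneC": [4.0, 4.0, 4.0, 4.0, 4.0, 1.0, 1.0, 1.0],  # 5 of 8 above threshold
}
calls = call_genes(example)
```

In this sketch the signal intensity itself plays no further role once a spot passes the SNR cut-off; only the number of assays in which the gene is present matters.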

In silico sequence analysis
Sequence analyses were carried out to assess the performance of the inter-species CGH protocol. Using the BLAT [22] and BLAST [23] programs, the sequences of the L. lactis microarray probes were aligned with the S. pneumoniae genome sequence, and vice-versa. The BLAT search parameters were 90%, 80% and 70% sequence identity (BLAT90, BLAT80 and BLAT70) and a 100 bp minimum alignment length (owing to the fact that the length of the array probe was between 100 and 400 bp). Available L. garvieae sequences of the nine previously identified genes that were positive in the CGH were aligned with the L. lactis subsp. lactis IL1403 or S. pneumoniae TIGR4 genomes and with the sequences of the immobilized probes of these genes in the corresponding microarray using BLAST [23] and BLAST 2 sequences [24] programs.
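As an illustration of the BLAT90, BLAT80 and BLAT70 criteria, the sketch below counts the probes that pass each identity threshold with an alignment of at least 100 bp. It does not run or parse BLAT: the `Alignment` record, the function names and the example values are hypothetical, and it assumes the thresholds are inclusive and cumulative (a probe passing BLAT90 also passes BLAT80 and BLAT70).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Set

MIN_ALIGNMENT_LENGTH = 100  # bp, minimum alignment length used in the study
IDENTITY_THRESHOLDS = {"BLAT90": 90.0, "BLAT80": 80.0, "BLAT70": 70.0}


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record for one probe-to-genome alignment."""
    probe_id: str
    percent_identity: float
    length_bp: int


def orfs_passing(alignments: Iterable[Alignment], min_identity: float) -> Set[str]:
    """Probe IDs with at least one alignment meeting both criteria."""
    return {
        a.probe_id
        for a in alignments
        if a.percent_identity >= min_identity and a.length_bp >= MIN_ALIGNMENT_LENGTH
    }


def summarize(alignments: Iterable[Alignment]) -> Dict[str, int]:
    """Number of probes passing each identity threshold."""
    alignments = list(alignments)
    return {
        label: len(orfs_passing(alignments, threshold))
        for label, threshold in IDENTITY_THRESHOLDS.items()
    }


# Hypothetical alignments, for illustration only.
example_alignments = [
    Alignment("probe1", 92.0, 150),  # passes all three thresholds
    Alignment("probe2", 83.0, 120),  # passes BLAT80 and BLAT70
    Alignment("probe3", 75.0, 300),  # passes BLAT70 only
    Alignment("probe4", 95.0, 60),   # too short, never counted
    Alignment("probe5", 65.0, 200),  # identity too low, never counted
]
summary = summarize(example_alignments)
```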

Inter-species comparison framework
In silico analyses were performed to compare the sequences of the immobilized probes in the microarray of each reference organism with the sequences of their complete genomes available in GenBank (L. lactis subsp. lactis IL1403: NC_002662 and S. pneumoniae TIGR4: NC_003028). The BLAT alignment of the L. lactis IL1403 probes on the S. pneumoniae TIGR4 genome allowed the identification of 1 ORF with BLAT90, 65 ORFs with BLAT80 and 159 ORFs with BLAT70. Moreover, the BLAT alignment of the probes represented on the S. pneumoniae microarray on the L. lactis genome demonstrated 1 ORF, 63 ORFs and 165 ORFs for BLAT90, BLAT80 and BLAT70, respectively.
The CGH experiments based on swapping the microarrays between S. pneumoniae and L. lactis identified 65 common ORFs. To evaluate the accuracy of the microarray CGH experiments, we compared these results with those of the in silico analysis. Out of the 65 genes, 47 (72%) showed similarities greater than 80%, 16 genes (25%) exhibited a similarity between 70% and 80%, and only 2 genes (3%) showed a similarity slightly lower than 70% (66-68%) (Table 1). In summary, 97% of the genes detected by CGH showed similarities greater than 70% at the nucleotide level.
After combined analysis of the results obtained in silico and in vitro, we established, under the hybridization conditions used in this study, a detection threshold based on a sequence similarity of ≥ 70% for alignments longer than 100 bp. This was established as the reference framework for the inter-species CGH assays.
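The proportions behind this threshold can be re-derived from the counts reported above. The snippet is only an arithmetic check of the published figures; the variable names are ours.

```python
# Counts of the 65 common ORFs detected by swap CGH, by in silico similarity class.
counts = {
    "greater than 80%": 47,
    "between 70% and 80%": 16,
    "below 70% (66-68%)": 2,
}

total = sum(counts.values())

# Percentage of the 65 genes in each class, rounded to the nearest integer.
percentages = {label: round(100 * n / total) for label, n in counts.items()}

# Share of CGH-detected genes at or above the 70% similarity threshold.
at_or_above_70 = round(
    100 * (counts["greater than 80%"] + counts["between 70% and 80%"]) / total
)

print(total, percentages, at_or_above_70)
```

The rounded values reproduce the 72%, 25%, 3% and 97% quoted in the text.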
In vitro microarray CGH experiments with L. garvieae CECT 4531 vs reference microorganisms L. lactis subsp. lactis IL1403 and S. pneumoniae TIGR4, and in silico analysis of available sequences from L. garvieae
The microarray CGH experiments identified 267 genes in L. garvieae that had analogues in L. lactis and/or S. pneumoniae (Additional file 1). Of these, 111 genes (41.6%) were identified only with the L. lactis microarray, 70 genes (26.2%) only with the microarray of S. pneumoniae, and 86 genes (32.2%) were identified with both microarrays. These genes belong to diverse functional groups (Table 2). Most of the genes (96.6%) have been documented for the first time in L. garvieae. Only nine genes (four present in both reference microorganisms: atpD/SP1508, pfk/SP0896, tig/SP0400, tuf/SP1489; three present in L. lactis: als, ddl, galK; two present in S. pneumoniae: SP0766, SP1219) out of the 267 genes detected have been either identified or sequenced before in diverse strains of L. garvieae (Tables 3 and 4). In silico analysis of these previously sequenced genes (n = 9) of L. garvieae was performed to assess the efficacy of the methodology. Alignments of these available sequences with the genomes of the corresponding reference microorganism and their respective array probes showed nucleotide identities ranging between 70% and 86% (Tables 3 and 4). Most of the available sequences (80%) showed similarities greater than 75%.
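The gene counts in this section can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers reported here and the column totals of Table 2 (197 and 156); the variable names are ours.

```python
# Genes detected in L. garvieae CECT 4531, by the array(s) that detected them.
only_lactis = 111   # L. lactis IL1403 array only
only_pneumo = 70    # S. pneumoniae TIGR4 array only
both_arrays = 86    # detected with both arrays

total = only_lactis + only_pneumo + both_arrays


def pct(n: int) -> float:
    """Percentage of the total gene set, rounded to one decimal place."""
    return round(100 * n / total, 1)


shares = (pct(only_lactis), pct(only_pneumo), pct(both_arrays))

# Genes detected per array, regardless of the other array.
lactis_array_total = only_lactis + both_arrays
pneumo_array_total = only_pneumo + both_arrays

# Nine genes were already known in L. garvieae; the rest are newly documented.
previously_known = 9
newly_documented = total - previously_known
new_share = pct(newly_documented)

print(total, shares, lactis_array_total, pneumo_array_total, newly_documented, new_share)
```

The per-array totals obtained this way (197 and 156) match the totals given in Table 2, and the 258 newly documented genes correspond to the 96.6% quoted in the text.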

Discussion
In the present study, commercial microarrays of L. lactis subsp. lactis IL1403 and S. pneumoniae TIGR4 were used to determine the presence of homologous genes in L. garvieae. Both L. lactis and S. pneumoniae were chosen as reference organisms because they are closely related to L. garvieae [18,19] and their genomes have been fully sequenced. Although these CGH experiments cannot detect and identify genes that are likely to exist only in the target microorganism, this approach reveals genes that are common to both the reference and the target organisms, allowing the identification of a large number of genes potentially present in an organism without the need for sequencing genomes [17,25].
In experiments that involve inter-species comparison it is necessary to establish a framework that allows accurate comparison and interpretation of the results. Thus, the first efforts were focused on establishing that framework by the combination and integration of in silico analyses and in vitro microarray CGH experiments to compare the reference organisms L. lactis subsp. lactis IL1403 and S. pneumoniae TIGR4. Signal intensity has been used to assess the level of similarity between two genes in inter-species CGH experiments [15]. However, this approach may be influenced, and therefore biased, by different factors, such as regional sample labelling effects, probe accessibility or local hybridization issues [13]. For these reasons, in the present study signal intensity was not considered for determining whether a gene was positive or not in the inter-species CGH experiments. These analyses revealed that nearly all the genes common to L. lactis and S. pneumoniae that were detected by swap microarray CGH experiments (97%) exhibited a sequence similarity of at least 70% (Table 1). Only two genes (dnaG and yciA) detected in the microarray CGH experiments showed a sequence similarity slightly lower than 70% (66 and 68%, respectively; Table 1). Variability in the factors that influence the CGH signals, such as systematic errors (e.g. dye effects), copy number variation, and sequence divergence between the analysed samples [13], may explain these results. The comparison of the results of both analyses, in silico and in vitro, for the reference microorganisms (Table 1) allowed us to establish that, under our experimental conditions, it was possible to detect and identify inter-species hybridization with a detection threshold based on a sequence similarity of ≥70%.
Table 1. In silico analysis of the common genes detected by CGH in the reference microorganisms
Therefore, our threshold value of sequence similarity ≥70% was set up directly from the comparison of the results of the in silico and in vitro analyses of the present study. This threshold value was used subsequently to interpret the results of the microarray-based CGH experiments comparing L. garvieae and the reference microorganisms. Less stringent hybridization conditions would probably have allowed the identification of a larger number of genes, but this would have also resulted in lower specificity. Given that the final aim of the experiment was the identification of genes potentially present in L. garvieae, it was preferred to maintain stringent hybridization conditions, thereby increasing the specificity and the reliability of the results. Hence, the genes detected in the CGH experiments should have an analogue in L. garvieae with a nucleotide sequence identity of at least 70% with the respective gene in the reference organism. The CGH hybridizations using L. lactis subsp. lactis IL1403 and S. pneumoniae TIGR4 microarrays identified 267 analogous genes in L. garvieae (Additional file 1). Only 3.4% of these genes (nine out of 267) have been characterized or sequenced previously by other groups in different strains of L. garvieae [18,26-29, and GenBank sequences: AX109994, AB364624, AB364625, AB364626, AB364627, AB364632, AB364633, AB364637, AB364638, AB364639, AB364640, AB364641, EU153555]. The alignments of the available sequences of these nine previously identified genes in L. garvieae with both the sequences of these genes from the reference microorganisms and those from the array probe showed nucleotide similarities of at least 70% (70-86%) between them (Tables 3 and 4). These data are consistent with the detection threshold value discussed previously. Therefore it is reasonable to assume that the other genes detected in L. garvieae CECT 4531 by CGH experiments will also have at least 70% sequence similarity with the respective genes in the reference microorganisms. The positive result obtained in both CGH experiments for the tig/SP0400 gene (Tables 3 and 4) was unexpected given the absence of similarity between the available sequence and the probes on both microarrays. This result could be explained by the fact that the available sequence for L. garvieae is partial, and it represents a part of the gene that does not correspond with the probe.
Table 2
Functional group | L. lactis IL1403 array | S. pneumoniae TIGR4 array
Amino acid transport and metabolism | 14 | 10
Carbohydrate transport and metabolism | 24 | 15
Cell cycle control, cell division, chromosome partitioning | 4 | 2
Cell wall/membrane/envelope biogenesis | 5 | 4
Coenzyme transport and metabolism | 1 | 1
DNA replication, recombination and repair | 8 | 12
Energy production and conversion | 11 | 6
Inorganic ion transport and metabolism | 4 | 5
Intracellular trafficking, secretion and vesicular transport | 4 | 2
Lipid transport and metabolism | 2 | 0
Nucleotide transport and metabolism | 15 | 11
Phage capsid proteins | 1 | 0
Post-translational modification, protein turnover, chaperones | 8 | 8
Signal transduction mechanisms | 2 | 3
Transcription | 7 | 6
Translation, ribosomal structure and biogenesis | 64 | 60
Unknown function | 23 | 11
Total | 197 | 156
Table 3. In silico analysis of the available sequences of the genes detected in L. garvieae by CGH. Results for the L. lactis subsp. lactis IL1403 array-based CGH
We classified the ORFs into clusters of orthologous genes (COGs) [30]. The 267 genes identified in L. garvieae CECT 4531 (Additional file 1) belong to diverse biological functional groups (Table 2). Most of the genes detected in L. garvieae (about 66%) were related to meaningful biological functions such as those related to ribosomal functions, sugar metabolism or energy conversion systems, which are usually represented in Lactobacillales [31]. The remaining genes identified included "housekeeping genes", such as gyrB, sodA, recA, ileS, rpoD, dnaK and ddl [19], genes of diverse functional groups and genes with unknown functions. Some of them are of interest because they could be involved in the pathogenesis of L. garvieae infections. For example, the gene als, which has been described as an important factor for host colonization by El Tor biotypes of Vibrio cholerae [32], has also been suggested to be one of the genes required for survival of L. garvieae in fish [27]. In addition, the gene mycA, which was detected for the first time in L. garvieae in the present study, encodes an antigen that cross-reacts with myosin, and members of this family of proteins have been suggested to play an important role in the pathogenesis of streptococcal infections [33]. Sequencing of the genes identified in this work is beyond the scope of this initial study, but the data provided can be the starting point for future genetic analysis of L. garvieae strains from different ecological niches or adapted to different host species.
Table 4. Results for the S. pneumoniae TIGR4 array-based CGH
This study provides the first insight into the genome content of L. garvieae and suggests that CGH could be a useful approach for studying the genetic content of other Gram-positive catalase-negative cocci of human and veterinary relevance.

Conclusions
In the present work, a comparative analysis based on microarray interspecies hybridization and on the use of bioinformatic tools was used for the first time to study the genetic content of L. garvieae CECT 4531. It is important to remark that the integration of results from bioinformatics and microarray-based CGH requires the definition of a framework that allows an accurate comparison and interpretation of the results obtained. Once this framework was established, it was possible to identify 267 genes potentially present in L. garvieae CECT 4531. Some of the identified genes, such as the als and mycA genes, could be involved in the pathogenesis of L. garvieae infections.
In summary, these results provide the first insight into the genome content of L. garvieae and could be useful for future understanding of the genetics of this pathogenic microorganism.
Additional file 1: Genes potentially identified in L. garvieae CECT 4531 and their homologues in L. lactis subsp. lactis IL1403 and S. pneumoniae TIGR4.