High-quality genome sequence assembly of R.A73 Enterococcus faecium isolated from freshwater fish mucus

Whole-genome sequencing using high-throughput technologies has revolutionized and accelerated the scientific investigation of bacterial genetics, biochemistry, and molecular biology. Lactic acid bacteria (LABs) have been extensively used in fermentation and, more recently, as probiotics in health-promoting food products. Genome sequencing and functional genomics investigations of LAB varieties provide rapid and important information about their diversity and evolution, revealing the underlying molecular basis. This study investigated the whole-genome sequence of the Enterococcus faecium strain R.A73 (HG937697), isolated from the mucus of freshwater fish in Tunisian dams. Genomic DNA was extracted using the Quick-GDNA kit and sequenced using the Illumina HiSeq 2500 system. Sequence quality was assessed using FastQC software. Complete genome annotation was carried out with the Rapid Annotation using Subsystem Technology (RAST) web server and then with NCBI PGAAP. The Enterococcus faecium R.A73 genome was assembled into 28 contigs comprising 2,935,283 bp. Genome annotation revealed 2884 genes in total, including 2834 coding sequences and 50 RNAs, the latter comprising 3 rRNAs (one 16S, one 23S, and one 5S rRNA) and 47 tRNAs. Twenty-two genes implicated in bacteriocin production were identified in the Enterococcus faecium R.A73 strain. The data obtained provide a basis for further investigating effective strategies for testing this strain in industrial manufacturing processes. Studying its metabolism with bioinformatics tools represents a future challenge and will contribute to improving the use of these multi-purpose bacteria in food.


Background
The use of antibiotics and chemotherapeutic drugs is an important disease control measure in the aquaculture industry [1]. However, antimicrobial use may promote the emergence of drug-resistant microorganisms and leave detectable antibiotic residues in fish and in the environment [2].
Probiotic LABs are widely used as an alternative to antibiotics to prevent bacterial infections in animals and humans [3]. Enterococcus is a large, ubiquitous LAB genus with the capacity to adapt to challenging environments. Its species have been isolated from different habitats including water (i.e. wastewater, freshwater, and seawater), soil, plants, and the digestive tract of warm-blooded animals and/or humans [4]. Several studies have demonstrated the beneficial effects of Enterococcus faecium as a probiotic in humans, animals, and aquaculture [5][6][7][8][9][10].
Strains belonging to the genus Enterococcus produce a wide variety of bacteriocins, often called enterocins, which have antagonistic properties against a wide range of pathogenic bacteria [11].
Bacteriocins are considered biological control agents in food that preserve its organoleptic and nutritional properties. They thus constitute an alternative to the chemical additives or physico-chemical treatments used in the food industry [12]. In addition, bacteriocins have the advantage of being rapidly digested by proteases in the human digestive tract [13] without producing toxic secondary substances. Bacteriocins can also find applications in the medical sector [14], and they can be used as antimicrobial agents in the pharmaceutical industry (Folli et al., 2003). Enterocins are of bacteriological importance because of their ability to inhibit the growth of members of the genera Listeria, Clostridium, and Staphylococcus, which are responsible for the highest mortality rate (20-30%) compared to other foodborne pathogens [15][16][17].
Several studies have refined the knowledge of the genomic diversity of probiotic Enterococcus strains to elucidate the genomic features responsible for survival in the GI tract, antibiotic resistance, and virulence factors, as well as the genetic divergence between pathogenic and probiotic Enterococcus strains [18][19][20][21]. Some knowledge has been acquired on the metabolic activities of LABs, including carbohydrate, protein, and lipid metabolism. LABs need amino acids and peptides to meet their complex nitrogen requirements [22]. Amino acids and peptides may be obtained through the action of proteases (proteolysis), in which peptides are broken down into free amino acids and other compounds for further use. Because peptide requirements differ among strains, peptides can be either essential growth factors or merely growth stimulants, and some strains can grow independently of them.
Recently, the preselected Enterococcus faecium R.A73 strain, isolated from freshwater fish mucus, was shown to have specific probiotic properties [3]. In the current study, whole-genome sequencing of the Enterococcus faecium R.A73 strain was performed, and its genome content and gene functions were investigated through comparison with related species. Altogether, the results support the findings of the previous study.

Results
E. faecium R.A73 genome annotation
Genome content
The genome of the Enterococcus faecium R.A73 strain, isolated from the mucus of tilapia (Oreochromis niloticus), was sequenced using the Illumina HiSeq 2500 system. The present draft genome comprises 2,935,283 bases, with a GC content of 38.0%, and was assembled into 28 scaffolds. Genome annotation identified a total of 2884 genes, corresponding to 2834 coding sequences (CDSs) and 50 RNAs, with single predicted copies of the 16S, 23S, and 5S rRNA genes and 47 predicted tRNAs (Fig. 1). A total of 342 RAST subsystems were identified, with many features in the carbohydrates subsystem (Fig. 2), including genes involved in central carbohydrate metabolism, amino sugars, di- and oligosaccharides, carbon metabolism, organic acids, fermentation, sugar alcohols, polysaccharides, and monosaccharides. There are also many features in the amino acids and derivatives subsystem, including lysine, threonine, methionine, and cysteine.
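Summary figures of the kind reported above (number of sequences, total bases, GC content) can be recomputed from any FASTA file of scaffolds. The following is a minimal sketch, not the pipeline used in this study: the function names and the toy sequences are hypothetical, and ambiguous bases such as N are counted in the total length, which is an assumption.

```python
from io import StringIO


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def assembly_stats(handle):
    """Return sequence count, total length, and GC percentage of an assembly."""
    n_seqs = total = gc = 0
    for _, seq in read_fasta(handle):
        n_seqs += 1
        total += len(seq)  # assumption: ambiguous bases count toward length
        gc += seq.count("G") + seq.count("C")
    gc_percent = 100.0 * gc / total if total else 0.0
    return {"sequences": n_seqs, "bases": total, "gc_percent": round(gc_percent, 1)}


# Toy two-scaffold assembly (hypothetical data, not the R.A73 genome).
toy = StringIO(">scaffold_1\nATGCGC\nATAT\n>scaffold_2\nGGCCAATT\n")
stats = assembly_stats(toy)
print(stats)
```

Applied to the deposited scaffolds, a function like this would be expected to return values close to the 28 sequences, 2,935,283 bases, and 38.0% GC reported here, although small differences can arise from how ambiguous bases are handled.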

Functional annotation
A total of 2063 protein-coding genes (72.58% of the total protein-coding genes) were assigned a putative function by Clusters of Orthologous Groups (COGs). Genes associated with carbohydrate transport and metabolism (294 Open Reading Frames (ORFs)), translation (206 ORFs), and transcription (205 ORFs) ranked among the most abundant COG functional categories. The distribution of genes into COG functional categories is summarized in Fig. 2.

Phylogeny and classification
Based on 16S rDNA sequences, the phylogenetic tree showed that the R.A73 strain is more closely related to E. faecium LMG 11423 and E. durans NBRC 100479 than to other Enterococcus species (Fig. 3).
Moreover, the Genome-to-Genome Distance Calculator (GGDC) was used for genome-to-genome comparison between R.A73 and related strains. DNA-DNA hybridization (DDH) is considered the best indicator for distinguishing species. The probabilities of a DDH value higher than 70%, estimated through logistic regression under three formulae, indicate that E. faecium R.A73 differs from the other species of the genus except Enterococcus faecium. A DDH value > 96% was found in the comparison against E. faecium T110 (Table S1). This analysis, combined with the 16S rDNA-based phylogeny, confirmed the identification of the strain as E. faecium.

Comparative genomics
Comparative analysis of genome sequences
Comparative genomics helps to understand several aspects related to pathogenicity, resistance to antibiotics, and probiotic characteristics.
The comparative proteome analysis among Enterococcus genomes (Table 1) showed a high similarity between the E. faecium HG937697 and E. faecium T110 genomes, with 2,318 common orthologous genes (80.37%). This similarity was confirmed using the BRIG tool (Fig. 4). In addition, 208 strain-specific protein-coding genes were identified in the E. faecium R.A73 strain.
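The shared-ortholog percentage can be checked with simple arithmetic. The short sketch below assumes that the denominator is the total annotated gene count of R.A73 (2884); this is an inference from the reported numbers rather than something stated explicitly, and the helper name is hypothetical.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100.0 * part / whole, 2)


shared_orthologs = 2318  # orthologs shared with E. faecium T110 (from the text)
total_genes = 2884       # assumed denominator: total annotated genes in R.A73

shared_pct = percent(shared_orthologs, total_genes)
print(shared_pct)  # 80.37
```

Under this assumption the result matches the 80.37% reported above.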

Antibiotics resistance
Two genes involved in resistance to antibiotics and toxic compounds were identified. These genes correspond to a homolog of aac(6′)-Ii, involved in aminoglycoside resistance (% identity: 98.36; Query/HSP length: 549/549; Accession number: L12710), and a homolog of msr(C), involved in macrolide, lincosamide, and streptogramin B (MLS) resistance (% identity: 97.70; Query/HSP length: 1479/1479; Accession number: AF313494). In addition, the PGAAP and RAST annotation systems detected 52 other genes potentially involved in virulence, disease, and defense mechanisms. The genes found in the HG937697 genome are presented in Table 2.

Discussion
A genomic study was performed on the preselected Enterococcus faecium R.A73 strain, isolated from freshwater fish mucus, which displays potential probiotic characteristics and significant efficiency as a food additive. Genome annotation revealed that the R.A73 genome did not contain any plasmid, which may be due to growth temperature, copy number, or even isolation methods [26].
Several carbohydrate subsystem features were identified in the genome of the Enterococcus faecium R.A73 strain. The degradation of carbohydrates and related compounds has been shown to account for most of the primary metabolic activity of LAB, generating energy and carbon source molecules [27,28]. The genome annotation of the strain under study suggests an abundance of metabolic activities, such as the breakdown of proteins, lipids, and other compounds, which are important for LAB growth. Interestingly, many features of the amino acids and derivatives subsystem, including lysine, threonine, methionine, and cysteine, are found in the genome of the Enterococcus faecium R.A73 strain. LAB amino acid requirements are strain-dependent and vary widely between species [29,30]. Enterococcus faecium is able to use a wide range of mono-, di-, and oligosaccharides and therefore has an enriched carbohydrate metabolism [4,31]; moreover, the ability to use a variety of carbohydrates has been shown to be among the properties associated with probiotic strains [32]. Furthermore, 51 of the 208 strain-specific genes were assigned to COG functional categories, among them carbohydrate transport and metabolism (6 genes), amino acid transport and metabolism (6 genes), and cell wall/membrane/envelope biogenesis (5 genes).
The presence of a prophage in the genome of the E. faecium R.A73 strain was expected: bacteriophages contribute to the evolution of bacteria through their integration into the genome, and E. faecium is known to harbour bacteriophages [33].
Genes coding for ABC transporters were also detected. These transporters are known to have an antibacterial activity that may contribute to the probiotic potential of such strains [34].
Annotation of the Enterococcus faecium R.A73 genome identified 22 genes involved in the production of bacteriocins and other antimicrobial peptides. A gene involved in colicin V (Col V) production was identified. Col V is an antibiotic-like peptide that kills susceptible cells by disrupting their membrane potential once it reaches the periplasmic side of the inner membrane. It is secreted by some members of the enterobacteria to kill closely related bacterial cells, thus reducing competition for essential nutrients [35,36]. This protein was shared by several Enterococcus strains, including Enterococcus faecium DO (WP_002295088.1).
Fig. 3 Phylogenetic tree based on 16S rDNA sequences. 16S rDNA sequences were downloaded from the National Center for Biotechnology Information (NCBI) database and aligned using Muscle [23] as part of the MEGA7 [24] software to generate 1000 bootstrap replicates followed by a search for the best-scoring Maximum Likelihood (ML) tree. The tree was saved in Newick format and displayed, manipulated, and annotated using iTOL 3 [25].
Comparative proteome analysis showed that the R.A73 strain is closely related to the probiotic strain T110 (Fig. 4). The latter is a commercial probiotic widely prescribed for humans, animals, and aquaculture [8]. It is a component of many commercially available probiotics, and no case of illness or death has been reported [8].
The R.A73 strain may be categorized as antimicrobial resistant (AMR) because a previous study [3] found it to be resistant to several antibiotics (oxacillin, streptomycin, cefazolin, and clindamycin). Enterococcus may be resistant to some antibiotics through intrinsic genes underlying their innate resistance, or may acquire resistance through horizontal gene transfer [41,42]. The latter mechanism can also lead to the acquisition of certain adaptive genetic traits, such as AMR determinants [43]. In Japan, Enterococcus strains used as probiotics have shown resistance to tetracyclines and beta-lactams [44].
A previous study investigated the probiotic properties of Enterococcus strains isolated from artisanal dairy products [45]. The most important virulence factors investigated include the cylA, cylB, cylM, esp, agg, gelE, cpd, ccf, and cad genes. These are responsible for cytolysin transport and activation, post-translational modification of proteins, immune evasion, adherence to eukaryotic cells, production of a toxin that hydrolyzes gelatin, and sex pheromones that facilitate conjugation [6,46]. None of the aforementioned genes was found in R.A73. The same study showed that the probiotic strains investigated demonstrated hydrophobicity, auto-aggregation, and the ability to adhere to a human intestinal cell line, all of which contribute to gut colonization. Indeed, one of the main selection criteria for potential probiotics is their ability to adhere to the gastrointestinal tract in order to exert their probiotic effects for an extended time [47]. However, adhesion is also considered a potential virulence factor for pathogenic bacteria [48]. Thus, ebpA, ebpB, and ebpC are classified as virulence determinants although they are in fact adherence factors. Ebp genes may play a role during colonization of the mammalian host, adherence to abiotic surfaces, or bacterial surface components [49].

Conclusion
Marine microbiology is still an evolving field, and significant progress can be expected on marine pollution issues, including bacterial oil degradation, which is currently under investigation. The current results support the potential probiotic properties of the strain. The Enterococcus faecium R.A73 strain can be safely used as a bio-ingredient in the preservation and processing of fish consumed by humans and animals. However, further studies are needed for comprehensive identification of AMR genes in probiotic strains.

Bacterial strain
In total, 177 LABs were isolated from different organs (intestine, skin, gills, and mucus) of freshwater fish (Mugil cephalus and Oreochromis niloticus). Within this collection, the novel strain R.A73, isolated from the mucus of tilapia (Oreochromis niloticus), was identified as Enterococcus faecium. It exhibited high inhibitory activity against food-borne pathogens and spoilage microbial species and has a significant in vitro probiotic profile [3].

Growth conditions and DNA preparation/isolation
Enterococcus faecium R.A73 was inoculated in De Man-Rogosa-Sharpe (MRS) broth and incubated for 48 h at 20°C. Pure genomic DNA was then extracted using the Quick-GDNA kit.

Genome sequencing
The genome of the Enterococcus faecium R.A73 strain was sequenced using the Illumina HiSeq 2500 system. Paired-end FASTQ sequence files were generated using the Illumina CASAVA pipeline version 1.8.3. The initial quality assessment was based on the data that passed Illumina chastity filtering. Reads containing adapters and/or the PhiX control signal were then removed. A second quality assessment, based on the remaining reads, was performed using the FastQC quality control tool version 0.10.0. FASTQ sequence quality was improved by removing low-quality bases with the "Trim Sequences" option of CLC Genomics Workbench version 7.0.4.
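The idea behind removing low-quality bases can be illustrated with a simple end-trimming routine. This sketch is not the algorithm implemented by the CLC "Trim Sequences" option; it is a plain threshold trim under two stated assumptions: qualities are Phred+33 encoded (the encoding written by CASAVA 1.8 and later), and the cutoff of Q20 is an arbitrary illustrative value. All function names are hypothetical.

```python
def phred_scores(quality, offset=33):
    """Convert a FASTQ quality string to a list of Phred scores."""
    return [ord(ch) - offset for ch in quality]


def trim_ends(seq, quality, min_q=20):
    """Trim bases with Phred score below min_q from both ends of a read.

    Internal low-quality bases are left untouched; only the leading and
    trailing runs below the threshold are removed.
    """
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    scores = phred_scores(quality)
    start, end = 0, len(scores)
    while start < end and scores[start] < min_q:
        start += 1
    while end > start and scores[end - 1] < min_q:
        end -= 1
    return seq[start:end], quality[start:end]


# Toy read: '#' is Q2, 'I' is Q40, '!' is Q0 under Phred+33.
seq, qual = "ACGTACGT", "##IIII#!"
trimmed_seq, trimmed_qual = trim_ends(seq, qual)
print(trimmed_seq, trimmed_qual)  # GTAC IIII
```

A read whose bases all fall below the threshold is reduced to an empty string, so a real workflow would also discard reads shorter than some minimum length after trimming.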

De novo assembly
The quality-filtered sequence reads were assembled into contigs using the "De novo Assembly" option of CLC Genomics Workbench version 7.0.4. The optimal k-mer size was determined automatically using KmerGenie [50]. Contigs were then linked to each other and placed into scaffolds (supercontigs). The orientation, order, and distance between the contigs were estimated using the insert size between the paired-end reads. Scaffolding was performed using the SSPACE Premium scaffolder version 2.3 [51]. Gapped regions within the scaffolds were partially closed in an automated manner using GapFiller version 1.10 [52], a method that also takes advantage of the insert size between the paired-end reads.

Genome annotation
The RAST web server [53] was used to perform genome annotation. Briefly, protein-coding genes were predicted using the Classic RAST annotation scheme [53]. The RNAmmer tool [54] was used to predict ribosomal RNAs, while tRNAscan-SE [55] was used to detect transfer RNAs. The NCBI Prokaryotic Genomes Automatic Annotation Pipeline (PGAAP) (https://www.ncbi.nlm.nih.gov/genome/annotation_prok/) was used to perform the final annotation.

Functional annotation
Clusters of Orthologous Groups (COGs) were assigned based on comparative proteome analysis against the COG database [56] using the protein sequences previously predicted by PGAAP. Briefly, the predicted protein sequences were compared against the protein sequences available in the COG database using the best reciprocal hits approach with an e-value <= 1E-05.
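The best reciprocal hits approach can be sketched as follows, assuming that similarity searches have already been run in both directions and parsed into (query, subject, e-value, bit score) tuples. The function names, the tuple layout, the use of bit score to rank hits, and the toy identifiers are all assumptions made for illustration; this is not the script used in the study.

```python
def best_hits(hits, max_evalue=1e-5):
    """Map each query to its highest-scoring subject among hits passing the e-value cutoff."""
    best = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue  # discard hits weaker than the cutoff
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward, reverse, max_evalue=1e-5):
    """Return sorted (query, subject) pairs that are each other's best hit."""
    fwd = best_hits(forward, max_evalue)
    rev = best_hits(reverse, max_evalue)
    return sorted((q, s) for q, s in fwd.items() if rev.get(s) == q)


# Hypothetical parsed search results: genome proteins vs. COG database proteins.
forward = [
    ("p1", "cogA", 1e-50, 200.0),
    ("p1", "cogB", 1e-10, 60.0),
    ("p2", "cogB", 1e-30, 120.0),
    ("p3", "cogC", 1e-3, 30.0),   # fails the 1E-05 cutoff
]
reverse = [
    ("cogA", "p1", 1e-50, 200.0),
    ("cogB", "p1", 1e-10, 60.0),
    ("cogB", "p2", 1e-30, 120.0),
    ("cogC", "p3", 1e-3, 30.0),   # fails the 1E-05 cutoff
]
pairs = reciprocal_best_hits(forward, reverse)
print(pairs)  # [('p1', 'cogA'), ('p2', 'cogB')]
```

Each genome protein in a retained pair would then inherit the COG category of its database partner. Ties in bit score are resolved here by keeping the first hit encountered, which is a simplification.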

Phylogenetic analysis and genome-to-genome distance calculation
Strains closely related to E. faecium R.A73 were identified based on Basic Local Alignment Search Tool (BLAST) searches and pairwise global sequence alignments through the well-curated EzTaxon database, which covers not only type strains of prokaryotic species with validly published names but also phylotypes that may represent species in nature. The 16S rDNA gene sequences with a pairwise similarity higher than 96% to that of E. faecium R.A73 (locus_tag = "DTX73_13310") were chosen for phylogenetic tree construction. 16S rDNA sequences were downloaded from the National Center for Biotechnology Information (NCBI) database. They were aligned using Muscle [23] as part of the MEGA7 [24] software to generate 1000 bootstrap replicates followed by a search for the best-scoring Maximum Likelihood (ML) tree. The tree was displayed, manipulated, and annotated using iTOL 3 [25]. Digital DDH similarities between the E. faecium R.A73 genome and those of other Enterococcus species were calculated using the GGDC web server version 2.0 under the recommended settings [57].
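The 96% similarity filter can be illustrated on sequences that have already been globally aligned. The sketch below is not the EzTaxon procedure: it assumes that similarity is computed as the percentage of identical positions over alignment columns where neither sequence has a gap, which is only one of several definitions in use. The function names and toy sequences are hypothetical.

```python
def percent_identity(aln_a, aln_b):
    """Percent identical positions over columns without a gap in either sequence."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        matches += a == b
    return 100.0 * matches / compared if compared else 0.0


def select_close(query_aln, candidates, threshold=96.0):
    """Return names of candidates whose identity to the query exceeds the threshold."""
    return [
        name
        for name, aln in candidates.items()
        if percent_identity(query_aln, aln) > threshold
    ]


# Toy 100-column alignments (hypothetical, far shorter than a real 16S gene).
query = "ACGT" * 25
candidates = {
    "strain_A": "T" + query[1:],            # 1 mismatch: 99% identity
    "strain_B": "TGCATGCATG" + query[10:],  # 10 mismatches: 90% identity
    "strain_C": "-" + query[1:],            # 1 gap column, otherwise identical
}
selected = select_close(query, candidates)
print(selected)  # ['strain_A', 'strain_C']
```

Only the sequences passing the filter would be carried forward to alignment and tree construction.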

Comparative genomics
The genome of the E. faecium HG937697 strain was compared with those of related species using BRIG (BLAST Ring Image Generator), an open-source multi-platform software application that displays multi-genome comparisons as a set of concentric rings colored according to BLAST identity, with the reference genome at the center of the image and the related genomes listed in Table 1 around it [58].
Furthermore, the protein sequences of the E. faecium R.A73 strain predicted by the RAST and PGAAP annotation systems were extracted and compared to the proteomes of the related Enterococcus strains listed in Table 1. The comparison was computed using the Inparanoid (http://InParanoid.sbc.su.se) [59] and then MultiParanoid (http://multiparanoid.cgb.ki.se/) [60] Perl programs to identify clusters of orthologous genes first between pairs of species and then across all species, respectively.

Bacteriocin genes identification
Gene annotation performed with PGAAP and the RAST server [61] allowed the identification of genes encoding bacteriocins and related products in the E. faecium R.A73 strain. The comparison of protein sequences between related probiotic Enterococcus strains led to the identification of orthologous bacteriocin proteins in the R.A73 strain. Furthermore, R.A73 protein sequences were compared to all bacteriocin protein sequences available in the Bactibase database (http://bactibase.hammamilab.org/bacteriocinslist.php?view=GeneralView) [62].