Comparative genomics of multidrug-resistant Enterococcus spp. isolated from wastewater treatment plants

Background Wastewater treatment plants (WWTPs) are considered hotspots for the environmental dissemination of antimicrobial resistance (AMR) determinants. Vancomycin-resistant Enterococcus (VRE) are candidates for gauging the degree of AMR bacteria in wastewater. Enterococcus faecalis and Enterococcus faecium are recognized indicators of fecal contamination in water. Comparative genomics of enterococci isolated from conventional activated sludge (CAS) and biological aerated filter (BAF) WWTPs was conducted.
Results VRE isolates, including E. faecalis (n = 24), E. faecium (n = 11), E. casseliflavus (n = 2) and E. gallinarum (n = 2) were selected for sequencing based on WWTP source, species and AMR phenotype. The pangenomes of E. faecium and E. faecalis were both open. The genomic fraction related to the mobilome was positively correlated with genome size in E. faecium (p < 0.001) and E. faecalis (p < 0.001) and with the number of AMR genes in E. faecium (p = 0.005). Genes conferring vancomycin resistance, including vanA and vanM (E. faecium), vanG (E. faecalis), and vanC (E. casseliflavus/E. gallinarum), were detected in 20 genomes. The most prominent functional AMR genes were efflux pumps and transporters. A minimum of 16, 6, 5 and 3 virulence genes were detected in E. faecium, E. faecalis, E. casseliflavus and E. gallinarum, respectively. Virulence genes were more common in E. faecalis and E. faecium than in E. casseliflavus and E. gallinarum. A number of mobile genetic elements were shared among species. Functional CRISPR/Cas arrays were detected in 13 E. faecalis genomes, with all but one also containing a prophage. The lack of functional CRISPR/Cas arrays was associated with multi-drug resistance in E. faecium. Phylogenetic analysis demonstrated differential clustering of isolates based on original source but not WWTP. Genes related to phage and CRISPR/Cas arrays could potentially serve as environmental biomarkers.
Conclusions There was no discernible difference between enterococcal genomes from the CAS and BAF WWTPs. E. faecalis and E. faecium have smaller genomes and harbor more virulence, AMR, and mobile genetic elements than other Enterococcus spp.


Background
Enterococci are ubiquitous in nature and can be found in a variety of environments, including soil, plants, surface water, wastewater, food, and the gastrointestinal tract of animals and humans [43,60]. Enterococcus faecalis and Enterococcus faecium are associated with a variety of clinical infections of the urinary tract, heart, surgical wounds, bloodstream and neonates [67] and also serve as indicators of fecal contamination [10]. The ability to treat infections caused by Enterococcus spp. is hindered by the development and spread of antimicrobial resistance (AMR) [1]. Resistance to antimicrobials of last resort, such as vancomycin, impairs the control of enterococcal infections and is usually accompanied by resistance to other antimicrobials [24,32].
Enterococci and antimicrobials are excreted in urine and feces, and in urbanized developed nations, most of this waste is transported to and treated in wastewater treatment plants (WWTPs) prior to discharge into surface waters. WWTPs could be considered points of control for the environmental dissemination of AMR and ideal environments to investigate the epidemiology of AMR from a "One Health" perspective [2,44,57]. Within this environment, enterococci can not only exchange genes coding for AMR, but also for heavy metal resistance as well as other genes that increase persistence and survival in other environments [3]. This outcome can facilitate the broader dissemination of AMR genes [2]. Comparative genomics has been applied to identify genes responsible for virulence, AMR, metabolism, secondary metabolite production and gene mobility. Comparative genomics can also be used to compare genes from other functional categories, to predict the ecological fitness of strains, and to discern evolutionary relationships among species.
We previously isolated a number of species of enterococci from two WWTPs with different treatment processes, a conventional activated sludge (CAS) and a biological aerated filter (BAF) system, with E. faecalis being the dominant species identified [61]. This work demonstrated changes in AMR phenotypes between wastewater enterococci before and after treatment and between WWTPs. In the current study, we selected 39 wastewater enterococci for sequencing out of 1111 enterococci isolated, including 308 that exhibited vancomycin resistance in broth culture. Isolates were selected so as to be representative of before and after treatment in both WWTPs [61]. We hypothesized that the genomes would not cluster by treatment process but that genomes from the BAF system may contain more biofilm-related genes than those from the CAS system. We also proposed that there would be more virulence, AMR, and genetic mobility genes in E. faecalis and E. faecium than in other Enterococcus spp. and that the larger genomes in these clinically relevant species would correlate with the number of mobile genetic elements and genes conferring fitness for survival in a broader range of environments.
The range in contigs generated during sequencing was greater in E. faecium  contigs) than in other species (11-68 contigs), likely due to the presence of repetitive and insertion genetic elements complicating assembly [54]. Genome sizes were greater for vancomycin and multi-drug resistant strains of E. faecium (3.04 Mbp) than for susceptible strains (2.60 Mbp). The genome size of vancomycin-resistant and multi-drug resistant E. faecalis was similar to their susceptible counterparts.

Multi-locus sequence typing
In the current study, 4 sequence types (STs) for E. faecium and 15 STs for E. faecalis were identified (Table 1). Eight E. faecium genomes belonged to ST18, part of the clonal complex 17 (CC-17). Out of the E. faecalis STs identified in this study, ST16 (n = 7) and ST40 (n = 4) were the most common.

Phenotypic antimicrobial resistance profiles
Sequenced enterococci exhibited a number of phenotypic antimicrobial resistance profiles, with some isolates being resistant to as many as seven antimicrobials (Table 2). Resistance to vancomycin (VAN), teicoplanin (TEC), ampicillin (AMP) and erythromycin (ERY) was among the most common resistance phenotypes found in enterococci.

Phylogeny
Genomes did not cluster based on WWTP, but all species formed separate monophylogenetic groups (Fig. 2). The majority of wastewater E. faecalis isolates were more closely related to livestock and food-derived E. faecalis genomes, while seven wastewater strains (B139, B168, C34, W37, W75, W191, and W314) clustered with strains isolated from human infections (Fig. 3). None of the E. faecalis wastewater, human, and agriculture (and food-derived) isolates clustered together by source, suggesting that agricultural and human clinical strains are phylogenetically distinct. Vancomycin-resistant E. faecalis isolates also did not cluster as they belonged to different STs, unlike vancomycin-resistant E. faecium, which did cluster as all isolates belonged to CC-17 (Fig. 4).
Clusters of Orthologous Groups (COGs) are broad functional categories used to assign proteins to their specific function [69]. Functional categorization of proteins into different COGs revealed variation in profiles among Enterococcus spp., but little difference among strains within species, with the exception of the mobilome and genes associated with energy production and conversion (Additional file 1, Sheet 6). We assessed which functional categories of genes were disproportionately represented in the isolates collected from the WWTPs with expanded genomes. Given the variation in genome size between and within species, the relationships between genome size and the number of genes associated with specific functional categories were determined (Fig. 5; Additional file 1, Sheet 6). There were more COGs assigned to carbohydrate transport and metabolism, transcription, cell motility, secondary metabolite biosynthesis, transport, catabolism and signal transduction mechanisms in E. casseliflavus and E. gallinarum compared to enterococci more frequently associated with clinical infections.
Fig. 1 Enterococcus faecalis (a) and Enterococcus faecium (b) pan-genome illustrated as a matrix with the core SNP tree of the strains on the left and a presence (blue) and absence (white) matrix of core and accessory genes
When all of the wastewater Enterococcus genomes were pooled, there was a strong negative correlation (p < 0.001) between genome size and nucleotide transport and metabolism, lipid metabolism and translation, ribosomal structure and biogenesis and a strong positive correlation (p < 0.001) between genome size and cell motility (Fig. 5 a; Additional file 1, sheet 6). The total number of genes related to cell motility, signal transduction, and carbohydrate transport and metabolism were positively correlated (p < 0.001) with genome size. This is reflective of the greater genome size of environmental species compared to E. faecium and E. faecalis. The total number of genes related to cell division and chromosome partitioning, cell envelope biogenesis, outer membrane and post translational modification, protein turnover, and transcription were negatively correlated (p < 0.001) with genome size.
The species-specific patterns in genomic proportions for each functional category differ from the pooled genomes for the genus. In both E. faecalis and E. faecium, a larger genome was strongly correlated with the mobilome (p < 0.001) (Fig. 5 b and c), a functional category not included in the analysis of Konstantinidis and Tiedje [34]. In contrast, the mobilome was not correlated with genome size in the pooled Enterococcus genomes. There was also a positive correlation (p = 0.005) between the number of unique AMR genes and genome size of E. faecium, suggesting that the acquisition of AMR genes occurs through horizontal gene transfer. For example, E. faecium R337 had a genome of 3.02 Mbp, 58 genes associated with the mobilome and 23 AMR genes, while E. faecium C329 had a genome of 2.48 Mbp, 15 genes associated with the mobilome and 3 AMR genes.
The total number of genes related to cell motility (p < 0.001), DNA replication, recombination, and repair (p < 0.001), extracellular structures (p < 0.001), and mobilome (p < 0.001) was positively correlated with genome size in E. faecium. The number of AMR genes also showed a positive correlation (p = 0.002) with the number of genes related to the mobilome in this species (Fig. 5 c). The eight E. faecium genomes belonged to the same clonal complex (CC-17), while E. faecalis genomes were more diverse.
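The relationship between genome size and mobilome gene counts described above can be illustrated with a short correlation calculation. The following is a minimal sketch only: it assumes a Pearson correlation coefficient (the text does not state which test was used), the function name `pearson_r` is hypothetical, and the input values are toy numbers chosen for illustration, not data from this study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Toy values for illustration only (NOT measurements from this study):
# genome size in Mbp and number of mobilome-associated genes per genome.
genome_size_mbp = [2.5, 2.7, 2.9, 3.0]
mobilome_genes = [15, 30, 45, 60]

r = pearson_r(genome_size_mbp, mobilome_genes)
print(f"r = {r:.3f}")
```

A coefficient close to 1 in such a calculation would correspond to the pattern reported here, in which larger genomes carry more mobilome-associated genes. A significance test (the p-values quoted in the text) would require an additional step not shown in this sketch.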

Antimicrobial resistance genes
In this study, we screened 39 multi-antimicrobial resistant enterococci genomes against the CARD database for antimicrobial resistance genes (ARGs) (Additional file 1, Sheet 8) and ten genes (eatAv, emeA, lsaA, efrA, efrB, tetL, efmA, msrC, ermY, and lsaE) associated with multidrug efflux pumps and other transporters were detected. These efflux proteins may confer intermediate resistance to a variety of antimicrobials. Genes conferring glycopeptide (vancomycin and teicoplanin) resistance were detected in 20 of the genomes. Resistance was conferred by vanA and vanM in E. faecium and by vanG in E. faecalis. In E. casseliflavus and E. gallinarum, vancomycin resistance was mediated by vanC, which was the only ARG detected in these species.
Thirteen of the enterococci isolates were resistant to high concentrations of gentamicin and streptomycin. Cross-resistance to levofloxacin and the aminoglycosides (gentamicin and streptomycin) occurred in 5 isolates, with 3 additional isolates exhibiting intermediate resistance to one or more of these antimicrobials. Additional aminoglycoside resistance genes (ant(9′)-Ia, aad(6′), aph(3′)-IIIa, SAT-4, ant(6′)-Ia, and aac(6′)-Ie-aph(2″)-Ia) were detected in the genomes of up to 5 E. faecalis and 7 E. faecium aminoglycoside-resistant isolates. Gentamicin resistance arises as the result of the acquisition of aac(6′)-Ie-aph(2″)-Ia, which was detected in 7 genomes (2 E. faecalis and 5 E. faecium) and confers resistance to all aminoglycosides except streptomycin [42]. The prevalence of streptomycin resistance versus gentamicin resistance differed between species, with streptomycin resistance being more common in E. faecium and gentamicin resistance more common in E. faecalis.
Genes encoding tetracycline resistance were detected in 26 of the genomes, including E. faecium and E. faecalis. Determinants for macrolide and tetracycline resistance were detected together in 16 of the enterococcal genomes. Genes associated with resistance to antimicrobials not included in the disc susceptibility panel were also detected. A gene associated with chloramphenicol resistance, cat, was detected in two E. faecalis genomes. Genes associated with diaminopyrimidine resistance (dfrE, dfrF, and dfrG) were also detected in E. faecium.

Virulence genes
The number of shared virulence genes among genomes of the same species was 16, 6, 5 and 3 for E. faecium, E. faecalis, E. casseliflavus, and E. gallinarum, respectively (Additional file 1, Sheet 9-11). All of the E. faecium isolates contained genes related to adhesion to surfaces (tuf, aga, efaA, and sgrA), cell wall biosynthesis (phosphatase cytidylyltransferase, uppS), cellular defense (lisR), biofilm formation and surface proteins (acm, esp, scm and type A and B pili). Other functions including bile salt degradation (bsh), proteases (tip/ropA), biofilm formation (bopD), enolase (eno), and antiphagocytosis and capsule formation (rfbA-1) were also identified. All of the E. faecalis genomes contained genes for cell adhesion (tuf), carbohydrate metabolism (hyl), endocarditic and biofilm association (ebp) pili (ebpA), Type III secretion proteins (bopD) and fibrinogen-binding proteins (fss1). All of the E. casseliflavus genomes contained the same five virulence genes with functions of: capsule biosynthesis (capE), enolase (eno), leucine aminopeptidase (lap), heat shock protein (hsp60), and protein modification (lplA1). All of the E. gallinarum genomes had an enolase (eno), a flagellar biosynthesis protein (flhA) and a bile salt hydrolase (bsh). One of the E. gallinarum genomes also contained genes related to capsule proteins and another isolated from effluent possessed 2 genes associated with metal transporter (ssaB and psaA) as well as those associated with the CAS system. Hyaluronidase (hyl) genes were detected in all the E. faecalis genomes.
Fig. 3 Phylogenetic tree of Enterococcus faecalis genome sequences from the present study and complete genome sequences from the NCBI GenBank database based on analysis of single-nucleotide variants (SNVs) of the core genes. Enterococcus faecalis ATCC29212 was used as the reference genome. Origins of isolates are as indicated in the figures and are grouped by colour into clinical (red), agricultural/food (green) and wastewater/water (blue) groups

Mobile gene elements
ICE and transposons present in the genomes were identified and described using the ICEberg database (Table 3; Additional file 1, Sheet 17). The transposon Tn917 was identified in 8 of the sequenced E. faecalis genomes. One transposon, Tn6098, was present in all genomes. A multidrug resistance transposon, Tn5385, was also found in all E. faecalis genomes. Other Tn5801 and Tn6013-like ICE elements of unknown function were also present in all E. faecium isolates, in addition to a cadmium and arsenic resistance ICE, ICESde3396. All of the E. gallinarum and E. casseliflavus isolates had Tn916-type transposons (Tn6079, Tn6087 and Tn6084, respectively).
Seven out of the unique 27 ICE were present in genomes of more than one Enterococcus species.

CRISPR-Cas arrays and bacteriophage
Type II CRISPR-Cas systems were detected in 13 E. faecalis genomes (Fig. 6). Orphan CRISPR arrays (without Cas genes) were identified in 27 of the genomes (Fig. 6). Comparison of CRISPR arrays flanked by Cas genes revealed unique arrays among Enterococcus species, but some arrays were shared among strains of the same species. Arrays identified in the sequenced Enterococcus genomes contained 4 to 20 direct repeat sequences associated with functional CRISPR arrays. An additional 72 unique spacers associated with orphan CRISPR arrays were identified in this study. Eleven E. faecalis and 10 E. faecium genomes lacked CRISPR-Cas systems. Any genomes lacking functional arrays exhibited resistance to 4 or more antimicrobial agents.
Fig. 4 Phylogenetic tree of Enterococcus faecium genome sequences from the present study and genome sequences from the NCBI GenBank database based on analysis of single-nucleotide variants (SNVs) of the core genes. Enterococcus faecium DO served as the reference genome. Origins of isolates are as indicated in the figures and are grouped by colour into clinical (red), agricultural/food (green) and wastewater/water (blue) groups
Functional CRISPR arrays and intact prophage were identified in 10 E. faecalis genomes, but the combination was not seen in the other 29 genomes sequenced in this study. Some of the spacer regions identified in CRISPR arrays were 100% identical to incomplete prophage sequences, but these genomes still contained at least one prophage.
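The comparison of spacers against prophage sequences can be pictured as an exact-match search on both DNA strands. The sketch below is illustrative only: the function names are hypothetical, the sequences are short toy strings rather than data from this study, and plain substring matching stands in for whatever alignment tool was actually used, which the text does not specify.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))

def spacers_matching_prophage(spacers, prophage_seq):
    """Return names of spacers found with 100% identity on either strand.

    spacers: dict mapping spacer name -> spacer sequence
    prophage_seq: prophage nucleotide sequence as a string
    """
    target = prophage_seq.upper()
    hits = []
    for name, spacer in spacers.items():
        query = spacer.upper()
        # Check the forward strand, then the reverse-complement strand
        if query in target or reverse_complement(query) in target:
            hits.append(name)
    return hits

# Toy sequences for illustration only (NOT from the sequenced genomes)
toy_prophage = "TTGACCGTAGGCTAACGTTAGC"
toy_spacers = {
    "spacer_1": "CCGTAGGCTA",   # present on the forward strand
    "spacer_2": "CGTTAGCC",     # present only as a reverse complement
    "spacer_3": "AAAAAAAAAA",   # absent from both strands
}

matches = spacers_matching_prophage(toy_spacers, toy_prophage)
print(matches)
```

Exact matching is the strictest possible criterion; a search allowing mismatches would need an alignment-based approach instead.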
Bacteriophage-mediated transduction of AMR has been demonstrated in enterococci and potential virulence determinants have been identified in phage associated with E. faecalis. Phages found in the genomes were members of the Siphoviridae and Myoviridae (Additional file 1, Sheet 12). Thirty-four of the 39 genomes contained at least one putative phage ranging in size from 19.2 kb to 70.6 kb. A total of 55 unique intact prophages were identified across 34 sequenced genomes. E. faecium and E. faecalis contained up to 3 intact prophages, whereas E. casseliflavus and E. gallinarum contained 1 or 2 intact prophages.

Secondary metabolites
Bacteriocins were identified in 8 E. faecalis and 9 E. faecium genomes in addition to 1 E. gallinarum genome (Additional file 1, Sheet 18). Enterocin A was identified in nine E. faecium genomes. Lantipeptides were identified in 3 E. faecalis genomes as cytolysins, which have both haemolytic and bacteriolytic activities [12]. Lassopeptides were identified in 6 E. faecalis genomes. Terpenes were detected in all E. casseliflavus and E. gallinarum, but not in E. faecalis or E. faecium genomes. Aryl polyene was detected in one E. faecalis (C34) genome.

Biomarker search
The small number of genomes limited the identification of biomarkers, particularly for searches within the same species isolated from different sources (Additional file 1, Sheet 19). These biomarkers are genes or gene fragments only present in one group of genomes and not others making them possible identifiers of the origin of collected isolates. The majority of searches have identified biomarkers with scores below a correlation cut-off of 0.95. However, in our study, E. faecalis from wastewater that clustered with agricultural and animal sources revealed a biomarker associated with CRISPR-associated genes that differentiated (score = 0.8043) these isolates from E. faecalis from wastewater that clustered with human sources. A comparison of E. faecium from clinical (inclusion) and wastewater (exclusion) sources yielded 7 biomarkers with scores greater than 0.80. These were associated with phage (n = 6) and hypothetical proteins (n = 1). A search for potential biomarkers that

Sequence statistics and Pan-genomic analysis
There was considerable variation in the size of the genomes and the number of contigs generated by sequencing each genome. The variation in the size of the genomes within a species could be a result of differences in the size of the chromosome and the presence/absence of plasmids. The variation in the number of contigs is likely due to the presence of repetitive and insertion genetic elements complicating assembly [54]. While the number of genomes used to generate the pan-genome in our study was small, the pan-genome of Enterococcus spp. is considered open as it is continually expanding and acquiring new accessory genome elements from other enterococci and bacterial species [80].
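The notion of an open pan-genome can be made concrete by tracking how the pan-genome (union of gene families) and core genome (intersection) change as genomes are added. The following is a minimal sketch under stated assumptions: the function name `pan_and_core_sizes` is hypothetical, genomes are represented simply as sets of gene-family identifiers, and the toy input is invented for illustration rather than taken from this study.

```python
def pan_and_core_sizes(genomes):
    """Track pan- and core-genome sizes as genomes are added in order.

    genomes: list of sets, each holding the gene-family IDs of one genome
    Returns a list of (genomes_added, pan_size, core_size) tuples.
    """
    pan = set()
    core = None
    curve = []
    for count, genes in enumerate(genomes, start=1):
        pan |= genes  # pan-genome: every gene family seen so far
        # core genome: gene families present in all genomes seen so far
        core = set(genes) if core is None else core & genes
        curve.append((count, len(pan), len(core)))
    return curve

# Toy gene-family sets for illustration only (NOT from this study)
toy_genomes = [
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "c", "e", "f"},
]

curve = pan_and_core_sizes(toy_genomes)
for count, pan_size, core_size in curve:
    print(count, pan_size, core_size)
```

In this toy case the pan-genome keeps growing with each added genome while the core shrinks, which is the qualitative behaviour expected of an open pan-genome. The result depends on the order in which genomes are added, so in practice such curves are commonly averaged over many random orderings.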

Multi-locus sequence typing
In E. faecium, CC-17 is associated with clinical infections and has been detected in treated and untreated wastewater [13], suggesting that the majority of E. faecium sequenced from wastewater originated from humans. In E. faecalis, ST16 and ST40 have previously been associated with high level gentamicin resistance in clinical isolates and in isolates from pigs [24,59]. However, high level gentamicin resistance was not found in any E. faecalis with these sequence types. Notably, only 5 of the isolates in this study (4 E. faecalis and 1 E. faecium) exhibited high level gentamicin resistance. The association of these sequence types and gentamicin resistance may differ between studies because of geographical location, as gentamicin resistance is transferable, and because it may not be present in all ST16 and ST40 E. faecalis isolates.

Phylogeny
The genomes forming monophylogenetic groups support our previous results of speciation of enterococci based on the groESL locus [61,79]. The diversity of wastewater strains may be a reflection of their origin from clinical, companion animal or agricultural sources. There was more genetic diversity in vancomycin-resistant E. faecalis than E. faecium. The distinct clustering between clinical and wastewater strains of E. faecium may be due to the large accessory genome, and characterization of these genes may provide insight into the mechanisms whereby enterococci adapt to specific environments.
A disproportionate increase in genes associated with energy conversion, regulatory function, transport and secondary metabolism has been noted with expansion in genome size in previous comparative bacterial genomic studies [6,34,66]. So, an analysis of the COGs that are over represented in the expanded genomes of E. faecalis and E. faecium was completed to determine if some of these COGs could be increasing the fitness of multidrug resistant enterococci. This could ultimately increase the risk of infection with these strains and the transfer of virulence and AMR determinants to other bacteria.
In E. casseliflavus and E. gallinarum some COGs were over represented (i.e., carbohydrate transport and metabolism, transcription, cell motility, secondary metabolite biosynthesis, transport, catabolism and signal transduction mechanisms). These functional categories could allow for higher fitness in aquatic environments where more diverse substrates are typically available at much lower concentrations than in the digestive tract. The increase in cell motility related genes may also enable these species to undertake chemotaxis in aquatic environments where nutrients may be scarce [58]. Compared to E. faecalis and E. faecium, these genomes also contained more genes encoding for secondary metabolites including antimicrobial agents. Although these genes are not required for growth, they can confer competitiveness in diverse environments [31]. E. casseliflavus and E. gallinarum are known to be more environmentally fit than E. faecalis and E. faecium as a result of a variety of mechanisms. For instance, the yellow pigment of E. casseliflavus can protect this species from photoinactivation in aquatic environments [36]. However, E. faecium and E. faecalis are still the predominant species in wastewater, likely due to the continuous input of fecal waste into these systems.
The number of genes related to the mobilome increased with genome size in E. faecium and E. faecalis and this would suggest that the mobilome is a significant factor in the evolution of these bacteria within wastewater, contributing to genomic expansion and diversity. However, there was a lack of diversity in E. faecium isolates compared to E. faecalis, suggesting that E. faecium isolates may be more specifically adapted to clinical environments.

Antimicrobial resistance genes
Vancomycin-resistant enterococci have been known to exhibit resistance to a number of antimicrobials [32,74]. Enterococci are also intrinsically resistant to beta-lactams, aminoglycosides and streptogramins and can acquire antimicrobial resistance through horizontal gene transfer [32,42,74]. There are a variety of ARGs that confer vancomycin resistance, with vanA, vanB and vanC being the most common in wastewater enterococci. The most common determinant for teicoplanin resistance is vanZ, which can be integrated into the van operon, although it is absent in the vanB operon, and confers resistance to both vancomycin and teicoplanin [19]. As a result, teicoplanin resistance is commonly associated with vancomycin resistance. Although rare, teicoplanin resistance without vancomycin resistance is likely due to changes in the promoter of the van operon or to the presence of a different resistance mechanism [14,21,35].
Resistance to erythromycin and other macrolides can arise as a result of mutations in the 23S rRNA gene or by efflux pumps [42]. Macrolides are used extensively in both humans and animals. Blanch et al. [9] observed that most wastewater isolates with high-level vancomycin resistance were also resistant to erythromycin, suggesting that erythromycin resistance may favour the persistence of VRE in the environment. The modification of the 23S rRNA target by methylase genes, like ermB, can also confer resistance to streptogramins [42].
Enterococci exhibit intrinsic resistance to low concentrations of aminoglycosides as a result of the presence of aac(6′)-Ii. Gentamicin and streptomycin are clinically important as they are not inactivated by aac(6′)-Ii, and E. faecium are typically sensitive to these antimicrobials [42]. Aside from cross-resistance to other antimicrobial classes, like fluoroquinolones, resistance to these aminoglycosides is likely acquired. Others have shown that aminoglycoside resistance genes are frequently encoded on plasmids and transposons [42]. Streptomycin resistance either involves the inhibition of the drug at the ribosomal level or enzyme inactivation by an acquired streptomycin adenyltransferase [42].
There are multiple tetracycline resistance genes. Tet(L) encodes an efflux protein, and tet(M) and tet(S) encode ribosomal protection proteins. Disk susceptibility testing revealed that isolates carrying tet(M) or tet(S) were resistant to doxycycline, whilst those containing tet(L) were susceptible, suggesting specificity for the tet(L) efflux protein.
In general, bacteria that are resistant to doxycycline are also resistant to tetracycline and oxytetracycline [26,56]. Tetracycline resistance can be due to efflux pumps or ribosomal protection mechanisms, which can be chromosomal and/or plasmid-borne. Co-selection of tetracycline and macrolide resistance in environmental enterococci may occur [39,40].

Virulence genes
The virulence genes detected have additional functions for improved environmental fitness. For instance, the majority of the virulence genes detected in the genomes from this study were also associated with biofilm formation or adherence to surfaces (i.e., ace, acm, agg, bop, ccf, cob, cpd, ebpABC, ecbA, efaA, esp, fsrABC, gelE, pil, scm, sgrA, sprE, and srt). These genes are ubiquitous as they likely play a role in the fitness of enterococci in both the human digestive tract and WWTPs. A number of capsule protein genes were also common among the genomes and not only confer resistance to phagocytosis in humans and animals [48,50], but also to predation by amoeba and bacteriophage in aquatic environments [51,73]. Hyaluronidase (hyl) genes have been associated with increased vancomycin resistance and virulence in mouse peritonitis models [50].

Mobile genetic elements
Mobile genetic elements (MGEs) play an important role in horizontal gene transfer and the spread of AMR among isolates in the environment, humans and animal hosts. MGEs include plasmids, transposable elements, prophages and various genomic islands such as integrative conjugative elements (ICE) [71]. The transposon Tn917 is widely distributed in enterococci [64]. All of these strains exhibited erythromycin resistance and erm(B) was found to be associated with Tn1545 and Tn917 [15]. Transposon Tn6098 was in all of the genomes and possessed genes associated with α-galactoside metabolism. Transposon Tn5385 was found in all of the E. faecalis with these isolates exhibiting erythromycin and doxycycline resistance as this transposon commonly carries these resistance genes [53]. Tn916-type transposons found in E. casseliflavus and E. gallinarum can carry genes coding for tetracycline, minocycline and erythromycin resistance [52,55]. While these transposons were detected in E. casseliflavus and E. gallinarum, they did not exhibit erythromycin resistance and no associated AMR genes were detected in their genomes.

CRISPR-Cas arrays and bacteriophage
Type II CRISPR-Cas systems are typically described in enterococci. Multiple CRISPR arrays can often be detected in bacterial genomes, but not all arrays are accompanied by Cas genes. The absence of CRISPR/Cas systems may compromise genome defence, increasing the likelihood of acquisition of AMR determinants from bacteriophage and plasmids [47]. When a phage infects a bacterium, phage-derived spacers are incorporated into the array within the bacterial chromosome and occasionally plasmids. The spacers are expressed as CRISPR RNAs (crRNAs) and provide a surveillance mechanism for descendant cells and guide the CRISPR/Cas system to enable cleavage of the protospacer sequence in the phage genome. The cleaved phage genomes are then cannibalized and can no longer support productive phage infection [5,68]. CRISPR-Cas systems impact the evolution of both bacteria and phage populations. Transduction-dependent horizontal gene transfer is a key driver of bacterial evolution and of rapid viral evolution to evade CRISPR-Cas systems [68]. CRISPR/Cas arrays can also provide a record of previous and continued interaction between particular bacteria and phage [5,65]. Spacers may limit the type of phage that can integrate into the genome, but bacteriophage can develop anti-CRISPR systems to promote their integration into the bacterial genome [11].
Phages found in the genomes were members of the Siphoviridae and Myoviridae. Other prophages in Enterococcus spp. belonging to Podoviridae, Inoviridae, Leviviridae, Guttaviridae and Fuselloviridae have also been described [18,41]. Prophages from the Siphoviridae family were the most prevalent across all species and are also commonly identified in lactic acid bacteria [72].

Secondary metabolites
Bacteriocins are ribosomally synthesized antimicrobial peptides produced by Gram-positive and Gram-negative bacteria that have antimicrobial activity against closely related bacteria. They could provide a competitive advantage to the survival of bacteria in ecological niches that exhibit poor nutrient concentrations, heat and extreme pH [78]. Lantipeptides are also a growing class of bacteriocins with a large diversity of activity, structure, and biosynthetic machinery. Lantipeptides have multiple uses including as a limited class of antimicrobials [33]. Terpenes are most often associated with plants and fungi, and have been described in prokaryotes in only a few instances, including Enterococcus spp. [7]. Terpenes can have a variety of functions including as antimicrobials, hormones, pigments, and flavor or odour constituents [45], but their role in Enterococcus spp. is unclear. Aryl polyene biosynthetic clusters produce a pigment that protects the organism from reactive oxygen species [62].

Biomarker search
Biomarkers are genes or gene fragments only present in one group of genomes and not others, making them possible identifiers of the origin of collected isolates. For instance, Weigand et al. [77] conducted a search within watershed and enteric enterococcal genomes and found shared phenotype and phylogeny between the two groups, but also identified several biomarkers for both sources. Biomarkers encoding accessory nutrient utilization pathways, including a nickel uptake operon and sugar utilization pathways such as that for xylose, were overrepresented in enteric genomes [77]. Genes that serve as biomarkers for E. casseliflavus and E. gallinarum include genes related to various types of nucleotide and carbohydrate metabolism, and genes with other functions which can improve environmental fitness, including a variety of transporters and housekeeping genes related to DNA replication, transcription and translation.
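The idea of a biomarker as a gene present in one group of genomes and absent from another can be sketched as a simple presence/absence comparison. The example below is an illustration under stated assumptions, not the scoring method used in this study: the function name `biomarker_scores` is hypothetical, the score is defined here as the difference in the fraction of genomes carrying each gene between an inclusion and an exclusion group, and the gene sets are toy values.

```python
def biomarker_scores(inclusion, exclusion):
    """Score each gene by how specific it is to the inclusion group.

    inclusion, exclusion: non-empty lists of sets of gene identifiers
    Returns a dict mapping gene -> score in [-1, 1], where 1 means the gene
    is in every inclusion genome and in no exclusion genome.
    """
    if not inclusion or not exclusion:
        raise ValueError("both groups must contain at least one genome")
    all_genes = set().union(*inclusion, *exclusion)
    scores = {}
    for gene in all_genes:
        frac_in = sum(gene in genome for genome in inclusion) / len(inclusion)
        frac_ex = sum(gene in genome for genome in exclusion) / len(exclusion)
        scores[gene] = frac_in - frac_ex
    return scores

# Toy gene sets for illustration only (NOT from this study)
toy_inclusion = [{"cas9", "tuf"}, {"cas9", "tuf"}, {"tuf"}]
toy_exclusion = [{"tuf"}, {"tuf", "phage_gene"}]

scores = biomarker_scores(toy_inclusion, toy_exclusion)
for gene, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(gene, round(score, 3))
```

Under this simplified definition, a gene shared by all genomes in both groups scores 0 and is uninformative about origin, while candidates would be those scoring above a chosen cut-off.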

Conclusions
In this study, enterococci did not cluster phylogenetically based on point of isolation during wastewater treatment or on the type of WWTP. Despite being the dominant species in wastewater, E. faecalis and E. faecium have smaller genomes and may be less equipped to survive outside of their target niche than E. casseliflavus and E. gallinarum. However, they do harbor more virulence genes, AMR genes, and mobile genetic elements than other Enterococcus spp. A larger genome size in E. faecalis and E. faecium was positively correlated with an expansion in the mobilome. In E. faecium, there was a positive correlation between the number of AMR genes and the mobilome. Thus, while the larger genome size of E. casseliflavus and E. gallinarum is accompanied by more genes related to metabolism and secondary functions, possibly increasing their fitness in the environment, this was not the case for E. faecium and E. faecalis. This study suggests that the key to understanding the impact of WWTPs on AMR dissemination is likely understanding the mobilome and discerning linkages between enterococci in wastewater and other environmental and clinical sources.

Isolate selection
Thirty-nine Enterococcus spp., including E. faecalis (n = 24), E. faecium (n = 11), E. casseliflavus (n = 2) and E. gallinarum (n = 2), isolated from wastewater were selected for whole genome sequencing. These were selected from a collection of 308 isolates obtained between 2014 and 2016 from the primary and final effluents of two WWTPs in Kingston, Ontario, Canada: a BAF and a CAS system. Isolates were speciated and subsequently underwent disc susceptibility testing for a panel of 12 antimicrobial agents. Nine to ten Enterococcus isolates were chosen from each of the primary and final effluent of the two WWTPs to represent the most prominent species isolated from the samples and the most prominent unique antimicrobial resistance phenotypic profiles. While all of these isolates grew in Todd-Hewitt broth supplemented with vancomycin (≥ 4 mg/L), not all met the requirements for vancomycin resistance using disc susceptibility testing following CLSI and EUCAST guidelines. This procedure used reference strains E. faecium ATCC 700221 (MIC ≥ 32 mg/L), E. faecalis ATCC 51299 (MIC ≥ 4 mg/L), E. faecalis ATCC 29212 (susceptible) and Staphylococcus aureus ATCC 25923. The final isolates selected included 21 vancomycin-susceptible, multi-drug resistant enterococci and 18 enterococci with either intermediate resistance or resistance to vancomycin based on disc susceptibility testing. The AMR phenotypic profiles of the selected isolates are available in Table 2.

DNA extraction and sequencing
Enterococcus spp. were grown on Brain Heart Infusion (BHI) agar (Dalynn Biologicals, Calgary, AB) overnight at 37°C. Colonies from a freshly grown culture plate were suspended in TE buffer to achieve an OD600 of 2 in order to harvest 2 × 10^9 cells, and 1 mL was transferred to a microcentrifuge tube and centrifuged for 2 min at 14,000 × g. Genomic DNA was extracted using a modified DNeasy Blood & Tissue Kit (Qiagen, Hilden, Germany) with the addition of an enzymatic lysis step. Bacterial cells were incubated at 37°C with shaking (150 rpm) in lysis buffer consisting of 20 mM Tris-Cl (pH 8.0), 2 mM sodium EDTA, 1.2% Triton X-100 and 40 mg/mL lysozyme (Sigma Aldrich Canada, Oakville, ON). Proteinase K and 5 μL of 100 mg/mL RNase A were added (Qiagen, Hilden, Germany), and the mixture was incubated at room temperature for 10 min before proceeding to the next step. The quality of the genomic DNA was determined using a Nanodrop One UV-Vis Spectrophotometer (Thermo Scientific, Burlington, ON) and a Qubit fluorometer (Thermo Scientific). Genomic library construction was performed using the Illumina Nextera XT DNA sample preparation kit (Illumina Inc., San Diego, CA) following the manufacturer's instructions. The library was sequenced on an Illumina MiSeq platform (Illumina, Inc.). FASTA data was filtered for quality and high-quality reads were de novo assembled using SPAdes genome assembler 3.6.0 [4] and annotated using Prokka 1.12 [63].
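The assembly and annotation steps described above can be sketched as a small command builder. This is a minimal illustration and not the authors' pipeline: the read file naming (`_R1`/`_R2`), the directory layout and the function names are assumptions, only the basic paired-end and output options of SPAdes and Prokka are used, and the quality-filtering step is omitted because the text does not name the tool that was used.

```python
from pathlib import Path
from typing import List


def spades_command(r1: Path, r2: Path, outdir: Path) -> List[str]:
    """Build a SPAdes call for one paired-end library."""
    return ["spades.py", "-1", str(r1), "-2", str(r2), "-o", str(outdir)]


def prokka_command(contigs: Path, outdir: Path, prefix: str) -> List[str]:
    """Build a Prokka call that annotates one set of assembled contigs."""
    return ["prokka", "--outdir", str(outdir), "--prefix", prefix, str(contigs)]


def isolate_commands(isolate: str, reads_dir: Path, work_dir: Path) -> List[List[str]]:
    """Return the assembly and annotation commands for one isolate.

    File names and directories are hypothetical placeholders.
    """
    r1 = reads_dir / f"{isolate}_R1.fastq.gz"
    r2 = reads_dir / f"{isolate}_R2.fastq.gz"
    assembly_dir = work_dir / f"{isolate}_spades"
    annotation_dir = work_dir / f"{isolate}_prokka"
    # SPAdes writes the assembled contigs to contigs.fasta in its output directory.
    contigs = assembly_dir / "contigs.fasta"
    return [
        spades_command(r1, r2, assembly_dir),
        prokka_command(contigs, annotation_dir, isolate),
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch has no side effects.
    for cmd in isolate_commands("isolate01", Path("reads"), Path("work")):
        print(" ".join(cmd))
```

Each command is returned as a list of arguments, so it could be passed to `subprocess.run` without shell quoting once the tools are installed.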

Comparative analysis
Pangenomic analysis was completed using the contigs extracted from the GenBank file, which were reannotated using Prokka 1.13.3 (Seemann, 2014). This generated GFF files that were used as input to Roary 3.12 [46]. Multi-locus sequence typing (MLST) was performed using the online Enterococcus faecalis MLST (https://pubmlst.org/efaecalis/) and Enterococcus faecium MLST (https://pubmlst.org/efaecium/) databases, which are based at the University of Oxford [30] and funded by the Wellcome Trust. The phylogenetic trees were constructed based on analysis of single nucleotide variants (SNVs) of the core genes. The phylogenetic analyses were conducted using a single nucleotide variant phylogenomics (SNVPhyl) pipeline [49] using unassembled sequence read data. The paired-end reads from Illumina sequencing of the 39 Enterococcus spp. genomes were aligned to the appropriate reference genome to generate read pileups (SMALT v.0.7.5; http://www.sanger.ac.uk/science/tools/smalt-0). The presence and absence matrices were generated using Phandango [23]. Whole genome sequences of E. faecalis and E. faecium (Additional file 1) were also included in the analysis and were run through the ART next-generation sequencing read simulator [27] to generate paired-end reads with length and coverage similar to the experimental dataset (2 × 300 base PE and ~50X coverage). The reads were subject to mapping quality filtering (minimum mean mapping quality score of 30) and coverage estimation (15X minimum coverage threshold). Variant calling, variant consolidation and SNV alignment generation were performed using an SNV abundance ratio of 0.75 with no SNV density filtering, and the final phylogeny was generated from the alignment with PhyML [22] using the maximum likelihood method. The resulting tree was visualized using interactive Tree of Life (iTOL) version 4.2.1 (https://itol.embl.de/).
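The read-level thresholds described above can be illustrated with a short sketch. It is not the SNVPhyl implementation: the `Site` record, its field names and the use of "greater than or equal to" comparisons are assumptions made for illustration. Only the three threshold values (mean mapping quality of 30, 15X coverage, SNV abundance ratio of 0.75) come from the text.

```python
from dataclasses import dataclass
from typing import Iterable, List

MIN_MEAN_MAPQ = 30      # minimum mean mapping quality score
MIN_COVERAGE = 15       # minimum read depth (15X)
MIN_SNV_RATIO = 0.75    # minimum fraction of reads supporting the variant


@dataclass
class Site:
    """Hypothetical summary of one pileup position."""
    position: int
    ref_base: str
    alt_base: str       # most frequent base observed in the reads
    depth: int          # number of reads covering the position
    alt_reads: int      # number of reads carrying alt_base
    mean_mapq: float    # mean mapping quality of the covering reads


def passes_filters(site: Site) -> bool:
    """Apply the coverage, mapping quality and abundance ratio thresholds."""
    if site.depth < MIN_COVERAGE or site.mean_mapq < MIN_MEAN_MAPQ:
        return False
    # depth is at least MIN_COVERAGE here, so the division is safe.
    return site.alt_reads / site.depth >= MIN_SNV_RATIO


def call_snvs(sites: Iterable[Site]) -> List[Site]:
    """Keep positions that differ from the reference and pass every filter."""
    return [s for s in sites if s.alt_base != s.ref_base and passes_filters(s)]
```

Under these assumptions, a position covered by 40 reads of which 36 carry the alternative base would be retained, whereas one covered by only 10 reads would be discarded regardless of how many of them support the variant.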
Assignment of proteins into clusters of orthologous groups (COGs) was performed using the compare genomes function of the DOE Joint Genome Institute Integrated Microbial Genomes & Microbiomes platform [38]. Correlations were calculated using the R statistical platform version 3.4.3 (R [16]) and figures were generated using the packages Hmisc [25] and corrplot [76].
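As an illustration of the kind of correlation reported here (for example, between genome size and the mobilome fraction), the following sketch computes a correlation coefficient in plain Python. The analysis itself was carried out in R; the choice of a Pearson coefficient is an assumption, since the text does not state which coefficient was used, and the numbers in the usage example are invented placeholders rather than data from this study.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Return the Pearson correlation coefficient of two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of squares and cross-products around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Placeholder values only: genome size (Mb) and mobilome fraction (%).
    genome_size = [2.8, 2.9, 3.0, 3.1, 3.3]
    mobilome_fraction = [2.0, 2.4, 2.3, 3.1, 3.6]
    print(round(pearson_r(genome_size, mobilome_fraction), 3))
```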
Draft genome sequences of the 39 Enterococcus spp. were investigated for the presence of putative virulence and AMR genes, mobile genetic elements, bacteriophage, and CRISPR/Cas arrays. The contigs of each draft genome were ordered based on alignment against a reference genome using progressiveMauve [17]. Virulence and AMR genes were identified using VirulenceFinder version 1.5 [29] and CARD version 2.0.1 [28], respectively. Results for AMR genes were further verified using megaBLAST and hits were manually curated. Genomes were investigated for integrative conjugative elements (ICEs) by homology searches using BLAST against 466 ICEs downloaded from the ICEberg database 1.0 [8]. The genomes were then analyzed for the presence of prophage using PHAST [81]. CRISPR/Cas arrays were identified using the CRISPRdb [20]. Secondary metabolite biosynthetic gene clusters were identified using the Antibiotics and Secondary Metabolite Analysis Shell (antiSMASH) version 3.0 [75].
A biomarker search was carried out with the 39 genomes from this study and an additional 59 genomes retrieved from NCBI using Neptune [37] and a Galaxy instance from the National Microbiology Laboratory in Winnipeg, MB, Canada. The inclusion and exclusion groups are listed in Additional file 1 (Sheet 19). The cut-off score for signatures among species was 95% and the cut-off score for signatures within species from different sources was 80%. The functions related to the genes covered by each signature were identified by mapping the signatures to a reference, then identifying the functions of the genes using UniProt [70].
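The inclusion/exclusion idea behind a signature search can be sketched with k-mer sets. This toy example does not reproduce Neptune's algorithm or its scoring. The function names, the `min_inclusion_fraction` parameter and the example sequences are hypothetical, and the parameter is not equivalent to the cut-off scores reported above. A k-mer is kept as a candidate when it occurs in at least the given fraction of inclusion genomes and in none of the exclusion genomes.

```python
from typing import Dict, Sequence, Set


def kmers(seq: str, k: int) -> Set[str]:
    """Return the set of all substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def candidate_signatures(
    inclusion: Sequence[str],
    exclusion: Sequence[str],
    k: int,
    min_inclusion_fraction: float = 1.0,
) -> Set[str]:
    """Find k-mers common in the inclusion group and absent from the exclusion group."""
    if not inclusion:
        raise ValueError("the inclusion group must contain at least one sequence")
    # Every k-mer seen in any exclusion genome disqualifies a candidate.
    excluded: Set[str] = set().union(*(kmers(s, k) for s in exclusion))
    # Count in how many inclusion genomes each k-mer occurs.
    counts: Dict[str, int] = {}
    for seq in inclusion:
        for km in kmers(seq, k):
            counts[km] = counts.get(km, 0) + 1
    needed = min_inclusion_fraction * len(inclusion)
    return {km for km, n in counts.items() if n >= needed and km not in excluded}


if __name__ == "__main__":
    # Invented toy sequences; real genomes would be read from FASTA files.
    inclusion_group = ["AAACGT", "TTACGT"]
    exclusion_group = ["AAATTA"]
    print(candidate_signatures(inclusion_group, exclusion_group, k=4))
```

Real signature discovery tools work on much longer sequences and tolerate mismatches, so this exact-match sketch is only meant to show how inclusion and exclusion groups interact.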