Nanopore sequencing reveals genomic map of CTX-M-type extended-spectrum β-lactamases carried by Escherichia coli strains isolated from blue mussels (Mytilus edulis) in Norway

Environmental surveillance of antibiotic resistance can contribute towards better understanding and management of human and environmental health. This study applied a combination of long-read Oxford Nanopore MinION and short-read Illumina MiSeq-based sequencing to obtain closed complete genome sequences of two CTX-M-producing multidrug-resistant Escherichia coli strains isolated from blue mussels (Mytilus edulis) in Norway, in order to understand the potential for mobility of the detected antibiotic resistance genes (ARGs). The complete genome sequence of strain 631 (E. coli sequence type 38) was assembled into a circular chromosome of 5.19 Mb and five plasmids (between 98 kb and 5 kb). The majority of ARGs cluster in close proximity to each other on the chromosome within two separate multidrug-resistance determining regions (MDRs), each flanked by IS26 transposases. MDR-1 carries blaTEM-1, tmrB, aac(3)-IId, aadA5, mph(A), mrx, sul1, qacEΔ1 and dfrA17; while MDR-2 harbors aph(3″)-Ib, aph(6)-Id, blaTEM-1, catA1, tet(D) and sul2. Four identical chromosomal copies of blaCTX-M-14 are located outside these regions, flanked by ISEc9 transposases. Strain 1500 (E. coli sequence type 191) exhibited a circular chromosome of 4.73 Mb and two plasmids (91 kb and 4 kb). The 91 kb conjugative plasmid belonging to IncI1 group carries blaCTX-M-15 and blaTEM-1 genes. This study confirms the efficacy of combining Nanopore long-read and Illumina short-read sequencing for determining complete bacterial genome sequences, enabling detection and characterization of clinically important ARGs in the marine environment in Norway, with potential for further dissemination. It also highlights the need for environmental surveillance of antibiotic resistance in low prevalence settings like Norway.


Background
Extended-spectrum β-lactamase (ESBL)-producing Enterobacteriaceae represent an emerging public health threat, for which research and urgent development of new antibiotics are needed [1]. Extended-spectrum β-lactamases are a group of enzymes that hydrolyze β-lactam antibiotics, including 3rd generation cephalosporins [2]. These enzymes are divided into molecular classes A, C and D, based on their protein sequences [3]. Among ESBLs, plasmid-mediated class A β-lactamases of the CTX-M type are prominent in clinical settings, especially in Europe [4,5]. CTX-M-producing Escherichia coli are dominated by a few high-risk clones, such as sequence type (ST) 131 and ST38 [6,7]. E. coli ST131 and ST38 are recognized as enteroaggregative E. coli (EAEC) that can also cause extra-intestinal infections, including bloodstream infection and urinary tract infection [8][9][10].
Environmental niches, including the aquatic environment, serve as a source of and/or a dissemination route for antibiotic resistance genes (ARGs) and resistant bacteria [11][12][13][14]. Clinically relevant ARGs and pathogens are introduced into the environment via different routes, such as sewage contamination [15], waste from livestock production [16] and runoff from land [17]. Once introduced into the environment, ARGs and pathogens interact with environmental bacteria when sharing, at least temporarily, the same habitats [18]. Proximity and interactions within environmental niches provide opportunities for acquisition of resistance genes via horizontal transfer [18,19]. Moreover, environmental pollution with antibiotics and other antimicrobial substances leads to selection of ARGs and resistant bacteria [20,21]. Such environments may thus be hotspots for further dissemination of ARGs and resistant bacterial strains.
The southern and eastern countries in Europe present a high risk of antimicrobial resistance (AMR) due, in part, to extensive use of antibiotics [22,23]. For instance, the prevalence of invasive E. coli isolates resistant to 3rd generation cephalosporins was 29.5% in Italy in 2017 [22]. Accordingly, the prevalence of AMR in the environment was high [24], e.g., 15% of the E. coli strains (n = 141) isolated from Venus clams (Chamelea gallina) in Italy carried ESBLs [25]. In contrast, Norway represents a low prevalence setting, in terms of both antibiotic use [23] and prevalence of AMR [22]. The prevalence of ESBL-positive E. coli in Norway was 6.6% and 3.0% from blood and urine, respectively, in 2017 [26]. Although knowledge is limited, the overall prevalence of AMR in the environment in Norway is low. In a previous study, we detected only two ESBL-positive E. coli strains (out of 199 analyzed), isolated from blue mussels (Mytilus edulis) in Norway [27].
With the advent of next-generation sequencing, whole-genome sequencing is increasingly used for resolving questions of bacterial taxonomy as well as for studying the genetic contents of particular strains [28]. Short-read sequencing technologies, such as Illumina and Ion Torrent, allow fragmented genome assembly, i.e., draft genome and, occasionally, complete closed genome sequences [29,30]. Draft genome sequences are suitable for detecting genes present in a given strain and for basic characterization and phylogenetic studies [31]. However, draft genome sequences do not reveal the complete metabolic potential of the given strains. Long-read sequencing technologies, such as Oxford Nanopore and PacBio, allow assembly of complete genome sequences [32,33], including the sequences of associated plasmids, which often carry metabolic genes and ARGs. However, owing to the higher sequencing error rates associated with long-read sequencing technologies, hybrid assembly using a combination of low-error short-reads and long-reads has been successfully applied to obtain reliable, complete closed genome sequences of bacterial strains [34].
The aim of this study was to apply a combination of long-read Nanopore and short-read Illumina-based sequencing to obtain high-quality complete genome sequences of the two ESBL-positive E. coli strains (631 and 1500) isolated from blue mussels (M. edulis) collected from coastal waters in Norway [27], in order to determine the genomic map of resistance genes and their potential for horizontal transfer.

Results
Complete genome sequences of the two CTX-M-producing E. coli strains
The Oxford Nanopore sequencing run generated 471,175 sequence reads for strain 631 and 576,474 sequence reads for strain 1500, with average read length of 7.7 kb and 6.7 kb, respectively. The longest read for strain 631 was 105,952 bp and for strain 1500 was 125,266 bp. The average Phred quality score of the raw reads for Nanopore was 10.0 for both the strains (i.e., probability of error 0.1). The Nanopore-solo sequence assembly yielded six contigs for strain 631 and three contigs for strain 1500. The Illumina sequencing of strains 631 and 1500 generated 1,362,720 and 2,769,670 paired-end reads, respectively. After quality trimming, the average length of the reads was 227 bp for strain 631 and 211 bp for strain 1500. The longest read was 251 bp for both the strains. For Illumina reads, the average Phred quality scores of the trimmed reads were 34.5 for strain 631 and 34.9 for strain 1500 (i.e., probability of error < 0.001). The assembly of Illumina-solo sequences produced 102 and 50 contigs (> 500 bp) for strains 631 and 1500, respectively.
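The error probabilities quoted in parentheses above follow from the standard Phred definition, Q = -10 log10(P). The short Python sketch below is only an illustration of that conversion applied to the average scores reported here; the function names are hypothetical and are not part of any tool used in this study.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q to a per-base error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Inverse conversion: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    # Average Phred scores as reported in the text above.
    reported = [
        ("Nanopore raw reads (both strains)", 10.0),
        ("Illumina trimmed reads, strain 631", 34.5),
        ("Illumina trimmed reads, strain 1500", 34.9),
    ]
    for label, q in reported:
        p = phred_to_error_probability(q)
        print(f"{label}: Q = {q} -> P(error) = {p:.5f}")
```

A score of 10.0 corresponds to an error probability of 0.1, while scores of 34.5 and 34.9 both fall below 0.001, consistent with the values given in the text.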
In order to obtain highly accurate closed complete genome sequences of strains 631 and 1500, hybrid de novo assembly of Nanopore long-reads and Illumina short-reads was performed for each strain. The complete genome of strain 631 (GenBank accession number: CP040263-CP040268) was assembled into six contigs: one contig representing a complete circular chromosome of 5,191,486 bp and five plasmids, ranging from 97,726 bp to 5165 bp (Table 1). All ARGs, virulence genes (except for the espI gene detected on plasmid pEc631_1) and biocide/metal resistance genes (BMRGs) were located on the chromosome of this strain. Strain 1500 (GenBank accession number: CP040269-CP040271) exhibits a circular chromosome of 4,736,377 bp and two plasmids of 91,123 bp and 4087 bp (Table 1). This strain carries all virulence genes and BMRGs on the chromosome. However, the β-lactamase genes blaCTX-M-15 and blaTEM-1 are located on the plasmid pEc1500_CTX. Genome assembly statistics and complete overview of the genome sequences of strains 631 and 1500 are presented in Additional files 1 and 2, respectively. Additionally, a list of the virulence genes and BMRGs detected in strains 631 and 1500 (i.e., gene names and their functions) is presented in Additional file 3. Conjugal transfer genes detected by searching through the GenBank files of the annotated genome sequences of strains 631 and 1500 are listed in Additional file 4.
A single nucleotide polymorphism (SNP)-based phylogenetic tree shows that E. coli strain 631 clusters closer to human isolates than to ST38 isolates from other animals, suggesting a possible human origin of strain 631 (Fig. 2). The number of SNPs between strain 631 and other ST38 strains is presented in Additional file 5.

Discussion
To the best of our knowledge, this is the first study reporting closed complete genome sequences of CTX-M-producing E. coli strains (631 and 1500) isolated from blue mussels (Mytilus edulis) in Norway. In accordance with previous studies, we used a combination of Nanopore and Illumina sequencing and hybrid de novo assembly, which combines the length of Nanopore reads with the accuracy of Illumina reads, to obtain closed complete genome sequences [37][38][39][40][41].
The multidrug-resistant E. coli strain 631 (ST38) was resistant to 15 antibiotics [27]. ST38 is a known pathogenic sequence type of E. coli, usually associated with intestinal disease and sometimes extra-intestinal infection [8]. Despite the number of plasmids harbored by this strain, all the ARGs are located on the chromosome, clustered together in two separate MDRs, both flanked by IS26 transposases. MDR-1 contains two DNA fragments (17,687 bp and 3094 bp, respectively) that are identical (> 99.9% nucleotide identity) to segments of a conjugative IncFII plasmid pE2855-3 (92.7 kb) reported in E. coli (GenBank accession number: AP018799) (Fig. 1a). MDR-1 also has DNA segments that are identical (> 99.9% nucleotide identity) to segments of a plasmid, pVPS43 (19.4 kb), reported in Vibrio parahaemolyticus (GenBank accession number: KX957970). MDR-2 contains three DNA fragments (13,222 bp, 4188 bp and 1176 bp, respectively) that are identical (> 99.9% nucleotide identity) to segments of plasmid pKPN5 (88.6 kb), reported in Klebsiella pneumoniae (GenBank accession number: CP000650) (Fig. 1b). The high identity of the MDRs to segments of plasmids carried by known pathogens indicates that these regions are potentially mobile. Strain 631 carried four identical copies of the blaCTX-M-14 gene on the chromosome, flanked by  [43]. We detected two IncFII plasmids in strain 631, which did not carry ARGs. Even though this is quite unusual, IncFII plasmids without ARGs have been reported previously [44][45][46][47]. Further, our analysis showed that MDR-1 on the chromosome of strain 631 has DNA segments that are identical (> 99.9% nucleotide identity) to DNA segments of a conjugative IncFII plasmid reported in E. coli (GenBank accession number: AP018799) (Fig. 1a). This suggests that the MDRs in strain 631 may have been transferred from an IncFII plasmid onto the chromosome by transposition [48].
E. coli strain 1500 carries the blaCTX-M-15 gene on a conjugative IncI1 plasmid (pEc1500_CTX) that has high sequence identity (> 99.9%) with plasmid pSH4469 (91.1 kb), detected in CTX-M-15-producing Shigella sonnei (GenBank accession number: KJ406378) isolated from an outbreak in the Republic of Korea [49]. Plasmid pEc1500_CTX also has high identity (> 99.9%) with the CTX-M-carrying plasmid pEK204 (93.7 kb) from an E. coli strain (GenBank accession number: EU935740) reported in the UK [50]. The plasmid backbone also shares high identity (> 99.9%) with a segment of ~61 kb from plasmid pHNRD174 (86.2 kb) from E. coli (GenBank accession number: KX246268) reported in China. Although a CTX-M-14-encoding IncI1 plasmid has previously been reported in Norway [51], to the best of our knowledge, this is the first report on detection of E. coli carrying blaCTX-M-15 on an IncI1 plasmid in the marine environment in Norway. IncI1 plasmids are widely distributed within the family Enterobacteriaceae and are associated with dissemination of several ARGs [52]. The presence of the blaCTX-M-15 gene on a conjugative IncI1 plasmid in strain

Conclusion
This study highlights the usefulness of hybrid assembly combining accurate short-reads and long-reads for obtaining closed complete genome sequences of strains 631 and 1500, thereby enhancing the understanding of the genomic arrangement and potential for mobility of clinically important ARGs. It demonstrates the potential role of the marine environment in dissemination of pathogenic E. coli strains and clinically relevant ESBLs. These observations strengthen the notion that the environment plays an important role in dissemination of clinically relevant ARGs and pathogens [13]. Our study also highlights the need for surveillance of antibiotic resistance in the environment, especially in a low prevalence setting like Norway, which would provide important insights for designing mitigation strategies to limit resistance dissemination before it becomes widespread.

Methods
Bacterial strains, DNA extraction and sequencing
E. coli strains 631 and 1500 were isolated from blue mussels (M. edulis) collected along the Norwegian coast, and characterized as described earlier [27]; the strains 631 and 1500 were denoted as strains B184 and B117, respectively, in Grevskott et al. 2017 [27].

Genome assembly and sequence analysis
The raw reads generated by Illumina MiSeq were quality trimmed and assembled, using Trimmomatic version 0.36 [54] and SPAdes version 3.11.1 [55], respectively. The quality of the generated Illumina reads was analyzed with FastQC version 0.11.3 [56] and CLC Genomics Workbench version 12.0.3 (Qiagen, Denmark). The raw data generated by the MinION instrument were processed and demultiplexed with Guppy software version 2.3.7 (Oxford Nanopore Technologies Ltd.) and assembled using Canu version 1.8 [57]. The quality of the demultiplexed data was analyzed with NanoPlot version 1.26.3 [58].
Subsequently, a hybrid de novo assembly of Illumina and Nanopore reads was performed, using Unicycler version 0.4.7 [34]. Assembly statistics were obtained, using QUAST server [59]. Average Nucleotide Identity values based on BLAST (ANIb) [60] were calculated, using the server JSpeciesWS [61], between E. coli strains 631, 1500 and E. coli DSM 30083 T (GenBank accession number: AGSE00000000), to confirm the species identity. Genomes were annotated, using the Prokaryotic Genome Annotation Pipeline (PGAP) version 4.8 at the National Center for Biotechnology Information (NCBI) [62]. Complete overviews of the genome sequences of strains 631 and 1500 were obtained, using GView Server version 1.7 [63]. Genetic maps were produced, using SnapGene® software version 4.3.8.1 (GSL Biotech, USA). Multi-locus sequence types (MLSTs) were examined, using the MLST tool described by Larsen et al. [64], with E. coli #1 MLST profile [65]. Plasmid replicons were typed, using PlasmidFinder 2.0 [66], as well as BLASTP analysis of the replication initiation (Rep) sequence against the NCBI database. The presence of ARGs was examined, using ResFinder 3.2 [67] and CARD 3.0.7 [68]. Virulence genes were analyzed, using the Virulence Factors Database (VFDB) [69], and BMRGs were examined, using the BacMet database 2.0 [70], by running the script BacMet-Scan.pl against the database of "Experimentally confirmed resistance genes". Conjugal transfer genes were examined by searching through the GenBank files of the annotated genome sequences of strains 631 and 1500.
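The search of annotated GenBank files for conjugal transfer genes could, for example, be scripted as in the following minimal Python sketch. It is not the procedure used in this study: the search terms, the function name and the small demonstration record are all assumptions made for illustration, and the sketch relies only on the `/product="..."` qualifier of the GenBank flat-file format.

```python
import re
from typing import List, Sequence, Tuple

# Assumed search terms; the terms actually used in the study are not stated.
KEYWORDS = ("conjugal transfer", "conjugative transfer")

# A /product qualifier may wrap over several lines, so match up to the closing quote.
PRODUCT_RE = re.compile(r'/product="([^"]*)"')


def find_products(genbank_text: str,
                  keywords: Sequence[str] = KEYWORDS) -> List[Tuple[str, str]]:
    """Return (locus name, product) pairs whose product contains any keyword."""
    hits = []
    # Records in a GenBank flat file are terminated by a line containing "//".
    for record in genbank_text.split("\n//"):
        locus = re.search(r"^LOCUS\s+(\S+)", record, flags=re.MULTILINE)
        if locus is None:
            continue  # trailing whitespace after the last record
        for match in PRODUCT_RE.finditer(record):
            product = " ".join(match.group(1).split())  # undo line wrapping
            if any(k in product.lower() for k in keywords):
                hits.append((locus.group(1), product))
    return hits


# Synthetic record for demonstration only (not real data from strains 631 or 1500).
DEMO = '''LOCUS       demo_plasmid   100 bp    DNA     circular BCT 01-JAN-2000
FEATURES             Location/Qualifiers
     CDS             1..30
                     /gene="traG"
                     /product="conjugal transfer
                     protein TraG"
     CDS             40..90
                     /product="hypothetical protein"
ORIGIN
//
'''

if __name__ == "__main__":
    for locus_name, product in find_products(DEMO):
        print(locus_name, product, sep="\t")
```

A keyword search of this kind depends entirely on how the annotation pipeline names gene products, so the term list would need to be adapted to the annotation at hand.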

Comparative analysis of E. coli strain 631
A SNP-based comparative analysis of E. coli strain 631 (ST38) with other strains of the same ST from different sources and countries was performed as described by Sabat et al. [71]. Briefly, the assembled genome sequences in FASTA format were analyzed, using the tool CSI Phylogeny 1.4 [72]. The parameters minimum depth at SNP positions, minimum relative depth at SNP positions, minimum distance between SNPs and minimum SNP quality were disabled, while the minimum read mapping quality and z-score were kept at their default values of 25 and 1.96, respectively. The SNP-based phylogenetic tree was displayed on-line with the Interactive Tree Of Life (iTOL) [73]. The details of the strains of E. coli ST38 included in the comparative analysis are presented in Additional file 6.
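To illustrate what a pairwise SNP count between strains represents, the Python sketch below counts differing positions between sequences that are assumed to be already aligned to a common reference. This is a simplified illustration only, not the algorithm implemented in CSI Phylogeny, and the strain names and sequences are invented.

```python
from itertools import combinations
from typing import Dict, Tuple


def snp_distance(seq_a: str, seq_b: str, ignore: str = "N-") -> int:
    """Count positions where two aligned sequences differ.

    Positions where either sequence has an ambiguous base or a gap
    (characters listed in `ignore`) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )


def pairwise_snp_matrix(aligned: Dict[str, str]) -> Dict[Tuple[str, str], int]:
    """Return SNP counts for every unordered pair of strain names."""
    return {
        (name_a, name_b): snp_distance(aligned[name_a], aligned[name_b])
        for name_a, name_b in combinations(sorted(aligned), 2)
    }


if __name__ == "__main__":
    # Invented toy alignment; real genome alignments span millions of positions.
    toy = {
        "strain_A": "ACGTACGT",
        "strain_B": "ACGTACGA",
        "strain_C": "ACCTNCGA",
    }
    for pair, count in pairwise_snp_matrix(toy).items():
        print(pair, count)
```

Pairwise counts of this kind are the sort of values summarized in a SNP distance table, whereas the phylogenetic tree is inferred from the concatenated variable positions.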