Recovery of phage and bacterial DNA released from swabs
In this procedure, material from a single swab is separated into a VLP fraction and ‘remainder’ fraction by centrifugation (Additional file 1: Figure S1). In brief, the VLP fraction is treated with DNase I to digest free DNA, VLPs are precipitated, and capsids disrupted by sodium dodecyl sulfate (SDS) and proteinase K. VLP nucleic acids are purified by exposure to cetyltrimethylammonium bromide (CTAB), phenol-chloroform extraction, and ethanol precipitation. For the remainder fraction and for unfractionated samples, cells are disrupted by chemical and physical lysis and DNA recovered using a commercial kit.
To better understand the recovery of phage and bacterial DNA from a skin or wound swab, we used mock samples of known concentration, composed of M13 phage and an F- strain of E. coli (i.e., a non-host strain). Phage and cells were mixed in a 19:1 ratio and diluted to approximate typical phage:cell ratios and concentrations from human and environmental samples [19]. In addition to DNA extraction itself, there are two major possible points of loss of material: (1) incomplete release of material from the swab, and (2) incomplete removal of material from skin by swabbing. To first address point (1), we applied the mock sample directly to the swab, obtained the VLP and remainder fractions, and measured the yield of DNA recovered. Unfractionated samples were also analyzed as a control.
The amounts of M13KO7 phage and bacterial DNA extracted from each fraction (VLP and remainder) and from the unfractionated sample were determined by quantitative polymerase chain reaction (qPCR). Recoveries (r) are expressed as a fraction of the known total quantity of phage (rp) and bacteria (rb) in the entire mock sample (e.g., for an unfractionated sample, complete recovery of bacterial or phage DNA corresponds to rb = 1 and rp = 1, respectively). To a first approximation, we expect that, upon fractionation by non-equilibrium centrifugation, all bacteria are pelleted in the remainder fraction (expected rb = 1 in the remainder and rb = 0 in the VLP fraction), while phages remain evenly dispersed in solution (i.e., since the VLP and remainder fractions are of equal volume, expected rp = 0.5 in both fractions). We also define yield of bacteria (yb) and phage (yp) as the ratio of the observed recovery to the expected recovery, expressed as a percentage.
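The definitions above can be written out as a short calculation. The following Python sketch is illustrative only: the function name `percent_yield` and the `EXPECTED` table are labels introduced here, not part of the protocol, and the expected values simply encode the first-approximation assumptions stated in the text.

```python
# Expected recoveries under the first-approximation model described above:
# bacteria pellet entirely into the remainder fraction, while phage split
# evenly between the two equal-volume fractions.
EXPECTED = {
    ("unfractionated", "bacteria"): 1.0,
    ("unfractionated", "phage"): 1.0,
    ("remainder", "bacteria"): 1.0,
    ("remainder", "phage"): 0.5,
    ("VLP", "bacteria"): 0.0,
    ("VLP", "phage"): 0.5,
}


def percent_yield(recovery, fraction, organism):
    """Return yield (%) = observed recovery / expected recovery x 100.

    Returns None when the expected recovery is zero, because the ratio is
    undefined in that case (bacteria in the VLP fraction).
    """
    expected = EXPECTED[(fraction, organism)]
    if expected == 0:
        return None
    return 100.0 * recovery / expected


# Hypothetical recovery values, chosen only to exercise the function.
print(percent_yield(0.25, "remainder", "phage"))
print(percent_yield(0.01, "VLP", "bacteria"))
```

Returning `None` for the VLP bacterial case mirrors the convention used here of reporting yb as undefined when the expected recovery is zero.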
Without fractionation, the recovery of bacterial DNA (rb = 0.45 ± 0.04; yb = 45%) and of phage DNA (rp = 0.27 ± 0.04; yp = 27%) across sample loads indicated that recovery was somewhat less efficient for phage than for bacterial DNA (Fig. 1). For fractionated samples, in the remainder fraction, rb = 0.56 ± 0.08 (yb = 56%) and rp = 0.11 ± 0.04 (yp = 22%) across sample loads, similar to yields from the unfractionated samples. In the VLP fraction, rb = 0.006 ± 0.003 (yb is undefined given an expectation of 0% recovery) while rp = 0.78 ± 0.09 (yp = 160%) across sample loads (Fig. 1). The apparent phage yield over 100% in the VLP fraction, corresponding to unexpected enrichment in the supernatant, may be due to inaccuracies in quantitation of the stock phage concentration (e.g., conversion factors do not account for compositional or structural irregularities of the phage). Such biases do not affect comparisons of yields between unfractionated samples and remainder and VLP fractions. In the VLP fraction, the ~85-fold decrease in cell DNA recovery and ~5-fold increase in phage DNA recovery, compared to the unfractionated samples or the remainder fraction, indicate a substantial ~400-fold enrichment of DNA recovery (in terms of genome copies) from phages compared to cells. If sequenced, this enrichment would translate into a similar enrichment of phage DNA reads. The overall yields also indicate that DNA from roughly half or more of the phages and cells loaded onto a swab can be recovered in this protocol.
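The fold changes quoted above can be approximated from the mean recoveries alone. The sketch below is a rough check, not the analysis actually used: it assumes that the comparison values are simple averages over the unfractionated and remainder samples, and it works from the rounded means reported in the text, so small deviations from the published figures are expected.

```python
# Mean recoveries reported in the text (M13 mock samples applied to swabs).
rb = {"unfractionated": 0.45, "remainder": 0.56, "VLP": 0.006}
rp = {"unfractionated": 0.27, "remainder": 0.11, "VLP": 0.78}

references = ("unfractionated", "remainder")

# Fold decrease in bacterial recovery in the VLP fraction, averaged over
# the two reference samples (assumed averaging scheme).
cell_depletion = sum(rb[k] / rb["VLP"] for k in references) / len(references)

# Fold increase in phage recovery in the VLP fraction, same averaging.
phage_gain = sum(rp["VLP"] / rp[k] for k in references) / len(references)

# Combined enrichment of phage over cell genome copies.
enrichment = cell_depletion * phage_gain

print(f"cell depletion ~{cell_depletion:.0f}-fold")
print(f"phage gain ~{phage_gain:.1f}-fold")
print(f"overall enrichment ~{enrichment:.0f}-fold")
```

Under these assumptions the sketch gives roughly 84-fold depletion of cell DNA, 5-fold gain in phage DNA, and about 420-fold overall enrichment, in line with the approximate values stated above.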
Amount of DNA released from swabs
Since this protocol is intended to produce samples for high-throughput sequencing, total recovered mass is an important metric. Depending on the manufacturer’s instructions, shotgun sequencing library preparation begins with 0.01–10 ng per sample. Total recovered DNA mass in the VLP fractions (phage + bacterial, as determined by qPCR) ranged from 0.63 ± 0.04 ng (from sample originally containing 1.9 × 10^8 virions) down to 1.2 ± 0.09 pg (from sample originally containing 1.9 × 10^5 virions) (Fig. 2a). The concentration of VLP fractions is sufficient for low-input metagenomic library preparation without amplification, with the exception of the 1.9 × 10^5 virion sample, which was expected to yield insufficient DNA. From the remainder fractions, the total genomic DNA (gDNA) mass ranged from 27 ± 3 ng (from sample originally containing 10^7 cells and 1.9 × 10^8 virions) down to 32 ± 5 pg (from sample originally containing 10^4 cells and 1.9 × 10^5 virions), which is adequate for low-input metagenomic library preparation and 16S rRNA sequencing (Fig. 2a). Thus, swabs containing samples in this concentration range yield sufficient DNA for bacterial analysis, and may yield sufficient DNA for phage analysis if the swab contains at least ~10^6 virions (depending on the phage).
The number of phage genome reads obtained from sequencing a VLP sample depends not only on the relative enrichment of VLP DNA compared to cell DNA, but also on the relative genome sizes. Since bacterial genomes are 10–1000 times larger than phage genomes, if no enrichment is performed, reads from bacterial genomes typically vastly outnumber reads from phage genomes. Indeed, without fractionation, M13KO7 DNA represented 1.1 ± 0.5% of the total DNA by mass across sample loads (Fig. 2c), consistent with expectation for the initial sample (1.8%, based on E. coli genome size: 4.56 Mb [20], M13 ssDNA genome size: 8669 nt, and 19:1 phage:bacteria ratio) and the somewhat lower recovery of phage compared to bacterial DNA.
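The expected 1.8% figure follows from the genome sizes and mixing ratio given above. The sketch below reproduces it under one stated assumption: that DNA mass is proportional to nucleotide count, with each base pair of the double-stranded bacterial genome counted as two nucleotides. The variable names are illustrative.

```python
PHAGE_GENOME_NT = 8669        # M13 ssDNA genome, nucleotides
ECOLI_GENOME_BP = 4.56e6      # E. coli genome, base pairs
PHAGE_PER_CELL = 19           # phage:bacteria ratio in the mock sample

# Assumption: DNA mass is proportional to nucleotide count, with the
# double-stranded bacterial genome contributing two nucleotides per bp.
phage_nt = PHAGE_PER_CELL * PHAGE_GENOME_NT
cell_nt = 2 * ECOLI_GENOME_BP

phage_mass_fraction = phage_nt / (phage_nt + cell_nt)
print(f"expected phage DNA mass fraction: {100 * phage_mass_fraction:.1f}%")
```

Counting the bacterial genome as single-stranded would roughly double the estimate, so the factor of two for dsDNA is what brings the calculation into agreement with the stated expectation.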
In contrast, in VLP fractions, M13KO7 DNA represented 73 ± 13% of the mass of recovered DNA across sample loads (Fig. 2c), corresponding to a 67-fold increase, on average, in the proportion of phage DNA out of total DNA, compared to the unfractionated samples. In a metagenomic sequencing sample, this would correspond to a similar increase in the fraction of reads from phage DNA. In terms of the apparent phage:cell ratio based on recovered DNA, which was approximately 12:1 in the unfractionated samples, fractionation enriched the VLP fraction to an apparent phage:cell ratio of ~2000:1 to ~12,000:1 (Fig. 2b).
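As a rough consistency check, the apparent phage:cell ratio can be estimated by scaling the 19:1 input ratio by the ratio of mean recoveries. This is a sketch under that assumption, using the rounded mean values reported for the M13 mock samples; the function name `apparent_ratio` is introduced here for illustration, and the reported values were presumably derived per sample load rather than from pooled means.

```python
INPUT_RATIO = 19  # phage:cell ratio in the mock sample


def apparent_ratio(rp, rb, input_ratio=INPUT_RATIO):
    """Apparent phage:cell genome-copy ratio implied by the recoveries."""
    return input_ratio * rp / rb


# Mean recoveries reported for the unfractionated samples and VLP fraction.
unfractionated = apparent_ratio(rp=0.27, rb=0.45)
vlp = apparent_ratio(rp=0.78, rb=0.006)

# Fold increase in the phage share of total DNA mass (73% vs. 1.1%).
mass_fraction_fold = 73 / 1.1

print(f"unfractionated ~{unfractionated:.0f}:1")
print(f"VLP fraction ~{vlp:.0f}:1")
print(f"mass fraction increase ~{mass_fraction_fold:.0f}-fold")
```

With these inputs the unfractionated estimate is close to the ~12:1 reported, and the VLP estimate falls near the lower end of the reported range, which spans individual sample loads.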
Limit of detection of phage swabbed from human skin
Having validated the method using phage:cell mixtures placed directly onto swabs, we next determined the recovery of DNA when including the second potential source of loss, swabbing from human skin. An M13KO7 phage stock was serially diluted ten-fold and samples were loaded onto human skin, then swabbed immediately or allowed to dry prior to swabbing. Swabs were processed analogously to the above experiments. Near quantitative yield was obtained for samples in which ~10^5 or more virions were loaded onto the skin (Fig. 3). Lower sample loads than this could not be distinguished from qPCR background. Wet samples were observed to have consistently higher yields than dry samples; this phenomenon may be due to denaturation of phage upon drying and was also observed for T4, which showed a pronounced decrease in recovery for dried samples (see next section). The limit of detection of ~10^5 virions corresponds to ~400 fg of ssDNA.
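The conversion from virion count to ssDNA mass can be sketched as follows. This is an order-of-magnitude estimate, not the conversion actually used: it assumes an average mass of about 330 g/mol per ssDNA nucleotide (a common rule of thumb) and the 8669 nt M13 genome length stated earlier; the function name is illustrative.

```python
AVOGADRO = 6.022e23           # molecules per mole
NT_MASS_G_PER_MOL = 330.0     # assumed average mass of one ssDNA nucleotide
M13_GENOME_NT = 8669          # genome length stated earlier in the text


def ssdna_mass_fg(virions, genome_nt=M13_GENOME_NT):
    """Approximate ssDNA mass (femtograms) in a given number of virions."""
    grams = virions * genome_nt * NT_MASS_G_PER_MOL / AVOGADRO
    return grams * 1e15


print(f"{ssdna_mass_fg(1e5):.0f} fg")
```

Under these assumptions, 10^5 virions carry a few hundred femtograms of ssDNA, the same order as the value quoted above; the exact figure depends on the per-nucleotide mass chosen.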
Recovery of T4 and bacterial DNA from skin swabs
To test the compatibility of this method with other phage morphologies, an analogous skin swabbing experiment was performed using the canonical Caudovirales phage T4 in place of M13KO7. A ΔompC ΔompF strain of E. coli was selected for this experiment to avoid the confounding effect of phage adsorption and infection. T4 and E. coli were titered spectrophotometrically and mixed in a 10:1 ratio (10^8 virions:10^7 cells), loaded onto skin, then swabbed immediately while wet. DNA recovery values were comparable to the M13 experiment. In the remainder fraction, phage recovery rp = 0.32 ± 0.03 (yp = 64%) and bacterial recovery rb = 0.26 ± 0.04 (yb = 26%) were similar to unfractionated samples (rp = 0.53 ± 0.12 and rb = 0.34 ± 0.08) (Fig. 4a,b). In the VLP fraction, phage recovery rp = 0.27 ± 0.03 (yp = 54%) and bacterial recovery rb = 0.004 ± 0.001 (yb is undefined) indicated enrichment of phage, as expected. Controls in which phage and cells were applied directly to the swab showed similar recoveries, consistent with expectation given near quantitative yield from swabs.
Total phage DNA mass recovered is substantially higher than for M13KO7, consistent with the larger genome of T4, with the amount of DNA recovered from the VLP fraction being 5.1 ± 0.6 ng, and DNA recovery from the remainder fraction being 18 ± 3 ng on average. These amounts are more than sufficient for typical next-generation sequencing (NGS) preparation protocols.
The recovered DNA from the VLP fraction was composed of 96 ± 1% T4 DNA by mass, a substantial increase compared to the unfractionated control (32 ± 2%) (Fig. 4c). This increase is less dramatic than for M13KO7, due to the larger genome size of T4. Apparent phage:cell ratios after recovery also indicate significant viral enrichment, as fractionation resulted in a phage:cell ratio of ~700:1 in the VLP fraction, compared to that of unfractionated controls (~17:1) (Fig. 4d).
Swabbing was also performed from dried T4 samples, but these were found to produce very low yields in the VLP fraction compared to the analogous M13KO7 experiment. However, the remainder fraction of the dried samples gave T4 DNA amounts comparable to swab controls, indicating that dried T4 could be recovered from the skin but was lost in the VLP purification process. We hypothesized that this was due to capsid damage that occurred during desiccation on the skin, which then exposed phage DNA to DNase digestion and thus reduced DNA purified in the VLP fraction. A plaque-forming assay was performed to determine the concentration of viable phage particles after desiccation; indeed, the VLP fraction from dried T4 produced ~100-fold fewer plaques than the VLP fraction of a wet T4 sample.
Recovery of bacterial and phage DNA from clinical wound and skin swabs
To test whether this processing method improved recovery of phage DNA from clinical swab samples, swabs of normal skin and wounds were collected from patients at a wound clinic and processed in preparation for high-throughput sequencing. We compared the novel sample preparation method (pilot study 2, or PS2) to a standard kit-based extraction method (pilot study 1, or PS1) by measuring dsDNA yields fluorometrically. Using PS1, only 10% of skin VLP fractions and 25% of wound VLP fractions yielded detectable DNA. However, PS2 gave significant improvement, producing detectable DNA in 30% of skin VLP samples and 100% of wound VLP samples (Fig. 5). Of VLP samples containing a detectable amount of DNA, PS2 yielded 6.7- and 4.4-fold greater average DNA concentration for skin and wound swabs, respectively, compared to PS1. Remainder fractions, which include substantial bacteria, are expected to contain more DNA, and as expected, nearly all skin and wound samples produced a detectable amount of DNA in the remainder fraction. In addition, in the remainder fractions, PS2 gave 3.9- and 16.6-fold higher average DNA concentration compared to PS1, indicating that PS2 would also improve DNA yield for whole metagenome studies.
To assess the quality of the extracted DNA, samples from both studies were sequenced by paired-end Illumina MiSeq. Bacterial composition of the remainder fractions was determined by 16S rRNA sequencing using the V1-V3 loops (Fig. 6a). Both skin and wound samples from PS1 were largely dominated by Burkholderiaceae, a well-known kit contaminant [22, 23]. However, PS1 wound samples also contained low levels of previously reported skin colonizers such as Corynebacteriaceae, Staphylococcaceae, and Pseudomonadaceae [24]. In contrast, PS2 skin and wound samples did not suffer from the same apparent kit contamination as PS1, and PS2 samples appear to contain archetypal skin and wound microbiomes. On average, the most abundant PS2 skin community members were commensals and opportunists, including Corynebacteriaceae, Staphylococcaceae, Propionibacteriaceae, and Micrococcaceae [24, 25]. PS2 wound samples had high levels of Staphylococcaceae and Enterobacteriaceae, as well as lower levels of other previously reported wound colonizers like Bacteroidaceae, Campylobacteraceae, Clostridiales, Porphyromonadaceae, Pseudomonadaceae, and Streptococcaceae [26, 27, 28]. These findings confirm that the novel fractionation and extraction protocol produces high-quality DNA sufficient for sequencing, resulting in improved community recapitulation compared to the kit-based extraction used here.
VLP-enriched samples were shotgun sequenced, and the quantity of recovered viral DNA was estimated by mapping quality-controlled reads to the Joint Genome Institute’s integrated microbial genomes viral analysis (IMG/VR) metagenomic database (Fig. 6b). On average, only 1.1 ± 1.2% of PS1 skin reads and 2.2 ± 4.3% of PS1 wound reads mapped to the database. PS2 samples had significantly higher viral mapping rates, with averages of 15.2 ± 8.9% for skin samples and 7.5 ± 13.2% for wound samples, which corresponded to a higher absolute number of known viral reads (Additional file 1: Figure S2A). However, PS2 samples also had higher levels of human DNA contamination (Additional file 1: Figure S2B). Although the IMG/VR database is likely incomplete, these results show that the novel method produces more known viral reads on average.