Field collection of tomato plant parts
Tomato plant parts and fruit (cultivar BHN 602) were collected from research fields at the Virginia Tech Agriculture Research and Education Center in Painter, Virginia (latitude 37.58, longitude −75.78). This cultivar shares resistance to specific fungal, bacterial, nematode and viral pressures with other BHN varieties (Additional file 1: Table S1), which accounts for the popularity of BHN tomatoes among commercial growers throughout the eastern United States. Seedlings were started in the greenhouse on April 29, 2011 and moved to the field on June 3, 2011. Plants were irrigated using drip tape buried one inch beneath soil level on beds covered with polyethylene mulch. The plots were irrigated daily according to watering needs. Insect control, weed control and fertilization followed the recommendations of the Virginia Cooperative Extension. On July 20, 2011, four individual plants were taken from four alternating rows, across approximately 30 square meters of tomato field. At harvest, fruits were mature, predominantly at the green and breaker stages (commercial tomatoes in this region are harvested when green). Wearing gloves and using clippers, researchers collected approximately 4 to 6 leaves from each of the top third and the bottom third of each selected plant; these materials were placed in ziplock bags and designated “Top” and “Bottom” leaf samples, respectively. Stems were cut at branching points (6 to 10 per replicate), and 6 to 10 flower cymes were collected per replicate. Fruits (4 per replicate) were taken from various locations on the plants. Roots were unearthed, shaken vigorously, cut from the main stem and placed in ziplock bags. All samples were transported back to the lab at ambient temperature and refrigerated at 4 °C for 24 hours prior to DNA extraction.
Nucleic acid extraction
Three hundred milliliters of sterile distilled water was added to each ziplock bag of plant parts, and each bag was sonicated for 6 minutes to disrupt cells and dislodge organisms from biofilms or other protective habitats associated with plant organs. The resulting wash was centrifuged, and DNA was extracted from the pellet using the Promega Wizard® Genomic DNA Purification Kit (Cat. # A1120) (Promega Corporation, Madison, WI) following the extraction protocol for Gram-positive bacterial species.
16S rRNA gene amplicon preparation
PCR products targeting the V2 region of 16S rRNA genes were amplified for Roche 454 pyrosequencing. The forward primer combined Roche Fusion Primer A, the key (TCAG), MIDs (multiplex identifiers for 24 individual samples) and the 27F universal primer: 5’ CGT ATC GCC TCC CTC GCG CCATCAGAGA GTT TGA TCC TGG CTC AG 3’. The reverse primer combined Roche Fusion Primer B, the key and primer 533R, with no MIDs: 5’ CTA TGC GCC TTG CCA GCC CGC TCAG CGA GAG ATA C TTA CCG CGG CTG CTG GCA C 3’. PCR products were cleaned (fragments under 300 bases were removed) using AMPure XP from Beckman Coulter Genomics (Danvers, Massachusetts) at a ratio of 60 μl of AMPure beads to 100 μl of PCR product. The remaining PCR fragments were run on the Agilent Bioanalyzer 2100 using the High Sensitivity lab-on-a-chip reagents (Agilent Technologies, Inc., Santa Clara, CA) to ensure that smaller fragments had been removed prior to emulsion PCR preparation.
18S rRNA gene amplicon preparation
18S rRNA gene fragments were amplified using primers EF4 (5’ GGAAGGGRTGTATTTATTAG 3’) and Fung5 (5’ GTAAAAGTCCTGGT TCCCC 3’) [10] with 24 MIDs and Roche Fusion Primer adaptors A and B. PCR products were cleaned (fragments under 300 bases were removed) using AMPure XP at a ratio of 60 μl of AMPure beads to 100 μl of PCR product. The resulting PCR fragments were run on the Bioanalyzer 2100 to ensure that smaller fragments had been removed prior to emulsion PCR preparation.
Metagenome preparation
Four independent replicates from each plant organ were pooled to create one representative metagenome for each of the 6 regions: Top Leaves, Flowers, Fruits, Stems, Bottom Leaves, and Roots. DNA was sheared using the Covaris S2 (Woburn, Massachusetts) set to 200 cycles per burst, a duty cycle of 5% and an intensity of 3, for a total of 80 seconds.
Emulsion PCR
To allow optimal amplification in emulsion, 16S and 18S rRNA gene amplicons were diluted to an estimated 0.3 copies of DNA per bead. Sheared whole genome shotgun (WGS) DNA for metagenomes was diluted to an estimated 3 to 9 copies per bead. Emulsion PCR, breaking and enrichment were performed using the Lib-A MV kit for FLX Titanium pyrosequencing from Roche Diagnostics Corp. (Indianapolis, IN) according to the manufacturer’s specifications. For metagenomes, the Lib-L Rapid Library Kit for FLX Titanium pyrosequencing was used according to the manufacturer’s specifications.
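The copies-per-bead targets above imply converting a measured DNA concentration into a molecule count. The following is a minimal sketch of that arithmetic, not the manufacturer's calculator. It assumes double-stranded DNA with an average mass of about 660 g/mol per base pair; the function names and any input values (concentration, mean fragment length, bead count) are hypothetical.

```python
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 660.0    # assumed average mass of one dsDNA base pair


def molecules_per_ul(conc_ng_per_ul: float, mean_length_bp: float) -> float:
    """Convert a dsDNA concentration (ng/ul) to molecules per ul."""
    grams_per_ul = conc_ng_per_ul * 1e-9
    grams_per_mol = mean_length_bp * BP_MASS_G_PER_MOL
    return grams_per_ul / grams_per_mol * AVOGADRO


def library_volume_ul(conc_ng_per_ul: float, mean_length_bp: float,
                      n_beads: float, copies_per_bead: float) -> float:
    """Volume of library (ul) that delivers the target copies per bead."""
    molecules_needed = n_beads * copies_per_bead
    return molecules_needed / molecules_per_ul(conc_ng_per_ul, mean_length_bp)
```

Under these assumptions, the required volume scales linearly with the copies-per-bead target, so moving from 0.3 to 3 copies per bead needs ten times as much library at the same concentration.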
Pyrosequencing
Approximately 790,000 DNA-enriched beads were loaded into each of 7 quarter regions of two GS Titanium FLX PicoTiterPlates (two separate runs) for sequencing of amplicons and WGS DNA on the Roche 454 GS Titanium FLX platform according to the manufacturer’s specifications.
Sequence pre-processing
Sequences were processed and split by multiplex identifiers (MIDs) using the sff tools from Roche 454 of Roche Diagnostics Corp. (Indianapolis, IN). Fusion primer sequences detected at the 5’ and 3’ ends of sequences were trimmed.
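The study used the Roche sff tools for this step. Purely to illustrate the logic of splitting by MID and trimming primers, the sketch below works on plain sequence strings with exact matching. The MID table, sample names and function names are hypothetical, and real tools tolerate sequencing errors that this sketch does not.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def demultiplex_and_trim(read, mids, forward_primer, reverse_primer, key="TCAG"):
    """Return (sample, insert) for a read, or None if key/MID/primer is absent.

    Expected read layout (an assumption of this sketch):
    key + MID + forward primer + insert + reverse complement of reverse primer.
    """
    if not read.startswith(key):
        return None
    body = read[len(key):]
    reverse_rc = reverse_complement(reverse_primer)
    for sample, mid in mids.items():
        prefix = mid + forward_primer
        if body.startswith(prefix):
            insert = body[len(prefix):]
            end = insert.find(reverse_rc)
            if end != -1:              # trim the 3' primer when it was reached
                insert = insert[:end]
            return sample, insert
    return None


# Hypothetical MID table; the primer cores are the 27F and 533R sequences.
MIDS = {"sample1": "ACGAGTGCGT", "sample2": "ACGCTCGACA"}
PRIMER_27F = "AGAGTTTGATCCTGGCTCAG"
PRIMER_533R = "TTACCGCGGCTGCTGGCAC"
```

Reads that do not reach the reverse primer keep their full 3' end, which mirrors the fact that only detected primer sequences can be trimmed.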
Bioinformatic analyses: 16S rRNA gene analyses
The Data Intensive Academic Grid (DIAG) computational cloud (http://diagcomputing.org) was used in combination with the CloVR-16S automated pipeline (version 1.1) [11] to perform computationally intensive tasks, such as chimera detection and nonparametric statistical analyses, on the 16S rRNA gene sequences. The CloVR-16S pipeline uses tools for phylogenetic analysis of 16S rRNA data from QIIME [12] and mothur [13] for sequence processing and diversity analysis, the RDP Bayesian classifier [14] for taxonomic assignment, UCHIME [15] for chimera detection and removal, Metastats [7] for statistical comparisons of sample groups, and various R programs for visualization and unsupervised clustering. A full description of the CloVR-16S standard operating procedure (SOP) is available online at http://clovr.org.
Phylogenetic analyses of putative Salmonella 16S rRNA gene sequences
We used the approximately-maximum-likelihood method for phylogenetic inference implemented in FastTree [16] to further explore the taxonomic identity of Enterobacteriaceae sequences from the different regions of tomato plants. Reference sequences from Enterobacteriaceae and other phyla observed in the samples were used with Salmonella reference sequences from NCBI (Additional file 2: Table S2). Inference was performed using the default settings. Clustering of individuals using the program STRUCTURE [17, 18] was performed with K = 2 and K = 3.
Bioinformatic analyses: 18S rRNA gene analysis
Sequences were clustered stringently using the QIIME UCLUST module with a 99% identity threshold. The representative of each cluster (i.e., the longest read in the cluster) was examined for chimeras using UCHIME [15] in de novo mode. Clusters identified as chimeras were removed from further analysis. The remaining representatives were searched against the SILVA rRNA small subunit (SSU) [19] database (limited to reference sequences with full taxonomic identification) with BLASTN and a maximum e-value threshold of 1e-5. To provide information about overall fungal distribution, each 99% identity cluster was assigned the taxonomy of the best BLAST hit to its representative sequence.
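The best-hit assignment described above can be sketched as follows. This is an illustration rather than the authors' script: it assumes BLAST tabular output with the standard 12 columns (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score), ranks hits by bit score, and uses a hypothetical `taxonomy` lookup from reference ID to lineage.

```python
import csv
import io


def best_hits(tabular_text: str, max_evalue: float = 1e-5) -> dict:
    """Map each query to its highest bit-score subject passing the e-value cutoff."""
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def assign_taxonomy(hits: dict, taxonomy: dict) -> dict:
    """Give each cluster representative the lineage of its best hit."""
    return {query: taxonomy.get(subject, "unclassified")
            for query, subject in hits.items()}
```

Cluster representatives with no hit under the cutoff are simply absent from the result, so they can be reported separately as unassigned.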
Metagenomic analyses
Whole genome shotgun (WGS) metagenomic sequences were provided as input to the CloVR-Metagenomics pipeline (version 1.0) using the “no open reading frames” (no-ORFs) option and to the MG-RAST metagenomics analysis server (version 3.2; Argonne National Laboratory, Argonne, IL; http://metagenomics.anl.gov) [20]. Different maximum e-value cutoffs, minimum percentage identity cutoffs and minimum alignment length cutoffs were used for different questions (see individual list in Results section). For overall phylogenetic designation at the phylum level, the default parameters were 80% similarity over 100 bases at an e-value of 1e-5. CloVR-Metagenomics was used with a BLAST-based protocol to perform taxonomic and functional annotations as well as statistical analysis with Metastats and R. The CloVR pipeline for metagenomes was used with the following SOP:
1) UCLUST first clusters redundant sequences that show 99% nucleotide identity and removes artificial 454 replicate reads.
2) Representative DNA sequences are searched against the NCBI COG database using BLASTX.
3) Representative DNA sequences are searched against the NCBI RefSeq database of finished prokaryotic genomes using BLASTN.
4) Metastats and CloVR-implemented R scripts are applied for additional statistical and graphical evaluations of the pipeline results.
Functional annotation was examined using the COGs database [21]. A full description of the CloVR-Metagenomics SOP is available online at http://clovr.org.
Salmonella detection pipeline
To create a pipeline for detecting the presence of Salmonella, the IMG contig and gene databases were split into two databases: one that represented all Salmonella contigs and genes present in IMG, and a second that represented the remainder of the database (minus all Salmonella). A BLAST approach with extremely relaxed parameters was used to gather hits from both databases. Hits with a bit score of at least 50% of the average sequence length of each shotgun data set were retained, and a variable identity percentage (40, 50, ..., 100) was used to create plots of hits to Salmonella and the bit scores of these hits.
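One possible reading of the filtering rule above is sketched below: hits are kept when their bit score is at least half the mean read length of the data set, and the retained hits are counted at each identity threshold from 40% to 100%. The function name and the `(percent_identity, bit_score)` tuple layout are assumptions for illustration, not the pipeline's actual code.

```python
def count_hits_by_identity(hits, mean_read_length, thresholds=range(40, 101, 10)):
    """Count hits passing the bit-score floor at each identity threshold.

    hits: iterable of (percent_identity, bit_score) tuples.
    The bit-score floor is 50% of the mean read length of the data set.
    """
    min_bits = 0.5 * mean_read_length
    kept = [(pident, bits) for pident, bits in hits if bits >= min_bits]
    return {t: sum(1 for pident, _ in kept if pident >= t) for t in thresholds}
```

Running the same function on hits to the Salmonella database and on hits to the remainder database gives two count series that can be plotted against the identity threshold for comparison.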
Data deposition
All metagenomes are available in MG-RAST under accession numbers 4488526.3 (Bottom Leaves), 4488531.3 (Stems), 4488530.3 (Leaves), 4488529.3 (Tomato Fruits), 4488528.3 (Roots) and 4488527.3 (Flowers), and in the Sequence Read Archive (SRA) at NCBI (accession number SRA061333). Submissions conform to the “Minimum Information Standards” [22] recommended by the Genomic Standards Consortium.