Bacterial community diversity and variation in spray water sources and the tomato fruit surface
BMC Microbiology volume 11, Article number: 81 (2011)
Tomato (Solanum lycopersicum) consumption has been one of the most common causes of produce-associated salmonellosis in the United States. Contamination may originate from animal waste, insects, soil or water. Current guidelines for fresh tomato production recommend the use of potable water for applications coming in direct contact with the fruit, but due to high demand, water from other sources is frequently used. We sought to describe the overall bacterial diversity on the surface of tomato fruit and the effect of two different water sources (ground and surface water) when used for direct crop applications by generating a 454-pyrosequencing 16S rRNA dataset of these different environments. This study represents the first in depth characterization of bacterial communities in the tomato fruit surface and the water sources commonly used in commercial vegetable production.
The two water sources tested had a significantly different bacterial composition. Proteobacteria was predominant in groundwater samples, whereas in the significantly more diverse surface water, abundant phyla also included Firmicutes, Actinobacteria and Verrucomicrobia. The fruit surface bacterial communities on tomatoes sprayed with both water sources could not be differentiated using various statistical methods. Both fruit surface environments had a high representation of Gammaproteobacteria, and within this class the genera Pantoea and Enterobacter were the most abundant.
Despite the major differences observed in the bacterial composition of ground and surface water, the season long use of these very different water sources did not have a significant impact on the bacterial composition of the tomato fruit surface. This study has provided the first next-generation sequencing database describing the bacterial communities living in the fruit surface of a tomato crop under two different spray water regimes, and therefore represents an important step forward towards the development of science-based metrics for Good Agricultural Practices.
An increasing number of outbreaks caused by contamination of produce with human pathogens has been observed in the United States. Between 1996 and 2008, a total of 82 produce-related outbreaks were reported. Bacterial species comprise the majority of reported disease-causing agents, with pathogenic Salmonella and E. coli strains implicated most frequently. Lettuce and tomatoes were the commodities associated with the most outbreaks, followed by cantaloupe and berries. In recent years, tomatoes have been one of the main products responsible for produce-associated salmonellosis.
The phyllosphere has found itself at an intersection of food safety concerns and research that examines the microbial ecology of agricultural environments [4–6]. Human pathogens find their way to this environment via diverse channels that remain poorly understood. Human, animal, atmospheric, abiotic and xenobiotic conduits have all been examined for their potential to contribute to the precise factors needed to support growth or simple persistence of human pathogens of bacterial origin in agricultural commodities [7, 8]. An extremely important component of agricultural management that remains to be comprehensively examined with culture-independent methods is the microbial ecology associated with water sources used in irrigation and pesticide applications.
In the United States, the tomato industry's Good Agricultural Practices guidelines, which are focused on improving the food safety of the product, recommend the use of potable water for applications that come in direct contact with the crop. Given that large volumes of water are needed for pesticide applications and overhead irrigation of vegetable crops, water demand cannot always be met with the available potable water. Consequently, growers routinely use water from other sources, such as farm ponds. Surface water is highly susceptible to contamination due to direct discharge of sewage and the impact of runoff. In the mid-Atlantic region of the United States, growers report routine visits to their farm ponds by Canada geese, a potential avian reservoir of Salmonella, and white-tailed deer, a potential reservoir for E. coli O157:H7. This region is home to a large poultry industry, which also represents a potential source of Salmonella contamination. Groundwater sources, on the other hand, are less likely to support enteric pathogens because of the natural filtering mechanisms of soils, although poorly managed wells are susceptible to contamination.
The type of irrigation system can influence the risk of crop contamination: overhead irrigation, for instance, is more likely to produce virus contamination than are furrow and drip irrigation. Studies conducted in California found no significant differences in coliform counts among crops spray-irrigated with two types of treated wastewater or with well water. This was found despite the fact that the treated waters used in this study showed higher levels of total and fecal coliforms than the well water. The overall impact of using surface water for direct crop applications on fruit surface bacterial communities has not been reported to date.
Denaturing gradient gel electrophoresis studies have indicated that variables such as plant species and stage of development can affect the composition of phyllosphere microbial communities. In addition, it was found that these communities are far more complex than culture-based methods used in the past had indicated [6, 15, 16]. Recent studies described the bacterial diversity of phyllosphere samples from natural and agricultural ecosystems using traditional cloning and sequencing approaches, leading to the identification of many previously undescribed members of these communities. These studies also indicated that phyllosphere communities can be altered by the application of diverse agricultural materials [16–18].
More recently, next-generation sequencing technologies, including 454-pyrosequencing, have provided more comprehensive descriptions of bacterial communities in different environments due to the increased number of sequence reads obtained [19–26]. A study of bacterial diversity on tree leaves using 454 sequencing indicated that tree and bacterial community phylogeny are associated, and that the geographic differentiation of bacterial communities on a single tree species is minimal. To our knowledge, no such studies have been conducted to date to describe the impact of water quality on bacterial populations in the phyllosphere of specialty crops.
We utilized 454-pyrosequencing to generate 34,016 16S rRNA gene sequences from 16 field samples: 10 tomato fruit samples that had been sprayed with either surface water (ps), or groundwater (pg), three samples of surface water (ws), and three samples of groundwater (wg). Using these data, we sought to 1) compare the bacterial profile of ground and surface water that was used for pesticide applications and 2) assess the impact of water quality on the fruit surface bacterial profile of a tomato crop. A smaller preliminary dataset of 2008 fruit surface samples generated through Sanger sequencing is also included for comparison. Despite the significant differences between bacterial communities in surface and groundwater, the surface communities on the tomato fruits treated with these water sources could not be differentiated by a variety of statistical methods.
Taxonomic distributions among samples
After screening our data for poor sequences and contaminants (see Methods), we recovered 27,757 high-quality 16S rRNA gene sequences with an average of 1,734 ± 471 (SD) sequences per sample (results refer to the 2009 data unless otherwise stated).
We taxonomically classified all sequences (from phylum to genus) using the RDP Bayesian classifier with a confidence threshold of 80%. Examining the phylum level distributions across samples, we found that nearly all fruit surface samples appeared to have very similar 16S rRNA profiles. In these, Proteobacteria dominated the observed sequences, with smaller representations of Firmicutes and Actinobacteria. One surface water treated sample (ps4) was dominated by Firmicutes sequences, most likely as a result of contamination with internal fruit material. While the wg samples displayed similar 16S rRNA profiles dominated by Proteobacteria, the ws samples had a more even representation among four dominant phyla. In addition, ws samples contained a large number of sequences that could not be classified even at the phylum level (Figure 1).
To compare environments for differentially abundant taxonomic groups, we ran the Metastats methodology on phylum, class, and genus level assignments. However, a limitation of the Metastats approach for q-value (individual false discovery rate) estimation is poor accuracy for datasets with < 100 features. To compensate, we computed the overall false discovery rate (FDR) for the taxonomic groups we called significant in our analysis using the method of Benjamini and Hochberg.
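As an illustration of the Benjamini and Hochberg approach mentioned above, the following is a minimal sketch of the standard step-up adjustment of a list of p-values. The function name and the example p-values are hypothetical, and the text does not specify exactly how the overall FDR was derived from the Metastats output, so this should be read as a generic example rather than the authors' procedure.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for five taxonomic groups (not taken from the study).
example_p = [0.001, 0.008, 0.039, 0.041, 0.60]
example_adjusted = benjamini_hochberg(example_p)
```

Groups whose adjusted value falls below a chosen FDR level would be called significant at that level.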
Results of Metastats runs comparing bacterial classes among populations and accounting for intra-replicate variability indicated that five taxonomic classes are differentially abundant in the two water sources (P < 0.015), most notably Betaproteobacteria, which makes up approximately 86% of sequences on average in the wg samples, but only close to 9% of sequences in the ws samples (Additional file 1). For the five taxonomic classes we call differentially abundant between wg and ws samples, the FDR is ~0.12, so we expect fewer than one false positive among these five. The most abundant classes in ws profiles were Alphaproteobacteria, Actinobacteria and the unclassified group.
Betaproteobacteria was also the most differentially abundant class when pg and wg were compared (10 vs. 86%), among nine differentially abundant bacterial classes (FDR ~0.07). Fourteen bacterial classes were differentially abundant between ws and ps (FDR ~0.06), most notably Clostridia, which was enriched in ws. Both fruit surface environments were enriched for Gammaproteobacteria. Despite the differences observed between water sources, no significant differences were found between the two fruit surface environments (this includes an attempt in which the ps4 outlier was removed).
At the genus level, significant differences were found between water sources, with 30 genera showing differential abundance (P < 0.05). Table 1 lists the bacterial genera among these representing 1% or more of the sequences in either of the water sources analyzed. Fruit surface environments were highly variable and no significant differences were detected for the high abundance genera, which included Pantoea, Enterobacter, Sphingomonas, Leuconostoc, Pseudomonas and Burkholderia (Additional file 2). The less abundant genera Paenibacillus, Stenotrophomonas, Bacillus and Lactococcus were more abundant in pg, while Frigoribacterium, Herbaspirillum, Rickettsia, Wautersiella and Cloacibacterium were more abundant in ps. None of these genera represented more than 0.2% of the population.
A statistical comparison of the 2008 and 2009 fruit surface samples (not considering variability between 2009 replicates) indicated that in both the 454 and Sanger data, Bacilli is enriched in the ps samples, and Gammaproteobacteria is enriched in pg (Figure 2A). At the genus level, Pantoea showed high abundance in both years (Figure 2B). Enterobacter, Pseudomonas, Sphingomonas and Burkholderia were more predominant in the 2009 samples, while a larger proportion of the 2008 sequences remained unclassified. These results indicate that we were able to detect similar bacterial populations on the tomato fruit surface in both years, despite the methodological differences, the differences between growing seasons and the fact that different tomato cultivars were sampled.
Diversity analysis using operational taxonomic units
To compute estimates of species-level diversity and perform comparisons between environments, all sequences were clustered into operational taxonomic units (OTUs) using Mothur and a similarity threshold of 95% (see Methods). The total number of unique OTUs within each environment was 494 (pg), 399 (ps), 228 (wg) and 1,342 (ws). After computing rarefaction curves for each sample (Figure 3A), we immediately observed that the surface water samples were significantly more diverse than the others, and that groundwater and fruit surface samples are indistinguishable in terms of diversity. Additionally, the Shannon diversity index and Chao1 estimator were calculated for each sample, and again we see that the ws samples are the most diverse at the OTU level (Figure 3B).
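For readers unfamiliar with the two diversity measures named above, the sketch below computes the Shannon index and a Chao1 richness estimate from a list of per-OTU sequence counts. It assumes the natural-log form of the Shannon index and the bias-corrected form of Chao1; the function names and the example counts are hypothetical, and the study itself relied on Mothur for these calculations.

```python
import math


def shannon_index(otu_counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    total = sum(otu_counts)
    return -sum((c / total) * math.log(c / total) for c in otu_counts if c > 0)


def chao1(otu_counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    s_obs = sum(1 for c in otu_counts if c > 0)
    f1 = sum(1 for c in otu_counts if c == 1)  # singleton OTUs
    f2 = sum(1 for c in otu_counts if c == 2)  # doubleton OTUs
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical counts for one sample: six OTUs, three of them singletons.
example_counts = [10, 5, 2, 1, 1, 1]
example_shannon = shannon_index(example_counts)
example_chao1 = chao1(example_counts)
```

A sample with many singletons relative to doubletons yields a Chao1 estimate well above the observed OTU count, which is the pattern expected for under-sampled, highly diverse communities.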
To assess the diversity captured with the samples, we calculated the Good's Coverage Estimator on the OTUs from each sample using Mothur. Results indicated that we captured between 93 and 98% of the species in all of the samples except for ws samples, where we only identified between 70 and 73% of the species.
We then examined shared OTUs between individual replicates and treatments. Fruit surface environments shared approximately half their OTUs, and these represented more than 90% of the sequences in both samples. In contrast, water environments shared only 31 OTUs, which represented 2% of the OTUs present in surface water and 14% of those in groundwater. These shared OTUs corresponded to 62% of the sequences in groundwater, but only 6% of the sequences in surface water. These results again point to the greater differences between water-based microbial communities as compared to those in the treated tomato fruit surfaces.
A hierarchical clustering of all samples was performed using the Jaccard index based on shared OTU composition (Figure 4). This tree indicated that the two fruit surface communities are not uniquely distinguishable at the OTU level despite the microbial differences in water sources. However, water samples did cluster with their associated environments.
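The clustering described above can be sketched as follows, assuming that each sample is reduced to the set of OTUs it contains and that clusters are merged by average linkage. The linkage method is an assumption (the text does not state which was used), and the sample names and OTU identifiers in the example are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two collections of OTU identifiers."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def average_linkage(samples):
    """Agglomerate samples (dict: name -> set of OTU ids); return the merge list."""
    def dist(c1, c2):
        # Mean pairwise Jaccard distance between the members of two clusters.
        pairs = [(x, y) for x in c1 for y in c2]
        return sum(jaccard_distance(samples[x], samples[y]) for x, y in pairs) / len(pairs)

    clusters = [(name,) for name in samples]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: dist(*pair))
        merges.append((c1, c2, dist(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


# Hypothetical OTU sets: two groundwater and two surface water replicates.
example_samples = {
    "wg1": {1, 2, 3},
    "wg2": {1, 2, 3, 4},
    "ws1": {10, 11, 12, 13},
    "ws2": {10, 11, 12, 14},
}
example_merges = average_linkage(example_samples)
```

In this toy example the replicates from each water source merge with each other before the two sources are joined, which mirrors the behavior reported for the water samples.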
To test the sensitivity of the above results to any particular methodology, we re-ran our analysis using the new automated 16S rRNA pipelines provided by the CloVR software package (http://clovr.org). CloVR is a virtual machine designed to run large-scale genomic analyses in a cloud-based environment such as Amazon EC2. The CloVR-16S track runs Mothur and Qiime-based standard operating protocols in parallel, complete with alpha and beta diversity analysis of multiple samples.
After running our high-quality sequence dataset through the CloVR-16S pipeline, we saw remarkable consistency with our initial results. All OTU analyses confirm the enriched diversity of surface water samples as compared to all others, as well as a lack of differentially abundant taxonomic groups between pg and ps samples.
Using various unsupervised approaches, water samples consistently clustered with their unique environments at all taxonomic levels (Figure 5). There was persistent difficulty distinguishing between fruit surface samples treated with surface or groundwater. Even the UniFrac metric, which arguably maintains the highest phylogenetic resolution of any method, was unable to resolve this issue (Figure 6). The concordance among our methodology and the CloVR-16S methods suggests that our results are not sensitive to modifications in the analysis protocol.
Screening for Enterobacteriaceae pathogens
Less than 1% of the hits in the water samples were to the family Enterobacteriaceae (Table 2). In fruit surface samples 33 to 79% of the sequences were identified as Enterobacteriaceae, with higher counts in pg than in ps in 2008 and again in 2009. Among the Enterobacteriaceae genera, Pantoea was the most abundant in both years. Enterobacter also showed high abundance, but only in the 2009 samples.
We created a phylogenetic tree in order to compare the Enterobacteriaceae species present in the different samples (Figure 7). Even after populating the tree with reference sequences from several genera, we could not confidently assign sequences to pathogenic species within the family. Based on our tree, the 527 bp segment of the 16S rRNA gene used is not sufficient to distinguish between several members of the Enterobacteriaceae family.
This study provides the first next-generation sequencing survey of the bacterial community on the tomato fruit surface. As such, it has confirmed the presence of taxa previously found to inhabit the phyllosphere of this crop species, and has identified many others not yet encountered in this environment. The three most abundant bacterial classes in the tomato fruit surface environments compared in this study were Gamma-, Alpha- and Betaproteobacteria. These were also found in high abundance in the phyllosphere of other plant species, although the relative abundances for these classes vary [16–18, 27]. Genera found here in high abundance on the tomato fruit surface, such as Pantoea and Enterobacter, are also abundant in the phyllosphere of certain Atlantic rainforest tree species and cottonwood, indicating a wide distribution across different plant species [16, 18]. Bacterial genera found in our 2009 fruit surface samples were also identified among the culturable bacteria on leaves of field-grown tomatoes, including Pseudomonas, Pantoea, Sphingomonas, Massilia, Xanthomonas and Curtobacterium. Two additional genera, Burkholderia and Leuconostoc, showed high abundance in our study. Burkholderia was the most abundant genus in our groundwater samples, representing 75% of the sequences, and might have been introduced into the environment through groundwater applications. Leuconostoc has been previously described as the predominant lactic acid bacterium on tomato fruit surfaces.
Similar bacterial classes and genera were found in high abundance in samples collected in 2008 and 2009, with the largest differences corresponding to the unclassified sequences. Several different reasons could account for this variation, including differences in DNA extraction, sequencing sample preparation and primers used in both years, as well as potential growing season effects.
Of special interest is the high proportion of sequences identified as Enterobacteriaceae, given that this family includes important human pathogenic bacteria like Salmonella and E. coli. Similar representation of this family was obtained in the phyllosphere of Trichilia spp. and Pinus ponderosa, but not in that of Campomanesia xanthocarpa [16, 27]. The high adaptability of this family to the tomato fruit surface environment might be linked to the higher risk of disease outbreaks associated with this crop.
Differences between fruit surface environments do not appear to be linked to the water applications, indicating that plant conditions allow for only some of the bacterial groups present in water to establish themselves. Similar results were obtained when the fruit surface communities living on apple trees under conventional and organic management were compared, where only low abundance groups differed between the two environments. Similarly, no effect on the levels of fecal and total coliforms was observed when reclaimed water with higher coliform counts and well water were sprayed on six horticultural crops.
Several factors determine whether the microorganisms arriving on the leaf surface can become established, including leaf characteristics, environmental factors and properties of the microorganisms themselves. Pesticides are known to differentially impact bacterial survival and growth. In a study conducted to determine the effect of pesticides on bacterial survival, Salmonella spp. were best able to survive and Listeria spp. were least able to survive in pesticide solutions, among all the bacteria tested. Bravo, the fungicide applied closest to the sampling date in this study, has been found to reduce bacterial growth, although it was less inhibitory than other products tested. The addition of pesticides to the different water sources used in this study might have reduced bacterial community differentiation in the two resulting fruit environments. The smooth texture of tomato skin may also prevent attachment and result in bacteria being washed away by rain or spray water.
Although our results point to the lack of major effects of the two water sources used for pesticide applications, confirming this at the species level for human enteric pathogens such as Salmonella would be crucial for establishing the potential safety of surface water use for contact applications. In addition, our sampling depth analysis suggests that deeper sampling is needed for all the environments, but especially for the more diverse ws, to capture at least 90% of the community members.
Recent studies of analysis methodologies in bacterial diversity and metagenomics projects have revealed that small modifications or substitution of similar tools may potentially result in significant changes in the overall biological conclusions [35–37]. In the rapidly evolving field of genomics, there are few concrete standards, and the sophisticated computational protocols being developed certainly will always be sensitive to some uncertainty in the analysis parameters. To examine the sensitivity of our results to the methodology employed, we re-ran our analysis using two parallel 16S rRNA protocols from the CloVR package and found large agreement with our major results. Additionally, the 454 platform itself has ongoing issues regarding artificial replicate generation and homopolymer identification errors, both of which contribute to overestimation of species-level diversity in 16S rRNA-based studies. Though it is likely that our estimates of absolute species-level diversity are indeed inflated, the consistency in relative diversity differences between samples across multiple analyses is encouraging and lends support to the validity of our initial computational results and final biological and ecological conclusions.
Our research has generated the first culture-independent next-generation sequencing data set for the bacterial microbiology associated with the phyllosphere of a tomato crop under agricultural management. There are a myriad of agricultural practices that may play a role in the contamination of tomatoes by human pathogenic bacteria. This work has provided valuable evidence suggesting that water used for pesticide applications does not represent a major modifier of the composition of fruit surface bacterial communities.
As previously reported for other plant species, Gamma-, Alpha- and Betaproteobacteria and Bacilli comprised most of the 16S rRNA sequences identified on the tomato fruit surface, while the most abundant genera included Pantoea, Enterobacter, Leuconostoc, Pseudomonas, Weissella, Sphingomonas and Burkholderia. We suggest that the high representation of Enterobacteriaceae on the tomato fruit surface might be associated with the elevated food safety risks posed by this crop.
These results represent a major contribution to the understanding of the tomato fruit surface ecology and an important step towards the establishment of science-based metrics for Good Agricultural Practices that will ensure the safety of horticultural products. The emerging role of tomato as a model organism further emphasizes the value of a deeper understanding of the interactions between this crop species, its associated microflora and the environment.
Field plots were established at the University of Maryland Wye Research and Education Center in Maryland's Eastern Shore (38°56', 76°07'). The soil was a Nassawango silt loam. Tomato transplants were planted in the field on June 9 2008 and June 10 2009. 'Sweet olive' (2008) and 'Juliet' (2009) grape tomato plants were planted on black plastic mulch and trained using stakes and a four-tier string system. The experimental design was a randomized complete block design with five blocks and three treatments. Seedlings were planted in paired rows (only one of them used for this study), 1.8 m apart. Each paired row was 9.0 m apart from the next set of paired rows. Within each row, each experimental unit was 9.0 m from the next. An experimental plot was composed of 3 grape tomato plants alternated with 2 'Brandywine' shipping tomato plants, which were not used for sampling (2008) or 5 grape tomato plants (2009) at an in-row spacing of 60 cm. In 2008, pesticides mixed in either ground or surface water were sprayed on: June 21, June 29, July 7, July 15, July 23, July 30, August 10 and August 30. In 2009, pesticides were sprayed on July 2, July 14, July 28, August 9, August 20 and August 30. Spray treatments were applied with a CO2-pressurized boom sprayer, using a separate sprayer manifold consisting of nozzles, hoses and a tank for each treatment. These booms were used throughout the season. Additional treatments (not used for this study) included organic managed plots (2008) and use of an additional pond as a source of surface water (2009). Standard agricultural practices for the production of shipping tomatoes in the region were used.
Sample collection and processing
Samples consisting of 6 tomato fruits were aseptically collected on September 1 2008 and August 31 2009. Fruits were systematically harvested from different locations within the experimental unit and placed in Ziploc® bags (2008) or Whirl-Pak® bags (2009), using new gloves for each replicate and disinfecting pruning shears with ethanol between samples. Samples were then transported back to the laboratory at 4°C. One hundred milliliters of sterile water were added to the bags, and samples were agitated for 1 minute by hand and then sonicated for 2 minutes. The microfloral wash was then transferred to polypropylene tubes and centrifuged at 30,000 × g overnight at 4°C. The pellet was then transferred to a microcentrifuge tube and stored at -80°C until DNA extraction was performed. Three liters of groundwater and 50 ml of surface water collected on August 31 2009 were filtered through 0.45 μm Fisherbrand® filters (Fisher Scientific, Pittsburgh, PA). Filters were aseptically divided into four microcentrifuge tubes and stored at -80°C. DNA extraction from filters and pellets was performed using the Promega Wizard DNA extraction kit (Promega, Madison, WI) in 2008, and the Zymo Research fungal/bacterial DNA extraction kit (Zymo Research, Orange, CA) in 2009.
Cloning and Sanger sequencing (2008)
PCR amplification of the 16S rRNA bacterial gene was performed using forward primer GM5F 5'-CCTACGGGAGGCAGCAG-3' and reverse primer 907R 5'-CCCCGTCAATTCCTTTGAGTTT-3', designed to amplify a 588 base pair long region including the variable region V3. PCR reactions were performed using TaKaRa premix (TaKaRa Shuzo Co., Shiga, Japan) in a 50 μl total volume (1 μl genomic DNA as template, 1 μl each primer, 22 μl sterile water and 25 μl TaKaRa premix). PCRs used a denaturation step at 98°C for 5 minutes, followed by 30 cycles of 94°C for 1 minute, 55°C for 1 minute, 72°C for 1 minute, with a final extension step at 72°C for 5 minutes. PCR fragments were cloned into the pGEM®-T Easy Vector (Promega) according to manufacturer's instructions. Bacterial colonies were frozen in 100 μl aliquots of Luria broth (Miller) solution with 10% glycerol in 96-well plates and shipped on dry ice to Agencourt Genomic Services, Beverly, MA, for Sanger sequencing.
454 sequencing (2009)
PCR amplification of the 16S rRNA bacterial gene was performed using forward primer Bact-8F (AGAGTTTGATCCTGGCTCAG) and reverse primer UNI518R (ATTACCGCGGCTGCTGG), designed to amplify a 527 base pair long region including variable regions V1, V2 and V3. The forward primer included the fusion primer A (CGTATCGCCTCCCTCGCGCCATCAG) in its 5' end. The reverse primer included the fusion primer B (CTATGCGCCTTGCCAGCCCGCTCAG) in its 5' end, followed by sample-specific 10 bp barcodes. Standard PCRs were performed using AmpliTaq Gold LD™ (Applied Biosystems, Foster City, CA) in a 50 μl total volume (1 μl genomic DNA as template, 1 μM each primer, 200 μM each dNTP, 2 mM MgCl2, 0.60 units AmpliTaq Gold LD, 10 × buffer provided by manufacturer). PCRs used a denaturation step at 95°C for 5 min, followed by 30 cycles of 95°C 1 min, 55°C 1 min, 72°C 1 min, with a final extension step at 72°C for 5 min. Four independent PCR amplifications were performed for each sample. After a gel-based confirmation of PCR amplification, PCR products were purified using the AMPure kit (Invitrogen, Carlsbad, CA) following manufacturer's recommendations, and quantified using a Qubit fluorometer (Invitrogen). PCR products were pooled and the average fragment size was assessed on a 2100 Bioanalyzer (Agilent, Santa Clara, CA) using a DNA 7500 chip. Emulsion-based clonal amplification and sequencing on the 454 Genome Sequencer FLX-Titanium system were performed at the W. M. Keck Center for Comparative and Functional Genomics at the University of Illinois at Urbana-Champaign according to the manufacturer's instructions (454 Life Sciences, Branford, CT). The PCR products were sequenced on two regions of a 16-region 70 × 75 picotiter plate. Signal processing and base calling were performed using the bundled 454 Data Analysis Software version 2.0.00.
Initial sequence preprocessing
We have deposited the 454 raw data in NCBI-SRA under the accession number SRX040888. Recent validation studies have demonstrated several biases in analyses of 16S rRNA sequence datasets produced using 454-pyrosequencing technology. To mitigate these issues for this study, 454 sequences were processed and analyzed using the following procedures.
Sequences were first selected for length and quality according to the following criteria:
- ≥100 nucleotides in length (not including sample-specific barcodes)
- a perfect match to a sample-specific barcode
- trimmed at the beginning of a poor-quality region, defined as a 10 bp window containing 8 bp with a Phred score ≤ 20
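The length and quality criteria above can be sketched as a single filtering function. This is a minimal illustration, not the pipeline actually used: the function and parameter names are hypothetical, it assumes barcodes have already been removed, it interprets "containing 8 bp" as "at least 8 of the 10 bases", and it assumes the length check is applied after trimming.

```python
def trim_read(seq, quals, window=10, low_needed=8, phred_cutoff=20, min_length=100):
    """Trim a read at the first poor-quality window; return None if it ends up too short.

    seq   -- nucleotide string (barcode already removed; an assumption)
    quals -- list of per-base Phred scores, same length as seq
    """
    for start in range(len(seq) - window + 1):
        window_quals = quals[start:start + window]
        # Count bases in this window at or below the Phred cutoff.
        low = sum(1 for q in window_quals if q <= phred_cutoff)
        if low >= low_needed:
            # Cut the read where the poor-quality region begins.
            seq = seq[:start]
            break
    return seq if len(seq) >= min_length else None


# Hypothetical read: 120 good bases followed by 30 poor-quality bases.
example_read = trim_read("A" * 150, [30] * 120 + [10] * 30)
```

Reads for which the function returns None would be discarded before chimera and contaminant screening.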
Reads meeting the above criteria underwent rigorous screening for chimeric reads using ChimeraSlayer (http://microbiomeutil.sourceforge.net/, Broad Institute) and for contaminants such as chloroplast and eukaryotic DNA using BLAST. The remaining set of high-quality 16S rRNA sequences was assigned to specific samples using the multiplex barcodes incorporated during PCR amplification.
Taxonomic assignment and OTU analysis
Each read was assigned a putative taxonomic identity using the RDP Bayesian classifier (minimum confidence of 80%) as well as a secondary assignment using BLAST against the Greengenes database by using an E value cutoff of 1e-10 and the Hugenholtz taxonomy. To describe the species-level structure of each microbial community, all sequences were clustered into operational taxonomic units (OTUs) using modules from the software package Mothur created by Pat Schloss. Specifically, unique reads were aligned to the core Greengenes 16S template alignment using NAST. Evolutionary distances were computed between all pairs of aligned sequences, which served as input to a furthest-neighbor clustering algorithm utilizing a distance threshold of 0.05 (i.e. 95% similarity). Good's coverage estimator was computed for each sample using Mothur, which uses the following formula:
$$C_i = 1 - \frac{n_{1i}}{N_i}$$
where Good's coverage of the ith sample ($C_i$) depends on the total number of sequences in the sample ($N_i$) and the number of singleton OTUs within that sample ($n_{1i}$).
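As a worked illustration, Good's coverage is commonly written as one minus the ratio of singleton OTUs to total sequences in a sample. The sketch below applies that definition to a list of per-OTU counts; the function name and the example counts are hypothetical, and the study computed this value with Mothur rather than with code like this.

```python
def goods_coverage(otu_counts):
    """Good's coverage: 1 - (number of singleton OTUs / total number of sequences)."""
    n_total = sum(otu_counts)
    if n_total == 0:
        raise ValueError("sample contains no sequences")
    singletons = sum(1 for c in otu_counts if c == 1)
    return 1.0 - singletons / n_total


# Hypothetical sample: 100 sequences in 8 OTUs, 5 of which are singletons.
example_coverage = goods_coverage([50, 30, 15, 1, 1, 1, 1, 1])
```

With 5 singletons among 100 sequences the estimate is 0.95, that is, roughly 95% coverage.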
Statistical comparisons between environments were made using Metastats (with 1000 permutations) to detect differentially abundant taxonomic groups at the phylum, class, genus, and OTU levels. Unless explicitly stated in the text, we employed a p-value significance threshold of 0.05.
To perform a species-level analysis of the Enterobacteriaceae family, we created a database of 8,088 annotated 16S rRNA gene sequences from several Enterobacteriaceae species using the RDP database. This database includes 451 16S rRNA sequences from Salmonella species, 951 from E. coli or Shigella, 762 from Enterobacter, 725 from Pantoea, and various other associated genera and environmental candidates.
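To give a sense of what a permutation-based comparison of one taxon between two environments involves, the following is a simplified sketch. It is not the Metastats implementation: it assumes a Welch-style t statistic on per-replicate relative abundances, shuffles the pooled values across the two groups, and reports the fraction of permutations whose statistic is at least as extreme as the observed one. All names and example values are hypothetical.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Two-sample t statistic with unequal variances."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def permutation_p_value(group_a, group_b, n_perm=1000, seed=0):
    """Two-sided permutation p-value for a difference in mean abundance."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(welch_t(group_a, group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(welch_t(pooled[:n_a], pooled[n_a:])) >= observed:
            hits += 1
    return hits / n_perm


# Hypothetical relative abundances of one class in two environments.
env_one = [0.80, 0.85, 0.88, 0.90, 0.86]
env_two = [0.05, 0.09, 0.10, 0.12, 0.08]
example_p = permutation_p_value(env_one, env_two)
```

Note that with very few replicates per group the number of distinct label arrangements is small, which limits how low a permutation p-value can go.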
We then searched all sequences from our samples against this database using BLASTN with default parameters and isolated any reads matching one of the reference genes with ≥ 98% identity along ≥ 95% of its length. NAST was then used to create a multiple sequence alignment of all matching reads and a reference set of 68 Enterobacteriaceae species that spanned Salmonella, E. coli, Klebsiella, Pantoea, Enterobacter, Cronobacter, and Citrobacter. The resulting MSA was trimmed by removing columns in the alignment with a high percentage of gaps (> 20%). The trimmed MSA was imported into Arb to create a neighbor-joining phylogenetic tree, using Staphylococcus aureus as an outgroup.
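Two of the filtering steps described above, the identity and coverage cutoff for BLAST hits and the removal of gap-rich alignment columns, can be sketched in Python. The function names and inputs are hypothetical, and interpreting "its length" as the length of the read is an assumption; the source does not specify how the calculation was actually implemented.

```python
def passes_hit_filter(percent_identity, aligned_length, read_length,
                      min_identity=98.0, min_coverage=0.95):
    """Keep a hit with >= 98% identity over >= 95% of the read (assumed basis)."""
    return (percent_identity >= min_identity
            and aligned_length / read_length >= min_coverage)


def trim_gappy_columns(alignment, max_gap_fraction=0.20, gap_chars="-."):
    """Drop alignment columns whose gap fraction exceeds `max_gap_fraction`.

    `alignment` is a list of equal-length aligned sequences (strings).
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n_rows = len(alignment)
    keep = [
        col for col in range(width)
        if sum(row[col] in gap_chars for row in alignment) / n_rows
        <= max_gap_fraction
    ]
    return ["".join(row[col] for col in keep) for row in alignment]


# Toy alignment: column 2 is 80% gaps (removed), column 4 is 20% gaps (kept).
toy = ["AC-GT", "AC-GT", "ACTGT", "AC-G-", "AC-GT"]
trimmed = trim_gappy_columns(toy)
```

Columns with exactly 20% gaps are retained here because the text specifies removal only above that threshold.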
Comparing alternative methodologies
To investigate the sensitivity of our major results to our particular methodology, we ran two alternate analyses employed by the CloVR virtual machine software package (http://clovr.org, Institute for Genome Sciences, University of Maryland Baltimore). These methodologies run similar analyses using Mothur and Qiime on a distributed cloud-computing architecture such as Amazon EC2. The high-quality dataset created after screening for contaminants and chimeras was used as input to the CloVR-16S pipeline.
- wg: groundwater
- ws: surface water
- pg: groundwater-sprayed phyllosphere
- ps: surface water-sprayed phyllosphere
Sivapalasingam S, Friedman CR, Cohen L, Tauxe RV: Fresh produce: A growing cause of outbreaks of foodborne illness in the United States, 1973 through 1997. J Food Protect. 2004, 67: 2342-2353.
Gravani RB: The role of Good Agricultural Practices in produce safety. Microbial safety of fresh produce. Edited by: Fan X, Niemira BA, Doona CJ, Feeherry FE, Gravani RB. 2009, Singapore: IFT press series, 101-117.
Matthews KR: Microorganisms associated with fruits and vegetables. Microbiology of fresh produce. Edited by: Matthews KR. 2006, Washington, DC: ASM Press, 1-20.
Brandl MT: Fitness of human enteric pathogens on plants and implications for food safety. Annu Rev Phytopathol. 2006, 44: 367-392. 10.1146/annurev.phyto.44.070505.143359.
Brandl MT, Mandrell RE: Fitness of Salmonella enterica serovar Thompson in the cilantro phyllosphere. Appl Environ Microbiol. 2002, 68: 3614-3621. 10.1128/AEM.68.7.3614-3621.2002.
Yang CH, Crowley DE, Borneman J, Keen NT: Microbial phyllosphere populations are more complex than previously realized. P Natl Acad Sci USA. 2001, 98: 3889-3894. 10.1073/pnas.051633898.
Lindow SE, Brandl MT: Microbiology of the phyllosphere. Appl Environ Microbiol. 2003, 69: 1875-1883. 10.1128/AEM.69.4.1875-1883.2003.
Whipps JM, Hand P, Pink D, Bending GD: Phyllosphere microbiology with special reference to diversity and plant genotype. J Appl Microbiol. 2008, 105: 1744-1755. 10.1111/j.1365-2672.2008.03906.x.
Commodity specific food safety guidelines for the fresh tomato supply chain. [http://www.unitedfresh.org/assets/files/Tomato%20Guidelines%20July08%20FINAL.pdf]
Feare CJ, Sanders MF, Blasco R, Bishop JD: Canada goose (Branta canadensis) droppings as a potential source of pathogenic bacteria. J R Soc Promot Health. 1999, 119: 146-155.
Renter D, Sargeant J, Hygnstrom S, Hoffman J, Gillespie JR: Escherichia coli O157:H7 in free-ranging deer in Nebraska. J Wildl Dis. 2001, 37: 755-760.
Gerba CP: The role of water and water testing in produce safety. Microbial safety of fresh produce. Edited by: Fan X, Niemira BA, Doona CJ, Feeherry FE, Gravani RB. 2009, Singapore: Wiley-Blackwell, 129-142.
Gerba CP, Choi CY: Role of irrigation water in crop contamination by viruses. Viruses in Foods. Edited by: Goyal SM. 2006, New York: Springer, 257-263.
Burau RG, Sheikh B, Cort RP, Cooper RC, Ririe D: Reclaimed water for irrigation of vegetables eaten raw. Calif Agric. 1987, 4-7.
Ibekwe A, Grieve C: Changes in developing plant microbial community structure as affected by contaminated water. FEMS microbiology ecology. 2004, 48: 239-248. 10.1016/j.femsec.2004.01.012.
Lambais MR, Crowley DE, Cury JC, Bull RC, Rodrigues RR: Bacterial diversity in tree canopies of the Atlantic forest. Science. 2006, 312: 1917-1917. 10.1126/science.1124696.
Ottesen AR, White JR, Skaltsas DN, Newell MJ, Walsh CS: Impact of organic and conventional management on the phyllosphere microbial ecology of an apple crop. J Food Protect. 2009, 72: 2321-2325.
Redford AJ, Fierer N: Bacterial succession on the leaf surface: a novel system for studying successional dynamics. Microb Ecol. 2009, 58: 189-198. 10.1007/s00248-009-9495-y.
Acosta-Martinez V, Dowd S, Sun Y, Allen V: Tag-encoded pyrosequencing analysis of bacterial diversity in a single soil type as affected by management and land use. Soil Biol Biochem. 2008, 40: 2762-2770. 10.1016/j.soilbio.2008.07.022.
Andersson AF, Lindberg M, Jakobsson H, Backhed F, Nyren P, Engstrand L: Comparative Analysis of Human Gut Microbiota by Barcoded Pyrosequencing. PLoS One. 2008, 3: e2836. 10.1371/journal.pone.0002836.
Dowd SE, Callaway TR, Wolcott RD, Sun Y, McKeehan T, Hagevoort RG, Edrington TS: Evaluation of the bacterial diversity in the feces of cattle using 16S rDNA bacterial tag-encoded FLX amplicon pyrosequencing (bTEFAP). BMC Microbiol. 2008, 8: 125. 10.1186/1471-2180-8-125.
Dowd SE, Sun Y, Wolcott RD, Domingo A, Carroll JA: Bacterial tag-encoded FLX amplicon pyrosequencing (bTEFAP) for microbiome studies: Bacterial diversity in the ileum of newly weaned Salmonella-infected pigs. Foodborne Pathog Dis. 2008, 5: 459-472. 10.1089/fpd.2008.0107.
Fierer N, Hamady M, Lauber CL, Knight R: The influence of sex, handedness, and washing on the diversity of hand surface bacteria. P Natl Acad Sci USA. 2008, 105: 17994-17999. 10.1073/pnas.0807920105.
Jones RT, Robeson MS, Lauber CL, Hamady M, Knight R, Fierer N: A comprehensive survey of soil acidobacterial diversity using pyrosequencing and clone library analyses. ISME J. 2009, 3: 442-453. 10.1038/ismej.2008.127.
Miller SR, Strong AL, Jones KL, Ungerer MC: Bar-Coded Pyrosequencing Reveals Shared Bacterial Community Properties along the Temperature Gradients of Two Alkaline Hot Springs in Yellowstone National Park. Appl Environ Microbiol. 2009, 75: 4565-4572. 10.1128/AEM.02792-08.
Sogin ML, Morrison HG, Huber JA, Mark Welch D, Huse SM, Neal PR, Arrieta JM, Herndl GJ: Microbial diversity in the deep sea and the underexplored "rare biosphere". P Natl Acad Sci USA. 2006, 103: 12115-12120. 10.1073/pnas.0605127103.
Redford AJ, Bowers RM, Knight R, Linhart Y, Fierer N: The ecology of the phyllosphere: geographic and phylogenetic variability in the distribution of bacteria on tree leaves. Environ Microbiol. 2010, 12: 2885-2893. 10.1111/j.1462-2920.2010.02258.x.
White JR, Nagarajan N, Pop M: Statistical methods for detecting differentially abundant features in clinical metagenomic samples. PLoS Comput Biol. 2009, 5: e1000352. 10.1371/journal.pcbi.1000352.
Benjamini Y, Hochberg Y: Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B (Methodological). 1995, 57: 289-300.
Schloss PD, Westcott SL, Ryabin T, Hall JR, Hartmann M, Hollister EB, Lesniewski RA, Oakley BB, Parks DH, Robinson CJ, et al: Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol. 2009, 75: 7537-7541. 10.1128/AEM.01541-09.
Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Fierer N, Pena AG, Goodrich JK, Gordon JI, et al: QIIME allows analysis of high-throughput community sequencing data. Nat Methods. 2010, 7: 335-336. 10.1038/nmeth.f.303.
Enya J, Shinohara H, Yoshida S, et al: Culturable leaf-associated bacteria on tomato plants and their potential as biological control agents. Microb Ecol. 2007, 53: 524-536. 10.1007/s00248-006-9085-1.
Sajur SA, Saguir FM, Nadra MCMd: Effect of dominant specie of lactic acid bacteria from tomato on natural microflora development in tomato puree. Food Control. 2007, 18: 594-600. 10.1016/j.foodcont.2006.02.006.
Guan TTY, Blank G, Holley RA: Survival of pathogenic bacteria in pesticide solutions and on treated tomato plants. J Food Protect. 2005, 68: 296-304.
Mavromatis K, Ivanova N, Barry K, Shapiro H, Goltsman E, McHardy AC, Rigoutsos I, Salamov A, Korzeniewski F, Land M, et al: Use of simulated data sets to evaluate the fidelity of metagenomic processing methods. Nat Methods. 2007, 4: 495-500. 10.1038/nmeth1043.
White JR, Navlakha S, Nagarajan N, Ghodsi MR, Kingsford C, Pop M: Alignment and clustering of phylogenetic markers: implications for microbial diversity studies. BMC Bioinformatics. 2010, 11: 152. 10.1186/1471-2105-11-152.
Wong KM, Suchard MA, Huelsenbeck JP: Alignment uncertainty and genomic analysis. Science. 2008, 319: 473-476. 10.1126/science.1151532.
Quince C, Lanzen A, Curtis T, Davenport RJ, Hall N, Read L, Sloan W: Accurate determination of microbial diversity from 454 pyrosequencing data. Nat Methods. 2009, 6: 639-641. 10.1038/nmeth.1361.
Kunin V, Engelbrektson A, Ochman H, Hugenholtz P: Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environ Microbiol. 2010, 12: 118-123. 10.1111/j.1462-2920.2009.02051.x.
Muyzer G, Teske A, Wirsen CO, Jannasch HW: Phylogenetic relationships of Thiomicrospira species and their identification in deep-sea hydrothermal vent samples by denaturing gradient gel electrophoresis of 16S rDNA fragments. Arch Microbiol. 1995, 164: 165-172. 10.1007/BF02529967.
Teske A, Wawer C, Muyzer G, Ramsing NB: Distribution of sulfate-reducing bacteria in a stratified fjord (Mariager fjord, Denmark) as evaluated by most-probable-number counts and denaturing gradient gel electrophoresis of PCR-amplified ribosomal DNA fragments. Appl Environ Microbiol. 1996, 62: 1405-1415.
Gill SR, Pop M, DeBoy RT, Eckburg PB, Turnbaugh PJ, Samuel BS, Gordon JI, Relman DA, Fraser-Liggett CM, Nelson KE: Metagenomic analysis of the human distal gut microbiome. Science. 2006, 312: 1355-1359. 10.1126/science.1124234.
Turnbaugh PJ, Quince C, Faith JJ, McHardy AC, Yatsunenko T, Niazi F, Affourtit J, Egholm M, Henrissat B, Knight R, Gordon JI: Organismal, genetic, and transcriptional variation in the deeply sequenced gut microbiomes of identical twins. P Natl Acad Sci USA. 2010, 107: 7503-7508. 10.1073/pnas.1002355107.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
Wang Q, Garrity GM, Tiedje JM, Cole JR: Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol. 2007, 73: 5261-5267. 10.1128/AEM.00062-07.
DeSantis TZ, Hugenholtz P, Keller K, Brodie EL, Larsen N, Piceno YM, Phan R, Andersen GL: NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes. Nucleic Acids Res. 2006, 34: W394-399. 10.1093/nar/gkl244.
Good IJ: The Population Frequencies of Species and the Estimation of Population Parameters. Biometrika. 1953, 40: 237-264.
Cole JR, Chai B, Farris RJ, Wang Q, Kulam SA, McGarrell DM, Garrity GM, Tiedje JM: The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis. Nucleic Acids Res. 2005, 33: D294-296.
The authors are indebted to Michael Newell and the farm crew at Wye Research and Education Center for their assistance with the tomato field research plots. This work was supported by JIFSAN (Joint Institute of Food Safety and Applied Nutrition) through its competitive grant program.
AT: conceived of the study, participated in its design and coordination, carried out field work and molecular biology experiments, and drafted the manuscript. JRW: performed bioinformatics analyses and drafted the manuscript. DMP: participated in the study's design and coordination, carried out field and laboratory work, and edited the manuscript. ARO: conceived of the study and edited the manuscript. CSW: conceived of the study, edited the manuscript, and received the majority of funding needed to complete the research. All authors read and approved the final manuscript.
Adriana Telias and James R White contributed equally to this work.