Microbes are critical symbiotes of humans, in whom upwards of 100 trillion foreign cells from more than 1000 different species reside [1, 2]. The gut hosts the bulk of this microflora, in which bacteria are the most abundant, outnumbering eukaryotes and viruses by orders of magnitude. While a handful are known human pathogens, the majority of these bacteria, such as Lactobacillus spp., are commensal or mutualistic, exerting their influence through probiotic functions. Studies in mice and humans implicate gut bacteria not just in digestion of nutrients, but also in fat storage, modulation of bone-mass density, angiogenesis, protection against pathogens, and immune functions [8, 9]. Conditions such as Crohn’s disease, diabetes [11, 12], and obesity [13–15] have all been directly linked to an imbalance of gut microflora. Despite an explosion of research in recent years, the ecology and mechanistic details of complex microbiomes such as those found in the gut remain enigmatic, and new methodologies for dissection and characterization are needed.
Metagenomics refers to a powerful set of genomic and bioinformatic tools used to study the diversity, function, and physiology of complex microbial populations. Substantial advances in microbiome research have been driven by the extensive use of next-generation sequencing (NGS) technologies, which allow annotation and characterization of microbiomes using targeted (e.g. hypervariable regions of 16S rRNA) or shotgun approaches. Targeted approaches are suboptimal for identifying low-abundance species, and even though most species in a population can be identified by shotgun sequencing, assembly of complete genomes of individual species is rarely possible unless those species are highly abundant. Moreover, as complexity increases, dataset resolution decreases, reducing the ability to comprehensively analyze community structure. Recent reports provide promising advances in metagenomic binning and assembly for the reconstruction of complete or near-complete genomes of rare (<1%) community members from metagenomes. Albertsen et al. have described differential-coverage binning as a method for providing sample-specific genome catalogs, while Wrighton et al. have also been successful in sequencing more than 90% of the species in microbial communities. In another approach, either GC content or tetranucleotide frequency, combined with genome coverage patterns across different sample preparations, was used to bin sequences into separate populations, which were then assembled under the assumption that nucleotide (or tetranucleotide) frequencies are constant for any specific genome. Sequencing throughput is continually improving and is expected to provide access to populations of increasingly low abundance, while improvements in read length and quality will reduce the impact of co-assembly of closely related strains (strain heterogeneity) on the initial de novo assembly.
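To make the composition signal behind such binning concrete, the following minimal Python sketch computes GC content and tetranucleotide frequencies for a single contig. It is an illustration only, not the implementation used in the cited studies: the function names are hypothetical, windows containing ambiguous bases are simply ignored, and reverse-complement tetranucleotides are not merged, which is a simplification. In practice these features would be combined with per-sample coverage before clustering.

```python
from collections import Counter
from itertools import product

# All 256 possible tetranucleotides over the unambiguous DNA alphabet.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def gc_content(seq):
    """Fraction of unambiguous bases (A, C, G, T) that are G or C."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / acgt


def tetranucleotide_frequencies(seq):
    """Normalized frequency of each tetranucleotide in one contig.

    Overlapping 4-base windows are counted; windows containing any
    character other than A, C, G, or T are ignored (a simplification).
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in TETRAMERS)
    if total == 0:
        return {k: 0.0 for k in TETRAMERS}
    return {k: counts[k] / total for k in TETRAMERS}


if __name__ == "__main__":
    contig = "ACGTACGT"  # toy sequence, for illustration only
    freqs = tetranucleotide_frequencies(contig)
    print(gc_content(contig), freqs["ACGT"])
```

Contigs originating from the same genome are expected to have similar frequency vectors, which is the assumption stated above; contigs that also share a coverage profile across samples are then candidates for the same bin.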
While these approaches represent exciting advances in bioinformatic tools, experimental tools for reducing the complexity of a population prior to sequencing, such as enriching for low-abundance organisms or intact cells, provide alternative and complementary approaches to improve genomic analysis of such complex systems.
A variety of experimental methods have been used to decrease sample complexity prior to sequencing. The most commonly used is probably single cell genomics (SCG) [23, 24], which utilizes flow cytometry, microfluidics, or micromanipulation to isolate single cells as templates for whole genome amplification by multiple displacement amplification (MDA) [25–27]. As it requires only a single template genome, it allows the sequencing of “uncultivable” organisms. For example, a recent paper from the Quake group used microfluidics to isolate single bacterial cells from a complex microbial community, using morphology as the discriminant, before genome amplification and analysis. SCG approaches rely on MDA, and while MDA can generate micrograms of genomic amplicons for sequencing from a single cell, amplification bias, leading to incomplete genome coverage, is a major inherent limitation [29, 30]. In fact, a recent survey of 201 genomes sequenced from single cells reported a mean coverage of approximately 40%. A clever use of single amplified genome (SAG) assembly improved coverage to >90% for 7 of the 201 genomes, with mean coverage of approximately 70% for the 21 genomes assembled from multiple SAGs. MDA-associated amplification bias has been reduced for eukaryotic cells using a technique called MALBAC, but these improvements have yet to be shown for prokaryotic genomes, and the approach still relies on random, or morphologically based, cell sorting. Such random sorting of single microbial cells from complex mixtures is expected to bias against rare species and may require sorting and sequencing of hundreds to thousands of cells before a rare genome can be obtained.
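The scale of this sampling problem can be illustrated with a simple calculation. The sketch below assumes that cells are sorted independently and at random, that every cell is equally likely to be captured, and that a species is present at a fixed relative abundance f; under these assumptions the probability of capturing at least one target cell among n sorted cells is 1 − (1 − f)^n. The function name and the 95% confidence level are illustrative choices, not values taken from the studies cited above.

```python
import math


def cells_needed(abundance, confidence=0.95):
    """Smallest number of randomly sorted cells n such that the chance of
    capturing at least one cell of the target species,
    1 - (1 - abundance) ** n, is at least `confidence`.

    Assumes independent, unbiased sampling of single cells (an idealization).
    """
    if not 0 < abundance < 1:
        raise ValueError("abundance must be strictly between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    return math.ceil(math.log(1 - confidence) / math.log(1 - abundance))


if __name__ == "__main__":
    for f in (0.01, 0.001):
        print(f"abundance {f:g}: sort at least {cells_needed(f)} cells")
```

Under these idealized assumptions, a species at 1% abundance requires 299 sorted cells for a 95% chance of capturing a single representative, and a species at 0.1% requires 2995. Because a single amplified cell typically yields incomplete genome coverage, several captures per species would be needed in practice, so these figures are lower bounds.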
Increasing the number of input templates can overcome MDA amplification bias, as well as difficulties in processing and sorting single cells from biofilms, and provide near-complete genome coverage. Potential methods for accomplishing this include inducing artificial polyploidy or using gel microdroplets [24, 33]. However, in both of these cases, rare species may still be missed if sufficient numbers of single cells cannot be sorted. This has been partially addressed in a recently published “mini-metagenomics” approach, in which MDA product coverage was improved by using flow cytometry to create bacterial pools of ~100 bacteria each. Screening of these pools for 16S rDNA sequences of the bacterial species of interest, followed by deep sequencing of the positive pools, allowed assembly of a relatively complete genome from different pools containing the same 16S rDNA sequences.
An alternative approach that addresses both amplification bias and the isolation of rare species is to use antibodies recognizing specific microorganisms within microbial communities to enrich and/or subtract bacterial species prior to sequencing. We hypothesized that enrichment by selective sorting in this way could provide a powerful method for significantly increasing input template number to obtain complete genomes of low-abundance species, akin to creating a small microbiome in which all members express a single target recognized by the antibody of interest.
In the present work, we developed a selection and screening pipeline using phage display and flow cytometry to isolate a single-chain Fv (scFv) antibody that can: i) identify a bacterial species, Lactobacillus acidophilus, with extreme specificity; and ii) be applied to a microbiome, using fluorescence-activated cell sorting (FACS), to identify, enrich, and deplete targeted species from bacterial mixtures. We further demonstrated that antibodies recognizing L. acidophilus could also be isolated when this approach was applied to a mock community containing L. acidophilus rather than to the pure single species. This phage display selection method is highly adaptable to recognition of any organism and provides a unique tool for dissection and sequencing of rare species from complex microbiomes.