Our assays show clear distinctions within and among B. abortus, B. melitensis, and B. suis. Our CUMA assays targeted clade-specific SNPs that can be incorporated into most other genotyping assays, such as TaqMan real-time PCR, for increased sensitivity [18, 19]. We have identified several important targets that should prove useful for clinical, epidemiological, and forensic purposes. For example, the assays targeting branches A, D, and I are specific to isolates closely related to B. abortus 2308, B. abortus 9–941, and B. suis 1330, respectively. The assays for F and G target the same branch and identify B. melitensis 16M and closely related isolates. B. abortus 2308 and 9–941, B. suis 1330, and B. melitensis 16M belong to common, genetically monomorphic clades of Brucella, and the SNP assays developed here are a reliable and useful way of identifying these four common groups.
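The logic of calling a branch from canonical SNPs can be sketched in a few lines of Python. This is a minimal illustration only, not the assay analysis used in this study: the locus names (`snp_A`, `snp_D`, `snp_I`), the derived alleles, and the helper `assign_branches` are all hypothetical placeholders.

```python
# Hypothetical canonical SNP table: branch -> (locus name, derived allele).
# Locus names and alleles are invented for illustration; they are NOT
# the loci or alleles used in the study.
CANONICAL_SNPS = {
    "A": ("snp_A", "T"),
    "D": ("snp_D", "G"),
    "I": ("snp_I", "C"),
}


def assign_branches(genotype):
    """Return the sorted branches whose derived allele the isolate carries.

    `genotype` maps locus name -> observed allele; loci that were not
    typed are simply absent and never match.
    """
    return sorted(
        branch
        for branch, (locus, derived) in CANONICAL_SNPS.items()
        if genotype.get(locus) == derived
    )


# An isolate carrying the derived allele only at the branch A locus.
isolate = {"snp_A": "T", "snp_D": "A", "snp_I": "T"}
print(assign_branches(isolate))  # ['A']
```

An isolate that matches no canonical SNP returns an empty list, which in practice would mean only that it lies outside the targeted clades, not that it is unrelated to them.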
Branch E is particularly interesting in terms of Brucella taxonomy. The clade that this branch defines includes isolates from B. abortus biovars 1, 2, and 4. Potential issues with biovar and phylogenetic correspondence in B. abortus have been noted previously. Upon closer evaluation of the whole genomes used in our analyses, the apparent paraphyly of B. abortus biovar 1 (isolates from biovar 2 fall within the biovar 1 clade) does not hold when all the genomes are included. However, CUMA assays indicate that at least four isolates from other B. abortus biovars (three of biovar 4, one of biovar 2) fall onto the B/C branch. This suggests that either biovar 1 is paraphyletic or there have been issues with biovar determination.
SNP-based approaches also enable assessment of errors in genome sequences. Whole genome comparisons of the region associated with SNP10621, which was intended to target branch J in B. suis/B. canis, show that the allele is also shared with B. abortus 9–941. Taken at face value, this would suggest homoplasy at this locus. Yet in our CUMA assays B. abortus 9–941 did not group with B. suis, likely indicating a sequencing error.
Finding nucleotide polymorphisms that differentiate clades, species, or isolates is dependent on the genomes used for SNP discovery. In general, one will only find those SNPs that exist among the genomic samples used in the comparisons, and novel SNPs will remain undiscovered. This discovery bias can strongly affect taxonomic interpretation of results [22, 23]. Although discovery bias is often less consequential for genotyping efforts, the effects of our choice of strains for SNP discovery are clearly apparent in our phylogenetic tree. The discovery strains are distinguished by their positions at terminal branches in the phylogeny. Greater diversity is observed in B. abortus simply because two strains were part of the discovery panel. Furthermore, although isolates on a branch will be grouped by the SNPs they share (or do not share), additional structure exists in the “true” phylogeny that is not apparent in the genotype tree. Branch lengths are also highly affected by the SNP discovery process. Species that are basal within this phylogeny, such as B. ceti, B. ovis, and B. neotomae, have short branch lengths merely because these genomes were not part of SNP discovery. It must also be noted that B. suis biovar 5 is part of this basal group. SNPs that should group it with the rest of the B. suis clade were not present in our MIP assay, which is not surprising since this branch is extremely short, even with whole genome analysis [JTF unpubl. data]. We did not observe differentiation of these and the other Brucella species, nor did we expect it, because genomes from these groups were not a part of SNP discovery.
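The effect of discovery bias on terminal branch lengths can be illustrated with a toy example. The sketch below is an assumption-laden illustration, not an analysis of our data: the three seven-site "genomes", the strain names (`disc1`, `disc2`, `other`), and the helper functions are invented. It counts the SNPs private to each strain, first using every variable site and then using only the sites that vary within a two-strain discovery panel.

```python
def variable_sites(genomes, panel):
    """Indices at which the panel strains do not all share one allele."""
    length = len(next(iter(genomes.values())))
    return [i for i in range(length) if len({genomes[s][i] for s in panel}) > 1]


def private_snps(genomes, strain, sites):
    """Count sites (among `sites`) where `strain` has an allele no other strain has."""
    others = [g for name, g in genomes.items() if name != strain]
    return sum(all(genomes[strain][i] != g[i] for g in others) for i in sites)


# Invented toy alignment: each strain has its own private derived alleles (T).
genomes = {
    "disc1": "TTAAAAA",  # in the discovery panel
    "disc2": "AATTAAA",  # in the discovery panel
    "other": "AAAATTT",  # genotyped, but NOT in the discovery panel
}

all_sites = variable_sites(genomes, list(genomes))       # full knowledge
discovered = variable_sites(genomes, ["disc1", "disc2"])  # panel-limited

for strain in genomes:
    print(strain,
          private_snps(genomes, strain, all_sites),
          private_snps(genomes, strain, discovered))
# disc1 2 2
# disc2 2 2
# other 3 0
```

In this toy case the non-discovery strain has the most private SNPs in the full alignment, yet its terminal branch collapses to zero length when only panel-discovered sites are genotyped, which mirrors the short branches seen for species that were not part of SNP discovery.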
Whole genome resequencing at the Broad Institute of MIT/Harvard recently generated genomes for over 100 additional Brucella strains, and these genomes should provide a broad basis for future genotyping efforts, with canonical SNPs developed for each of the important isolates and clades. Future genotyping efforts should include SNPs from all of the recognized species and biovars. Comparative work using some of these genomes has already been fruitful, demonstrating the emergence of the marine Brucella from within the terrestrial Brucella and showing a methodology for whole genome analysis.
A trade-off exists in current genotyping efforts between throughput and genomic sampling. Does one aim for the maximum number of potentially informative loci through approaches such as whole genome sequencing, at the cost of reducing the number of isolates that can be evaluated? Or does one aim for more complete sampling of large numbers of isolates, but with a limited set of loci, using individual SNP assays such as CUMA? The answer ultimately depends on the research interest or clinical application, as well as on the resources at hand. MIP assays provide phylogenetic resolution for an intermediate number of samples and an intermediate number of SNPs. Nonetheless, MIP assays, or any assays based on previously discovered SNPs, will always have their inference limited by the genomes used in SNP discovery. MIP assays do, however, allow for a focus on resolving branches of specific interest. Data from these assays then allow for targeted down-selection of loci so that focal branches, and the isolates on them, can be thoroughly interrogated using individual SNP assays. Identifying canonical SNPs and verifying their ability to differentiate clades by screening large numbers of isolates is the essential part of genotyping. The type of assay used for SNP differentiation is less important, because that choice depends largely on the numbers of SNPs and samples one wants to screen. The MIP and CUMA SNP screening techniques are just two of many methods that can be used for SNP genotyping in Brucella and other bacteria.