Defining bacterial species in the genomic era: insights from the genus Acinetobacter
© Chan et al.; licensee BioMed Central Ltd. 2012
Received: 31 July 2012
Accepted: 18 December 2012
Published: 23 December 2012
Microbial taxonomy remains a conservative discipline, relying on phenotypic information derived from growth in pure culture and techniques that are time-consuming and difficult to standardize, particularly when compared to the ease of modern high-throughput genome sequencing. Here, drawing on the genus Acinetobacter as a test case, we examine whether bacterial taxonomy could abandon phenotypic approaches and DNA-DNA hybridization and, instead, rely exclusively on analyses of genome sequence data.
In pursuit of this goal, we generated a set of thirteen new draft genome sequences, representing ten species, combined them with other publicly available genome sequences and analyzed the resulting 38 strains belonging to the genus. We found that analyses based on 16S rRNA gene sequences were not capable of delineating accepted species. However, a core genome phylogenetic tree proved consistent with the currently accepted taxonomy of the genus, while also identifying three misclassifications of strains in collections or databases. Among rapid distance-based methods, we found that average nucleotide identity (ANI) analyses delivered results consistent with traditional and phylogenetic classifications, whereas gene-content-based approaches appear to be too strongly influenced by the effects of horizontal gene transfer to agree with previously accepted species.
We believe a combination of core genome phylogenetic analysis and ANI provides an appropriate method for bacterial species delineation, whereby bacterial species are defined as monophyletic groups of isolates with genomes that exhibit at least 95% pair-wise ANI. The proposed method is backwards compatible; it provides a scalable and uniform approach that works for both culturable and non-culturable species; is faster and cheaper than traditional taxonomic methods; is easily replicable and transferable among research institutions; and lastly, falls in line with Darwin’s vision of classification becoming, as far as is possible, genealogical.
In the early eighteenth century, Linnaeus provided the first workable hierarchical classification of species, based on the clustering of organisms according to their phenotypic characteristics. In The Origin of Species, Darwin added phylogeny to taxonomy, while also emphasizing the arbitrary nature of biological species: “I look at the term species as one arbitrarily given for the sake of convenience to a set of individuals resembling each other.” The reality and utility of the species concept continue to inform the theory and practice of biology, and a stable species nomenclature underpins the diagnosis and monitoring of pathogenic microorganisms [3–5].
Traditional taxonomic analyses of plants and animals rely on morphological characteristics. However, this approach cannot easily be applied to unicellular microorganisms. In the latter half of the twentieth century, it became clear that bacteria could be grouped into taxonomic clusters based on stable phenotypic characters (e.g. cellular morphology and composition, growth requirements and other metabolic traits) that could be measured reliably in the laboratory. In the 1960s and 1970s, Sneath and Sokal exploited improved technical and statistical methods to develop a numerical taxonomy, which revealed discrete phenotypic clustering within many bacterial genera.
Such phenotypic approaches soon faced competition from genotypic approaches, such as DNA base composition (mol% G+C content) and whole-genome DNA-DNA hybridization (DDH); the latter remains the gold standard in bacterial taxonomy. Within this framework, Wayne et al. recommended that “a species generally would include strains with approximately 70% or greater DNA-DNA relatedness”. However, few laboratories now perform DNA-DNA hybridization assays, as these are onerous and technically demanding compared with the rapid and easy sequencing of short signature sequences, such as the 16S ribosomal RNA gene. This shift has led to an updated species definition: “a prokaryotic species is considered to be a group of strains that are characterized by a certain degree of phenotypic consistency, showing 70% of DNA–DNA binding and over 97% of 16S ribosomal RNA (rRNA) gene-sequence identity”.
Most recently, whole-genome sequencing has delivered new taxonomic metrics, such as average nucleotide identity (ANI), calculated from pair-wise comparisons of all sequences shared between any two strains. ANI correlates strongly with DDH values, and an ANI value of ≥ 95% corresponds to the traditional 70% DDH threshold.
Despite the ready availability of genome sequence data, microbial taxonomy remains a conservative discipline. When defining a bacterial species, most modern microbial taxonomists use a polyphasic approach, whereby a bacterial species represents “a monophyletic and genomically coherent cluster of individual organisms that show a high degree of overall similarity with respect to many independent characteristics, and is diagnosable by a discriminative phenotypic property”. Although the polyphasic approach is pragmatic and widely applicable, it has drawbacks. It relies on phenotypic information, which in turn depends on growth in the laboratory, usually in pure culture, and this may not be achievable for many bacterial species. It also relies on techniques that are time-consuming and difficult to standardize, particularly when compared to the ease of modern genome sequencing [4, 13, 14].
We, like others, are therefore driven to consider whether, in the genomic era, bacterial taxonomy could, and should, abandon phenotypic approaches and rely exclusively on analyses of genome sequence data [4, 10, 14–18]. However, such an approach brings fresh conceptual and methodological challenges. Several forces shape the evolution of bacterial genomes: the steady accumulation of point mutations or small insertions/deletions (indels), potentially giving rise to a tree-like phylogeny; homologous recombination in some lineages, which obscures such diversification; and gene gain and loss, particularly through the pervasive influence of horizontal gene transfer, which, if substantial, could obliterate phylogenetic signals. These forces act with different strengths on different parts of the genome and on different bacterial lineages. For example, sequences from a single gene, such as the 16S rRNA gene, have been shown to fail to capture the true genome-wide divergence between two strains [19–21]. Additionally, the various novel sequence-based metrics may be expected to respond differently to these evolutionary forces. This raises potential problems with the consistency of classification (results may or may not agree across metrics) and with backwards compatibility (classification may or may not correspond to already named species within a genus). In this work, we explore these issues using a well-characterized and important bacterial genus, Acinetobacter.
The genus Acinetobacter was first proposed by Brisou and Prévot in 1954; however, it was not until Baumann et al. published their comprehensive study based on nutritional and biochemical properties that this designation became more widely accepted. In 1974 the genus was listed in Bergey’s Manual of Determinative Bacteriology with the description of a single species, A. calcoaceticus. To date, 27 species have been described in the genus (http://www.bacterio.cict.fr/a/acinetobacter.html). To fall within the genus Acinetobacter, isolates must be Gram-negative, strictly aerobic, non-fermenting, non-fastidious, non-motile, catalase-positive and oxidase-negative, and must have a DNA G+C content of 38–47%. Some isolates within the genus are naturally competent, resulting in intra-species recombination [25–27]. Environmental isolates, such as A. calcoaceticus PHEA-2 and Acinetobacter oleivorans DR1, have attracted interest because they can metabolize a diverse range of compounds [28–30]. However, most research on the genus has focused on clinical isolates, particularly from the species A. baumannii. This species has shown an astonishing ability to acquire antibiotic resistance genes, and some strains are now close to being untreatable [31, 32]. Worryingly, the incidence of serious infections caused by other Acinetobacter species is also increasing. Genotypic approaches have suggested that A. baumannii forms a complex (the A. baumannii/calcoaceticus or ACB complex) with three other species: A. calcoaceticus, A. nosocomialis and A. pittii. However, it remains very difficult, if not impossible, for a conventional reference laboratory to distinguish these species on phenotypic grounds alone. Techniques such as amplified fragment length polymorphism (AFLP) analysis and amplified 16S rRNA gene restriction analysis (ARDRA) can be used to identify species within the Acinetobacter genus and the ACB complex [35–38]; however, these techniques are too laborious to be carried out in a routine laboratory.
Given the general difficulty in defining bacterial species and the ready availability of genome sequence data, we sought to evaluate a range of novel genotypic and genome-based metrics for species delineation. In light of these obstacles and the ongoing public health concern, we believe that the genus Acinetobacter provides a timely test case for evaluating the validity and robustness of these sequence-based approaches. In pursuit of this goal, we generated a diverse and informative set of thirteen new draft genome sequences, representing ten species, and analyzed the whole-genome sequences of a total of 38 strains belonging to the genus.
Table 1 Genome sizes, sequencing statistics, G+C content and number of CDSs in the thirteen sequenced Acinetobacter isolates
Strain | Genome size (Mb) | No. of contigs | G+C content (%) | No. of predicted good-quality CDSs† | GenBank accession number
DSM 16617 (T)
DSM 6976 (T)
NCTC 5866 (T)
DSM 16037 (T)
DSM 21653 (T)
DSM 30006 (T)
LMG 1003 (T)
The species A. ursingii was first described by Nemec et al. in 2001. We sequenced the genome of the type strain DSM 16037, which was isolated from a blood culture taken from an inpatient in Prague, Czech Republic, in 1993. In the genome we identified 3252 good-quality CDSs (minimum length 50 codons, of which less than 2% are stop codons); 270 of these do not have homologs in any of the other 37 Acinetobacter strains in this study. Depth of coverage was generally consistent, apart from two contigs that showed 3.5 times greater-than-average coverage. Scrutiny of the larger of these two contigs (9.4 kb) identified CDSs predicted to encode plasmid replication and mobilization proteins. This contig also contains homologs of the sul1 and uspA genes, which are often associated with A. baumannii resistance islands.
A. lwoffii was first described by Audureau in 1940 under the name Moraxella lwoffii, but was later moved to the genus Acinetobacter by Baumann et al. In 1986, Bouvet and Grimont emended the description of the species and designated strain NCTC 5866 as the type strain. We identified 3005 good-quality CDSs in the NCTC 5866 genome, of which 229 do not have homologs in any of the Acinetobacter genomes examined in this study. Investigation of these CDSs revealed two putative prophages of ca. 44.5 and 25.6 kb. Interestingly, many of the CDSs found in these two putative prophages are also present in a recently sequenced environmental Acinetobacter strain, P8-3-8 (not included in this study), isolated from the intestine of a blue-spotted cornetfish caught in Vietnam.
Among the remaining strain-specific CDSs, we identified fourteen that are nearly identical to tra genes found in PHH1107, a low-G+C plasmid isolated from pig manure. The tra homologs are distributed over two contigs, one of which has a G+C content (37%) lower than the genome mean (43%).
Strain DSM 16617, isolated from the ear of an outpatient from Pribram, Czech Republic, in 1996, is the type strain for A. parvus. We identified 2681 good-quality CDSs in the DSM 16617 genome, 179 of which do not have homologs in any of the remaining 37 genomes. Analysis with Prophinder identified one 39-kb putative prophage containing genes homologous to putative phage-related genes found in A. baumannii and A. oleivorans DR1. We also identified an 8-kb contig with 2.5 times higher-than-average depth of coverage, which contains homologs of phage-related genes.
Strain LMG 1003 is the type strain for A. bereziniae, a species recently named by Nemec et al., which has been isolated from various human, animal and environmental sources. We identified 4480 good-quality CDSs in the genome, with 1061 strain-specific CDSs (no homologs in the other 37 genomes). This proportion (24%) is considerably higher than in other Acinetobacter strains (see Additional file 1). Many of the strain-specific CDSs form clusters of four or more CDSs, with the largest cluster containing 49 consecutive CDSs, of which 45 are strain-specific. Twenty-one CDSs in this cluster have no significant similarity to proteins in the non-redundant protein database.
Depth of coverage analysis revealed several contigs with higher-than-average coverage. One such contig has 5 times greater coverage than the rest of the genome, which suggests it is a mobile element. It contains a CDS homologous to the sul1 gene often found in A. baumannii resistance islands.
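The coverage reasoning above (a contig sequenced at several times the genome-wide depth is probably present in multiple copies, as plasmids and other mobile elements often are) can be sketched as follows. This is an illustration only: the length-weighted mean, the 2× flagging threshold and the name `flag_high_coverage_contigs` are assumptions rather than the procedure used in the study, and the contig values are invented.

```python
def flag_high_coverage_contigs(contigs, ratio_threshold=2.0):
    """Return (name, depth ratio) for contigs at >= ratio_threshold x the genome mean.

    contigs maps contig name -> (length in bp, mean read depth).
    The genome-wide mean is length-weighted so that short contigs do not dominate it.
    The 2x default threshold is an illustrative assumption.
    """
    total_bp = sum(length for length, _ in contigs.values())
    genome_mean = sum(length * depth for length, depth in contigs.values()) / total_bp
    flagged = []
    for name, (_, depth) in sorted(contigs.items()):
        ratio = depth / genome_mean
        if ratio >= ratio_threshold:
            flagged.append((name, round(ratio, 2)))
    return flagged


# Invented example: one short contig at roughly three times the mean depth.
example = {"c1": (100_000, 20.0), "c2": (9_400, 70.0), "c3": (50_000, 20.0)}
print(flag_high_coverage_contigs(example))  # [('c2', 3.05)]
```

Weighting by length keeps the baseline close to the depth of the bulk of the chromosome; an unweighted mean over contigs would be pulled upwards by the very elements being searched for.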
A. radioresistens strain DSM 6976 was isolated in 1979 from cotton sterilized by γ-radiation and is the type strain for the species. We identified 2964 good-quality CDSs in the genome, of which 188 do not have homologs in any of the remaining 37 genomes.
A comparison with two previously sequenced A. radioresistens strains, SK82 and SH164, reveals that the three strains share 2458 CDSs (about 83% of the average number of CDSs in these three strains), 43 of which were not found in the remaining 35 Acinetobacter genomes. Among these are a homolog of the metE gene and two genes involved in the degradation of benzoate, an aromatic compound known to support the growth of a number of A. radioresistens strains. Although the three strains are quite similar, we identified 143 CDSs in DSM 6976 that are absent from SK82 and SH164 but do have homologs in other Acinetobacter genomes. Within this group there is a genomic island containing nine genes related to fructose metabolism and a cluster of four CDSs predicted to encode type IV pilin proteins.
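Comparisons of this kind reduce to set operations on gene-family presence per genome. A minimal sketch, assuming each genome is represented as a set of ortholog-cluster identifiers; the function names and the toy data are hypothetical.

```python
def group_core(presence, group):
    """Gene families present in every genome of `group`."""
    return set.intersection(*(presence[g] for g in group))


def group_specific(presence, group):
    """Families shared by all genomes in `group` and absent from all other genomes."""
    others = set().union(*(fams for g, fams in presence.items() if g not in group))
    return group_core(presence, group) - others


def absent_from_relatives(presence, target, relatives):
    """Families in `target`, missing from its `relatives`, yet found elsewhere."""
    in_relatives = set().union(*(presence[g] for g in relatives))
    elsewhere = set().union(
        *(fams for g, fams in presence.items() if g != target and g not in relatives)
    )
    return (presence[target] - in_relatives) & elsewhere


# Toy data: three strains of one species (R1-R3) and one other strain (X).
presence = {
    "R1": {"a", "b", "c", "d"},
    "R2": {"a", "b", "c"},
    "R3": {"a", "b", "e"},
    "X": {"a", "d", "f"},
}
print(sorted(group_core(presence, ["R1", "R2", "R3"])))              # ['a', 'b']
print(sorted(group_specific(presence, ["R1", "R2", "R3"])))          # ['b']
print(sorted(absent_from_relatives(presence, "R1", ["R2", "R3"])))   # ['d']
```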
This core genome tree generally supports the monophyletic status of the named species within the genus, with three exceptions: A. baumannii NCTC 7422 belongs in a deep-branching lineage with the A. parvus type strain DSM 16617, A. nosocomialis NCTC 10304 clusters within A. baumannii, and A. calcoaceticus PHEA-2 is closer to the three A. pittii strains than to the other two A. calcoaceticus strains. The first two strains were genome-sequenced as part of this study, and our results suggest they have been misclassified in the culture collection. PHEA-2 is an isolate from industrial wastewater that was genome-sequenced by Xu et al. Our core genome tree and comparisons of 16S rRNA gene sequences show PHEA-2 to be closer to the three A. pittii strains than to the other two A. calcoaceticus strains, suggesting that it too has been misclassified. Interestingly, the previously unclassified strain DR1 sits closest to the two A. calcoaceticus strains, while ATCC 27244 is closest to the species A. haemolyticus.
Once such reclassifications are taken into account, our core genome phylogenetic tree is consistent with the currently accepted genus taxonomy and also supports the monophyly of the ACB complex and of each of its four constituent species. Within A. baumannii, two lineages, international clones I and II, previously identified by comparative cell envelope protein profiling, ribotyping and AFLP genomic fingerprinting, are present as monophyletic groups in our tree. The tree obtained from the core genome is similar to a tree obtained using a recently described approach based on 42 ribosomal genes (see Additional file 3).
Phylogenetic approaches are computationally intensive. We therefore evaluated genetic relatedness among the 38 strains using three rapid distance-based approaches that avoid time-consuming calculations: the previously mentioned ANI (nucleotide identity), as well as the K-string (oligopeptide composition) and genome fluidity (gene content) approaches.
The K-string composition approach is based on oligopeptide content analysis of predicted proteomes. The divergence dendrogram for K = 5 (see Additional file 4) generally agrees with the phylogenetic tree and the ANI dendrogram at the species level. However, the K-string approach places A. baumannii SDF outside the ACB complex, probably reflecting the considerable difference in gene repertoire between this drug-sensitive strain and all other genome-sequenced A. baumannii strains.
Genome fluidity provides a measure of the dissimilarity of genomes evaluated at the gene level. A dendrogram based on genomic fluidity (see Additional file 5) differs markedly from the results obtained with the other techniques: A. baumannii SDF again sits outside the ACB complex, A. nosocomialis strains NCTC 8102 and RUH2624 now sit within the A. baumannii clade, and PHEA-2 groups not with the A. pittii strains but with DR1 and the other A. calcoaceticus strains. We also performed a pair-wise comparison of the gene content of the 38 strains, calculating the proportion of CDSs shared by each pair of strains (see Additional file 6). While strains from the same species generally share at least 80% of their CDSs, we found strains from different species exhibiting similar ratios. For example, A. calcoaceticus RUH2202 shares more than 80% of its CDS repertoire with DR1 and with various A. nosocomialis, A. baumannii and A. pittii strains, and PHEA-2 and DR1 share 88.1% of their CDSs. Based on gene content alone, A. baumannii SDF is distinct from all other A. baumannii strains in our study (sharing at most 71.6% of its CDSs), which explains its placement in the K-string and genomic fluidity dendrograms (see Additional files 4 and 5, respectively). These results indicate a potentially significant level of horizontal gene transfer among Acinetobacter species and illustrate that species cannot be delineated on the basis of gene content comparison alone.
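Both gene-content measures can be sketched over sets of ortholog clusters. The sketch assumes each genome is a set of gene-family identifiers (so paralogous CDSs collapse into one family) and uses the pair-wise form of genome fluidity as we understand it from Kislyuk et al.: families unique to either genome divided by the total number of families in the two genomes. The shared fraction is asymmetric, matching phrasing such as "shares more than 80% of its CDS repertoire". Function names and data are illustrative.

```python
from itertools import combinations


def shared_fraction(a, b):
    """Fraction of genome a's gene families also present in genome b (asymmetric)."""
    return len(a & b) / len(a)


def pairwise_fluidity(a, b):
    """Families unique to either genome, divided by the families in both genomes combined."""
    return (len(a - b) + len(b - a)) / (len(a) + len(b))


def genome_fluidity(genomes):
    """Mean pair-wise fluidity over all distinct pairs of genomes."""
    pairs = list(combinations(genomes.values(), 2))
    return sum(pairwise_fluidity(a, b) for a, b in pairs) / len(pairs)


g1 = {"f1", "f2", "f3", "f4"}
g2 = {"f3", "f4", "f5"}
print(shared_fraction(g1, g2))    # 0.5
print(pairwise_fluidity(g1, g2))  # 3 unique families / 7 in total
```

Because neither measure looks at sequence, a block of horizontally acquired genes shifts both values as much as a genuine difference in ancestry would, which is consistent with the anomalous placements described above.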
These findings suggest that ANI analyses provide results that are compatible with traditional and phylogenetic classifications, whereas K-string and genome fluidity approaches appear to be too strongly influenced by the effects of horizontal gene transfer to be consistent with previously accepted approaches.
The congruence of the phylogenetic tree and the ANI dendrogram with each other and with existing species definitions provides confidence that these techniques are fit for purpose in delineating species in the absence of phenotypic data. Furthermore, as Goris et al. suggest, the ANI approach provides a convenient numerical cut-off at 95% identity to demarcate species boundaries, corresponding to the 70% DDH value. When we applied this cut-off to our dataset, we were able to classify 37 of the strains into thirteen previously named species.
In line with the likely misclassification of strains, we observed that A. nosocomialis NCTC 10304 shares phylogenetic history with, and exhibits pair-wise ANI values greater than 95% to, all 14 sequenced A. baumannii strains, confirming that it should be designated A. baumannii NCTC 10304. Similar arguments apply to A. calcoaceticus PHEA-2 (new designation A. pittii PHEA-2) and Acinetobacter sp. ATCC 27244 (A. haemolyticus ATCC 27244). However, strain NCTC 7422 appears distinctive enough to represent a new species. While the traditional polyphasic approach to taxonomy demands additional phenotypic characterization before this species can be named, on the basis of the analyses presented here we propose the species name Acinetobacter bruijnii sp. nov. (N. L. gen. masc. n. bruijnii, of Bruijnius, named after Nicolaas Govert de Bruijn, Dutch mathematician) for strain NCTC 7422 and all future strains that are monophyletic with it and show ≥ 95% ANI to this strain.
It is interesting to note that our results based on core genome and ANI analyses differ from those based on AFLP patterns; notably, in the AFLP analysis A. haemolyticus and A. junii neither cluster together nor form a sister branch to the ACB complex, and A. johnsonii does not appear on the same deep branch as A. lwoffii. This observation suggests that, although AFLP is adept at resolving species, it is unsuitable for phylogenetic analysis.
Several recent studies report alternative genomic approaches to bacterial taxonomy and species identification. These include in silico multilocus sequence analysis (MLSA), average amino acid identity (AAI) and ribosomal multilocus sequence typing (rMLST), which have been used to delineate species in the genera Neisseria, Vibrio and Mycoplasma [17, 18, 57]. Although MLSA can be used to infer phylogeny, it suffers from arbitrariness in the choice of genes, which varies from one taxon to the next. Our proposed approach, core-genome phylogeny, can be considered an extension of MLSA and rMLST; however, because it is based on all shared CDSs in a given genus, it makes use of all potentially informative sequence sites. ANI, like AAI, measures pair-wise similarity between genome sequences, but provides better resolution of species and subspecies [58, 59].
The aim of this study has been to determine, using the genus Acinetobacter as a test case, whether genome sequence data alone are sufficient for the delineation and even definition of bacterial species. To this end, we explored the applicability of two broad approaches: sequence-based phylogenies built from single and multiple genes, and distance-based methods, including gene-content comparisons (K-string and genomic fluidity) and whole-genome sequence similarity (ANI). We found that a phylogenetic analysis of the genus Acinetobacter based on 16S rRNA gene sequences provides unreliable and uninformative results. By contrast, a core genome phylogenetic tree provides robust, informative results that are backwards compatible with the existing taxonomy.
Among the distance metrics, we found that approaches using gene content (K-string and genomic fluidity) led to anomalous conclusions, e.g. placing the SDF strain outside the A. baumannii cluster, presumably because they are affected by horizontal gene transfer. In contrast, the easy-to-compute ANI results are congruent with the core genome phylogeny and traditional approaches. Using the combined core genome phylogeny and ANI approach, we found three misclassifications, one of which represents a new species. These findings illustrate the need to genome-sequence all strains archived in culture collections, which is likely to become technically and economically feasible in the near future.
We believe a combination of core genome phylogenetic analysis and ANI provides a feasible method for bacterial species delineation, in which species are defined as monophyletic groups of isolates that exhibit at least 95% pair-wise ANI to each other. This method combines a theoretically rigorous approach (sequence phylogeny) with a pragmatic metric (ANI) whose numerical cut-off is backwards compatible and has been shown to be applicable to a diverse group of bacteria [10, 60].
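One way to read this definition is as a top-down rule on a rooted tree: a clade is accepted as a species if every pair of its members shows at least 95% ANI; otherwise its child clades are examined in turn. The sketch below is a hypothetical operationalization, assuming a rooted tree encoded as nested tuples and an ANI table that has already been made symmetric (for example by averaging the two reciprocal comparisons); the encoding, names and values are all assumptions.

```python
from itertools import combinations


def leaves(clade):
    """Strain names under a clade encoded as nested tuples of strings."""
    if isinstance(clade, str):
        return [clade]
    return [leaf for child in clade for leaf in leaves(child)]


def delineate_species(clade, ani, cutoff=95.0):
    """Split a rooted tree into maximal clades whose members all share >= cutoff ANI.

    ani maps frozenset({a, b}) -> symmetric pair-wise ANI (%).
    Each returned group is monophyletic by construction, because only whole
    clades are ever accepted.
    """
    members = leaves(clade)
    if all(ani[frozenset((a, b))] >= cutoff for a, b in combinations(members, 2)):
        return [members]
    return [group for child in clade for group in delineate_species(child, ani, cutoff)]


# Hypothetical five-strain tree: s3 was "named" differently but nests inside s1/s2.
tree = ((("s1", "s2"), "s3"), ("s4", "s5"))
ani = {frozenset(p): 80.0 for p in combinations(["s1", "s2", "s3", "s4", "s5"], 2)}
ani.update({frozenset(("s1", "s2")): 97.0, frozenset(("s1", "s3")): 96.0,
            frozenset(("s2", "s3")): 96.5, frozenset(("s4", "s5")): 98.0})
print(delineate_species(tree, ani))  # [['s1', 's2', 's3'], ['s4', 's5']]
```

The toy case mirrors the NCTC 10304 situation: a strain that nests inside another species' clade and exceeds the cut-off to all its members is absorbed into that species.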
Our sequence-based approach has several desirable characteristics. First, it can resolve inconsistencies in the classification of genomospecies; for example, our results confirm the recent assignment of genomospecies 3 and 13TU to the Latin binomials A. pittii and A. nosocomialis, respectively. Second, it provides a scalable and uniform approach that works for both culturable and non-culturable species, solving the problem of classifying non-culturable organisms at a time when their whole-genome sequences can be recovered relatively easily via metagenomics or single-cell genomics. Third, our approach is faster and cheaper than traditional taxonomic methods, as well as being easily replicable and transferable among research institutions. Finally, a method that combines phylogeny and pragmatism falls in line with Darwin’s vision of classification, as stated in the conclusion of The Origin of Species: “Our classifications will come to be, as far as they can be so made, genealogies…”.
Details of Acinetobacter strains used in this study are listed in Additional file 1. Acinetobacter baumannii W6976 and W7282 were provided by Drs. Mike Hornsey and David Wareham at Barts and The London NHS Trust, whilst the remaining strains were obtained from UK, German and Belgian culture collections. Sequenced isolates were cultured in Nutrient broth or Tryptic soy medium at 25°C or 30°C. DNA was extracted from single colony cultures using Qiagen 100/G Genomic-tips and quantified using Quant-iT PicoGreen dsDNA kits (Invitrogen). DNA was stored at 4°C.
DNA from the thirteen isolates was sequenced by 454 GS FLX pyrosequencing (Roche, Branford, CT, USA) according to the standard protocol for whole-genome shotgun sequencing, producing reads with an average length of 450 bp. Draft genomes were assembled from flowgram data using Newbler 2.5 (Roche). The resulting contigs were annotated using the automated annotation pipeline on the xBASE server. The genome sequences of the thirteen newly sequenced strains have been deposited in GenBank as whole genome shotgun projects (Table 1).
We computed the set of all orthologs within the 38 strains in our study with OrthoMCL, which performs a bidirectional best-hit search in amino-acid space followed by a clustering step (percentMatchCutoff = 70, evalueCutoff = 1e-05, MCL inflation I = 1.5). OrthoMCL predicted 7,334 clusters of orthologous groups (COGs) containing 124,870 coding sequences (CDSs), which represent 95.7% of all good-quality CDSs (length at least 50 codons, of which less than 2% are stop codons).
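The reciprocal-best-hit step can be illustrated as follows. This is a simplified sketch, not OrthoMCL itself: it ignores in-paralogs, score normalization and the subsequent MCL clustering, and it assumes each BLASTP hit has already been parsed into (query, subject, bit-score, E-value, percent match) tuples, with the percent match treated as the coverage value that percentMatchCutoff filters on. Gene and genome names are hypothetical.

```python
def best_hits(hits, genome_of, max_evalue=1e-5, min_percent_match=70.0):
    """Best-scoring subject for each (query gene, subject genome) after filtering."""
    best = {}
    for query, subject, bitscore, evalue, percent_match in hits:
        if genome_of[query] == genome_of[subject]:
            continue  # within-genome hits (in-paralogs) are ignored in this sketch
        if evalue > max_evalue or percent_match < min_percent_match:
            continue
        key = (query, genome_of[subject])
        if key not in best or bitscore > best[key][1]:
            best[key] = (subject, bitscore)
    return {key: subject for key, (subject, _) in best.items()}


def reciprocal_best_hits(hits, genome_of, **filters):
    """Gene pairs in different genomes that are each other's best hit."""
    best = best_hits(hits, genome_of, **filters)
    pairs = set()
    for (query, _), subject in best.items():
        if best.get((subject, genome_of[query])) == query:
            pairs.add(tuple(sorted((query, subject))))
    return pairs


genome_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
hits = [
    ("a1", "b1", 100.0, 1e-30, 90.0),
    ("a1", "b2", 80.0, 1e-20, 90.0),
    ("b1", "a1", 95.0, 1e-28, 90.0),
    ("a2", "b1", 65.0, 1e-10, 50.0),  # removed by the percent-match filter
    ("a2", "b2", 60.0, 1e-9, 85.0),
    ("b2", "a2", 70.0, 1e-12, 85.0),
]
print(sorted(reciprocal_best_hits(hits, genome_of)))  # [('a1', 'b1'), ('a2', 'b2')]
```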
Using the ortholog data, we extracted the genus core genome, i.e. the set of COGs present in each of the 38 strains (911 COGs). We filtered this set to exclude COGs containing paralogs and obtained a set of 827 single-copy COGs. The nucleotide sequences of each single-copy COG were aligned using MUSCLE 3.8.31 with default parameters, and the alignments were trimmed with GBlocks 0.91b (default parameters) to remove poorly aligned positions, including leading and trailing blocks. After excluding 8 COGs with a trimmed length < 50 bp, we screened the remaining 819 COGs for evidence of recombination using the PHI, MaxChi and Neighbour similarity score tests implemented in PhiPack (http://www.maths.otago.ac.nz/~dbryant/software/PhiPack.tar), with 1000 permutations, a window size of 50 bp and a significance threshold of p < 0.05. To obtain a more robust phylogeny, we selected only the 127 COGs for which none of the three tests found evidence of recombination. The trimmed alignments of these 127 COGs were concatenated and used to build the tree with the approximately maximum-likelihood method FastTree 2, with 100 bootstrap replicates created using the SEQBOOT program from the PHYLIP package. The resulting tree was visualized using FigTree (http://tree.bio.ed.ac.uk/software/figtree) and rooted at the mid-point.
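The filtering steps above can be sketched as a small pipeline over ortholog clusters. The data structures and function names are hypothetical; alignment (MUSCLE), trimming (GBlocks) and the recombination tests (PhiPack) are treated as having been run already, with their outputs supplied as trimmed alignments and one p-value per test.

```python
def genus_core(clusters, strains):
    """COGs with at least one member gene in every strain.

    clusters maps COG id -> list of (strain, gene id) pairs.
    """
    needed = set(strains)
    return {cog: m for cog, m in clusters.items() if needed <= {s for s, _ in m}}


def drop_paralogs(core):
    """Keep COGs with exactly one gene per strain (single-copy COGs)."""
    return {cog: m for cog, m in core.items() if len(m) == len({s for s, _ in m})}


def select_for_phylogeny(trimmed, pvalues, min_length=50, alpha=0.05):
    """COGs whose trimmed alignment is long enough and that no test flags.

    trimmed maps COG id -> {strain: aligned sequence}; pvalues maps COG id ->
    (PHI, MaxChi, NSS) p-values. A COG is kept only if every p-value >= alpha.
    """
    return [
        cog for cog in sorted(trimmed)
        if len(next(iter(trimmed[cog].values()))) >= min_length
        and all(p >= alpha for p in pvalues[cog])
    ]


def concatenate(trimmed, cogs, strains):
    """Supermatrix: one concatenated alignment row per strain, COGs in a fixed order."""
    return {s: "".join(trimmed[cog][s] for cog in cogs) for s in strains}


strains = ["A", "B", "C"]
clusters = {
    "cog1": [("A", "a1"), ("B", "b1"), ("C", "c1")],
    "cog2": [("A", "a2"), ("B", "b2")],                               # missing from C
    "cog3": [("A", "a3"), ("A", "a3p"), ("B", "b3"), ("C", "c3")],    # paralog in A
}
print(sorted(drop_paralogs(genus_core(clusters, strains))))  # ['cog1']
```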
The trees based on the 16S rRNA gene, the 819 single-copy COGs (without recombination filtering) and the 42 ribosomal genes were built in the same manner: multiple alignment of the nucleotide sequences with MUSCLE, trimming with GBlocks, construction of bootstrapped trees (100 replicates) with FastTree 2, and mid-point rooting.
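Mid-point rooting places the root halfway along the longest leaf-to-leaf path of the unrooted tree. A minimal sketch, assuming the unrooted tree is stored as a hypothetical adjacency dictionary of branch lengths; tools such as FigTree perform this step internally, and this is not their code.

```python
from itertools import combinations


def path_lengths(tree, start):
    """Distance from `start` to every node, plus parent pointers for path recovery."""
    dist, parent = {start: 0.0}, {start: None}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour, length in tree[node].items():
            if neighbour not in dist:
                dist[neighbour] = dist[node] + length
                parent[neighbour] = node
                stack.append(neighbour)
    return dist, parent


def midpoint_root(tree):
    """Return (u, v, offset): the root lies on branch u-v, `offset` away from u."""
    leaves = sorted(n for n, neighbours in tree.items() if len(neighbours) == 1)
    if len(leaves) < 2:
        raise ValueError("need at least two leaves")
    tables = {leaf: path_lengths(tree, leaf) for leaf in leaves}
    # Longest leaf-to-leaf path; max() keeps the first pair in sorted order on ties.
    diameter, a, b = max(
        ((tables[x][0][y], x, y) for x, y in combinations(leaves, 2)),
        key=lambda item: item[0],
    )
    dist, parent = tables[a]
    path = [b]
    while path[-1] != a:
        path.append(parent[path[-1]])
    path.reverse()
    half = diameter / 2.0
    for u, v in zip(path, path[1:]):
        if dist[v] >= half:
            return u, v, half - dist[u]


tree = {
    "A": {"X": 1.0},
    "B": {"X": 1.0},
    "C": {"X": 4.0},
    "X": {"A": 1.0, "B": 1.0, "C": 4.0},
}
print(midpoint_root(tree))  # ('X', 'C', 1.5)
```

Mid-point rooting assumes roughly clock-like evolution across the tree; it needs no outgroup, which suits a genus-wide analysis where no single outgroup was sequenced.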
The ANI analysis was based on whole-genome data using the method proposed by Goris et al. Briefly, for each genome pair, one genome was chosen as the query and split into consecutive 500 bp fragments. These were then used to interrogate the second genome, designated the reference, using BLASTn (X = 150, q = -1, F = F). For each query fragment, the hit with the highest bit-score was selected, and if the alignment exhibited at least 70% identity over more than 70% of the query fragment length, the hit was retained for further evaluation. The ANI score was computed as the mean identity of the retained hits. From the pair-wise ANI values, we compiled a distance matrix representing the ANI divergence (defined as 100% - ANI) between strains and used it to compute the ANI divergence dendrogram with the hierarchical clustering package hcluster 0.2.0, using the complete linkage algorithm (http://pypi.python.org/pypi/hcluster).
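The fragment-and-filter logic and the complete-linkage clustering described above can be sketched as follows. Running BLASTn and parsing its output are not shown: the sketch assumes the hits for each query fragment have already been parsed into (bit-score, percent identity, aligned length) tuples. The function names are hypothetical, and the small pure-Python clustering routine stands in for hcluster rather than reproducing its API.

```python
from itertools import combinations


def split_fragments(genome, size=500):
    """Consecutive, non-overlapping query fragments; the last one may be shorter."""
    return [genome[i:i + size] for i in range(0, len(genome), size)]


def ani_from_hits(fragment_lengths, hits, min_identity=70.0, min_coverage=0.70):
    """Mean identity of retained best hits, or None if no fragment qualifies.

    hits maps fragment index -> list of (bit-score, % identity, aligned length).
    """
    retained = []
    for index, frag_len in enumerate(fragment_lengths):
        candidates = hits.get(index, [])
        if not candidates:
            continue
        _, identity, aligned = max(candidates, key=lambda hit: hit[0])  # top bit-score
        if identity >= min_identity and aligned > min_coverage * frag_len:
            retained.append(identity)
    return sum(retained) / len(retained) if retained else None


def complete_linkage(labels, divergence):
    """Agglomerative clustering; cluster distance is the maximum pair-wise divergence.

    divergence maps frozenset({a, b}) -> 100 - ANI. Returns merges in order as
    (members of first cluster, members of second cluster, merge height).
    """
    clusters = [frozenset([label]) for label in labels]

    def linkage(c1, c2):
        return max(divergence[frozenset((a, b))] for a in c1 for b in c2)

    merges = []
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merges.append((sorted(clusters[i]), sorted(clusters[j]),
                       linkage(clusters[i], clusters[j])))
        merged = clusters[i] | clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


lengths = [len(f) for f in split_fragments("A" * 1200)]  # [500, 500, 200]
hits = {
    0: [(900.0, 98.0, 480), (300.0, 80.0, 490)],  # best hit kept
    1: [(400.0, 65.0, 500)],                      # below 70% identity
    2: [(300.0, 96.0, 150)],                      # 150 > 0.7 * 200, kept
}
print(ani_from_hits(lengths, hits))  # 97.0
```

Because the query genome is fragmented and the reference is not, ANI computed in one direction can differ slightly from the reverse comparison; the text above computes one value per genome pair.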
K-string analysis was based on the method proposed by Qi et al.: for each proteome, a composition vector was computed from the frequencies of overlapping amino acid strings of length K, with the random mutation background filtered out using a Markov model. The divergence between two genomes was computed from the cosine of the angle between the pair’s composition vectors. The dendrogram based on the pair-wise K-string distances was built as for ANI. The pair-wise genomic fluidity for each pair of genomes was computed from the ortholog data as suggested by Kislyuk et al., and the corresponding dendrogram was built as for ANI and K-string.
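A sketch of the K-string composition-vector distance, following the CVTree formulation as we understand it: the expected frequency of each K-string is predicted from its (K-1)- and (K-2)-substrings by a Markov model, each vector component is (observed - expected) / expected, and the distance is D = (1 - C) / 2, where C is the cosine between the two vectors. The (1 - C) / 2 transformation, the frequency normalization over the pooled proteome and the function names are assumptions.

```python
from collections import Counter
from math import sqrt


def kstring_frequencies(proteins, k):
    """Normalized frequencies of overlapping amino-acid strings of length k."""
    counts = Counter(p[i:i + k] for p in proteins for i in range(len(p) - k + 1))
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items()}


def composition_vector(proteins, k):
    """(observed - expected) / expected, with a Markov-model expected frequency."""
    if k < 3:
        raise ValueError("k must be at least 3")
    fk = kstring_frequencies(proteins, k)
    fk1 = kstring_frequencies(proteins, k - 1)
    fk2 = kstring_frequencies(proteins, k - 2)
    vector = {}
    for s, f in fk.items():
        denom = fk2.get(s[1:-1], 0.0)
        expected = fk1[s[:-1]] * fk1[s[1:]] / denom if denom else 0.0
        if expected:
            vector[s] = (f - expected) / expected
    return vector


def cosine(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(s, 0.0) for s, x in u.items())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def kstring_distance(proteome_a, proteome_b, k=5):
    """D = (1 - C) / 2, so that identical compositions give 0 and opposite ones give 1."""
    c = cosine(composition_vector(proteome_a, k), composition_vector(proteome_b, k))
    return (1.0 - c) / 2.0


print(kstring_distance(["AAB"], ["ABB"], k=3))  # 0.5 (no K-strings in common)
```

Subtracting the Markov-model expectation removes the part of each K-string frequency that is already explained by its shorter substrings, so the vector emphasizes longer oligopeptide signals; K = 5 is the default here because that is the value reported for the dendrogram.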
We thank Dr. Mike Hornsey and Dr. David Wareham for the kind gift of isolates A. baumannii W6976 and W7282. JZ-MC, MRH, CC and MJP were supported by Medical Research Council grant G0901717; CC was also supported by the NIHR Surgical Reconstruction and Microbiology Research Centre; MRH and NJL were supported by Biotechnology and Biological Sciences Research Council grant BBE0111791.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.