- Research article
- Open Access
Comprehensive annotation of secondary metabolite biosynthetic genes and gene clusters of Aspergillus nidulans, A. fumigatus, A. niger and A. oryzae
BMC Microbiology volume 13, Article number: 91 (2013)
Secondary metabolite production, a hallmark of filamentous fungi, is an expanding area of research for the Aspergilli. These compounds are potent chemicals, ranging from deadly toxins to therapeutic antibiotics to potential anti-cancer drugs. The genome sequences for multiple Aspergilli have been determined, and provide a wealth of predictive information about secondary metabolite production. Sequence analysis and gene overexpression strategies have enabled the discovery of novel secondary metabolites and the genes involved in their biosynthesis. The Aspergillus Genome Database (AspGD) provides a central repository for gene annotation and protein information for Aspergillus species. These annotations include Gene Ontology (GO) terms, phenotype data, gene names and descriptions and they are crucial for interpreting both small- and large-scale data and for aiding in the design of new experiments that further Aspergillus research.
We have manually curated Biological Process GO annotations for all genes in AspGD with recorded functions in secondary metabolite production, adding new GO terms that specifically describe each secondary metabolite. We then leveraged these new annotations to predict roles in secondary metabolism for genes lacking experimental characterization. As a starting point for manually annotating Aspergillus secondary metabolite gene clusters, we used antiSMASH (antibiotics and Secondary Metabolite Analysis SHell) and SMURF (Secondary Metabolite Unknown Regions Finder) algorithms to identify potential clusters in A. nidulans, A. fumigatus, A. niger and A. oryzae, which we subsequently refined through manual curation.
This set of 266 manually curated secondary metabolite gene clusters will facilitate the investigation of novel Aspergillus secondary metabolites.
Secondary metabolites produced by fungi are a rich source of medically useful compounds because of their pharmaceutical and toxicological properties. While secondary metabolites are not required for an organism’s growth or primary metabolism, they may provide important benefits in its environmental niche. For example, A. nidulans laeA mutants defective in the production of secondary metabolites are ingested more readily by the fungivorous arthropod, Folsomia candida, suggesting that secondary metabolite production can protect fungi from predation.
The Aspergilli are producers of a wide variety of secondary metabolites of considerable medical, industrial, agricultural and economic importance. For example, the antibiotic penicillin is produced by A. nidulans and the genes involved in the penicillin biosynthetic pathway have been extensively studied [3–5]. Sterigmatocystin (ST), an aflatoxin (AF) precursor, and many of the genes that are involved in its biosynthesis have also been extensively studied in A. nidulans [6–10]. AF is a secondary metabolite produced mainly by Aspergillus species growing in foodstuffs, and it is of both medical and economic importance as contaminated food sources are toxic to humans and animals when ingested. Gliotoxin is an extremely toxic secondary metabolite produced by several Aspergillus species during infection [12, 13]. The ability of this toxin to modulate the host immune system and induce apoptosis in a variety of cell types has been most studied in the ubiquitous fungal pathogen, A. fumigatus [14, 15].
The availability of Aspergillus genomic sequences has greatly facilitated the identification of numerous genes involved in the production of other secondary metabolites. Based on the number of predicted secondary metabolite biosynthesis genes and the fact that the expression of many secondary metabolite gene clusters is cryptic, meaning that expression is not evident under standard experimental conditions, there appears to be the potential for production of many more secondary metabolites than currently known. Secondary metabolite biosynthetic genes often occur in clusters that tend to be sub-telomerically located and are coordinately regulated under certain laboratory conditions [18–20]. Typically, a secondary metabolite biosynthetic gene cluster contains a gene encoding one of several key “backbone” enzymes of the secondary metabolite biosynthetic process: a polyketide synthase (PKS), a non-ribosomal peptide synthetase (NRPS), a polyketide synthase/non-ribosomal peptide synthetase hybrid (PKS-NRPS), a prenyltransferase known as dimethylallyl tryptophan synthase (DMATS) and/or a diterpene synthase (DTS).
Comparative sequence analysis based on known backbone enzymes has been used to identify potential secondary metabolite biosynthetic gene clusters for subsequent experimental verification. One approach for experimental verification is the deletion of genes with suspected roles in secondary metabolite biosynthesis followed by identification of the specific secondary metabolite profiles of the mutants by thin layer chromatography, NMR or other methods [7, 8]. For example, the deletion of A. fumigatus encA, which encodes an ortholog of the A. nidulans non-reducing PKS (NR-PKS) mdpG, followed by analysis of culture extracts using high-performance liquid chromatography (HPLC) enabled the recent identification of endocrocin and its biosynthetic pathway intermediates. Similarly, the deletion of the gene encoding the PKS, easB, enabled the identification of the emericellamide biosynthetic pathway of A. nidulans. Another approach is the overexpression of predicted transcriptional regulators of secondary metabolism gene clusters with subsequent analysis of the gene expression and secondary metabolite profiles of the resulting strains, which has facilitated the identification of numerous secondary metabolites and the genes responsible for their synthesis [23, 24]. For example, overexpression in A. nidulans of laeA, a global transcriptional regulator of secondary metabolite production, coupled with microarray analysis, facilitated the delineation of the cluster responsible for production of the anti-tumor compound, terrequinone A. Thus, genome sequence analysis, coupled with targeted experimentation, has been a highly effective strategy for identifying novel secondary metabolites and the genes involved in their synthesis.
The Aspergillus Genome Database (AspGD; http://www.aspgd.org) is a web-based resource that provides centralized access to gene and protein sequences, analysis tools and manually curated information derived from the published scientific literature for A. nidulans, A. fumigatus, A. niger and A. oryzae [25, 26]. AspGD curators read the published experimental literature to record information including gene names and synonyms, write free-text descriptions of each gene, record phenotypes and assign terms that describe functional information about genes and proteins using the Gene Ontology (GO; http://www.geneontology.org). These annotations are an important resource for the scientific research community, used both for reference on individual genes of interest and for analysis of results from microarray experiments, proteomic experiments or other screens that produce large lists of genes.
The GO is a structured vocabulary for describing the functions associated with gene products. GO terms describe the activity of a gene product (Molecular Function; MF) within the cell, the biological process (Biological Process; BP) in which a gene product is involved and the location within the cell (Cellular Component; CC) where the gene product is observed. Evidence codes are assigned to GO annotations based on the type of available experimental evidence.
At the start of this project most of the terms needed to describe secondary metabolite biosynthetic genes or regulators of secondary metabolism did not yet exist in the GO. Thus, in order to provide an improved annotation of secondary metabolite biosynthetic genes and their regulatory proteins, we developed new GO terms for secondary metabolite production in collaboration with the GO Consortium, and reannotated the entire set of genes associated with secondary metabolism in AspGD. We then performed a comprehensive analysis of the secondary metabolism biosynthetic genes and their orthologs across the genomes of A. nidulans, A. fumigatus, A. niger and A. oryzae and now provide a set of manually annotated secondary metabolite gene clusters. We anticipate that these new, more precise annotations will encourage the rapid and efficient experimental verification of novel secondary metabolite biosynthetic gene clusters in Aspergillus and the identification of the corresponding secondary metabolites.
Identifying genes for reannotation
Many branches of the GO, such as apoptosis and cardiac development, have recently been expanded and revised to include new terms that are highly specific to these processes. The secondary metabolism literature has expanded over the last several years, allowing AspGD curators to make annotations to an increasing number of genes with roles in secondary metabolism. During routine curation, it became apparent that hundreds of Aspergillus genes that were candidates for annotation to the GO term ‘secondary metabolic process’ had the potential for more granular annotations, since, in many cases, the specific secondary metabolite produced by a gene product is known. At the inception of this project, only terms for ‘aflatoxin biosynthetic process,’ ‘penicillin biosynthetic process’ and ‘sterigmatocystin biosynthetic process,’ the 3 most well-studied secondary metabolites to date, were present in the GO (Additional file 1).
Candidate genes for reannotation were identified as those that had pre-existing GO annotations to ‘secondary metabolic process’ or curated mutant phenotypes that impact secondary metabolite production. For example, numerous genes in AspGD are annotated with mutant phenotypes affecting the production of secondary metabolites such as asperthecin, austinol and dehydroaustinol, emericellin, fumiquinazolines, orsellinic acid, pseurotin A, shamixanthones [32, 36] and violaceol, among others. These genes were then analyzed and a list of new GO terms was generated to annotate these genes more specifically (Table 1 and Additional file 1).
We also used published SMURF (Secondary Metabolite Unknown Regions Finder) predictions to annotate additional candidate gene cluster backbone enzymes (i.e., PKS, NRPS, DMATS). SMURF is highly accurate at predicting most of these cluster backbone enzymes; across the four species of Aspergillus analyzed it identified a total of 105 genes as encoding PKS or PKS-like enzymes, 65 genes encoding NRPS or NRPS-like enzymes, 8 genes encoding putative hybrid PKS-NRPS enzymes and 15 DMATS. Note that DTS genes are not predicted by the SMURF algorithm. The AspGD Locus Summary pages now indicate these annotations based on the cluster backbone predictions generated by SMURF and by direct experimental characterization from the secondary metabolism literature.
Expansion of the secondary metabolism branch of the GO
To improve the accuracy of the AspGD GO annotation in the area of secondary metabolite production, a branch of the GO in which terms were sparse, we worked in collaboration with the GO Consortium to add new, more specific terms to the BP aspect of the ontology, and then used many of these new GO terms to annotate the Aspergillus genes that had experimentally determined mutant phenotype data associated with one or more secondary metabolites. We focused on the BP annotations because the relevant processes are well-represented in the experimental literature, whereas experimental data to support CC annotations are relatively sparse in the secondary metabolism literature. Adequate MF terms exist for the PKS and NRPS enzymes, but annotations to them in AspGD are mostly based on computationally determined domain matches and Interpro2GO annotations, or on annotations with Reviewed Computational Analysis (RCA) as the evidence code, meaning that these functions are predicted, rather than directly characterized through experiments.
The new GO annotations that we have added now precisely specify the secondary metabolite produced. For example, mdpG is known to influence the production of arugosin, emodin, monodictyphenone, orsellinic acid, shamixanthones and sterigmatocystin in A. nidulans. The gene was formerly annotated to the fairly nonspecific parental term ‘secondary metabolic process’ (GO:0019748), but because the secondary metabolites produced by this protein are known and published, it is now annotated to the new and more informative child terms ‘arugosin biosynthetic process’ (GO:1900587), ‘emodin biosynthetic process’ (GO:1900575), ‘monodictyphenone biosynthetic process’ (GO:1900815), ‘o-orsellinic acid biosynthetic process’ (GO:1900584), ‘shamixanthone biosynthetic process’ (GO:1900793) and ‘sterigmatocystin biosynthetic process’ (GO:0045461).
In total, we added 290 new BP terms to the GO for 48 secondary metabolites produced by one or more Aspergillus species. There are over 400 Aspergillus genes in AspGD that have been manually or computationally annotated to more specific secondary metabolism BP terms, based on over 260 publications (Table 2). A complete list of the GO terms for secondary metabolic process annotations is available in Additional file 1. The addition of new terms is ongoing as new secondary metabolites and their biosynthetic genes are identified and described in the scientific literature. The process of adding new GO terms depends on the elucidation of the structure of the secondary metabolite as the structure is required for new ChEBI (Chemical Entities of Biological Interest; http://www.ebi.ac.uk/chebi/) terms to be assigned, and these chemical compound terms are a prerequisite for GO term assignments involving chemical compounds. These new and improved GO terms provide researchers with valuable clues to aid in the identification of proteins involved in the production of specific classes of Aspergillus secondary metabolites.
Predictive annotation using orthology relationships in conjunction with experimentally-based GO term assignments
Manual curation of the genes of one species can be used to computationally annotate the uncharacterized genes in another species based on orthology relationships. The use of GO to describe gene products facilitates comparative analysis of functions of orthologous genes throughout the tree of life, including orthologous genes within the filamentous fungi. To augment the manual GO curation in AspGD, we leveraged orthology relationships to assign GO annotations to genes that lacked manual annotations of their own but which had an experimentally characterized ortholog in AspGD, the Saccharomyces Genome Database (SGD) (http://www.yeastgenome.org) or PomBase (http://www.pombase.org). A total of 492 GO annotations were made to secondary metabolism-related genes in A. nidulans, A. fumigatus, A. niger and A. oryzae based on their orthology relationships (Table 3). Files listing these orthology relationships are available for download at http://www.aspergillusgenome.org/download/homology/orthologs/ and the files describing all GO term annotations for each gene product in AspGD are available at http://www.aspergillusgenome.org/download/go/. A list of all genes annotated to the secondary metabolic process branch of the GO and their associated annotations can be obtained through the AspGD Advanced Search Tool (http://www.aspergillusgenome.org/cgi-bin/search/featureSearch).
Manual annotation of computationally predicted gene clusters
Algorithms such as SMURF and antiSMASH (antibiotics and Secondary Metabolite Analysis SHell) can be used to predict fungal secondary metabolite gene clusters. Both of these algorithms are based on the identification of backbone enzymes, usually one or more polyketide synthase (PKS), non-ribosomal peptide synthetase (NRPS), hybrid PKS-NRPS, NRPS-like enzyme or dimethylallyl tryptophan synthase (DMATS), and the use of a training set of experimentally characterized clusters. Adjacent genes are then scanned for the presence of common secondary metabolite gene domains and boundaries are predicted for each cluster. We used the pre-computed gene clusters for A. nidulans, A. fumigatus, A. niger and A. oryzae that were identified at the J. Craig Venter Institute (JCVI) with the SMURF algorithm. We also used the antiSMASH algorithm on these genomes to make gene cluster predictions and added 5 additional clusters for A. nidulans based on the presence of DTS/ent-kaurene synthase backbone enzymes.
Altogether, a total of 261 non-redundant clusters were predicted by SMURF and antiSMASH: 71 for A. nidulans, 39 for A. fumigatus, 81 for A. niger and 75 for A. oryzae (Tables 4, 5, 6, 7). Neither SMURF nor antiSMASH predicts DTS-based clusters, so these clusters were manually identified based on their annotations. Because clusters with other types of non-PKS and non-NRPS backbone enzymes were included in the antiSMASH predictions and SMURF only analyzes PKS-, NRPS- or DMATS-based clusters, antiSMASH identified more clusters than SMURF in every species except for A. niger (Table 8). For clusters identified by both algorithms, there were no cases where both the left and right boundary predictions were the same, although a small number of single boundary predictions did coincide with each other (Tables 4, 5, 6, 7). Both the experimentally determined and the manually predicted (see below) clusters tend to be smaller than those predicted by the SMURF and antiSMASH algorithms, as the algorithms are designed to err on the side of inclusivity, while the manual boundaries are designed to provide increased precision through the examination of inter- and intra-cluster genome synteny alignments across multiple Aspergillus species. SMURF was previously reported to overpredict boundaries by about 4 genes and we found that antiSMASH performed similarly. Figure 1 shows an example of the disparity between these two prediction programs in cluster boundary determination and how the intra- and inter-species cluster synteny data used in our analysis aid in the manual prediction of secondary metabolite gene cluster boundaries (see below).
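The comparison of the two prediction sets can be illustrated with a minimal Python sketch. It is not the procedure used by SMURF, antiSMASH or AspGD; the Cluster record, the gene-index coordinates and both helper functions are hypothetical names introduced here only to show how overlapping calls might be grouped into non-redundant clusters and how left and right boundary agreement might be checked.

```python
from typing import NamedTuple

class Cluster(NamedTuple):
    source: str   # which predictor produced the call (illustrative label)
    chrom: str    # chromosome or contig name
    left: int     # position of the left boundary gene in chromosome gene order
    right: int    # position of the right boundary gene (inclusive)

def boundary_agreement(a, b):
    """Report whether two calls for the same cluster share each boundary."""
    return {"left": a.left == b.left, "right": a.right == b.right}

def group_overlapping(clusters):
    """Group calls that overlap on the same chromosome into one non-redundant cluster."""
    groups = []
    reach = None  # (chrom, rightmost gene position) covered by the current group
    for c in sorted(clusters, key=lambda c: (c.chrom, c.left)):
        if reach and c.chrom == reach[0] and c.left <= reach[1]:
            groups[-1].append(c)
            reach = (c.chrom, max(reach[1], c.right))
        else:
            groups.append([c])
            reach = (c.chrom, c.right)
    return groups

# Made-up coordinates, for illustration only
smurf_call = Cluster("SMURF", "chrI", 10, 25)
antismash_call = Cluster("antiSMASH", "chrI", 12, 25)
other_call = Cluster("antiSMASH", "chrII", 3, 9)
groups = group_overlapping([smurf_call, antismash_call, other_call])
agreement = boundary_agreement(smurf_call, antismash_call)
```

In this toy input the two chrI calls collapse into one group and agree only on the right boundary, the kind of single-boundary coincidence described above.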
Andersen et al. recently reported another strategy for identifying the extent of secondary metabolite gene cluster boundaries. Their method uses genome-wide microarray expression studies from A. nidulans to identify coregulated genes surrounding secondary metabolite gene cluster backbone enzymes. Since secondary metabolite gene clusters often show cryptic expression under many laboratory growth conditions, this study generated expression data from cultures grown on a wide variety of media (to maximize the possibility of expression), and combined these data with previously generated expression data to analyze a superset of 44 expression conditions. Their analysis produced a list of 53 predicted secondary metabolite gene clusters of A. nidulans, some of which show clear patterns of coregulated expression while some of the expressed backbone enzymes showed no correlation with adjacent genes. Five of these were DTS-based gene clusters not identified by the SMURF or antiSMASH algorithms. These data have been curated at AspGD and were used as a criterion for our manual cluster boundary predictions (see below). An example of the inpA- and inpB-containing gene cluster determined by this criterion is shown in Figure 2. The gene clusters of A. nidulans with all of the boundary predictions made with ‘expression pattern’ as the primary evidence are listed in Table 4. The total number of boundaries predicted using this criterion is summarized in Table 9.
To generate a high-quality set of candidate secondary metabolite biosynthetic gene clusters, we used SMURF and antiSMASH as the source of cluster predictions, along with manually predicted DTS clusters, and then manually refined the gene cluster boundaries. Manual cluster boundary annotations (Tables 4, 5, 6, 7 and Additional files 2, 3, 4, 5) were made based on several criteria: published experimental data (including gene expression studies), synteny between clustered genes among different species indicated by the presence of conserved gene cluster boundaries (Figure 1), functional annotation of predicted genes within and adjacent to clusters and increases in intergenic distance between boundary genes and adjacent genes, which we frequently observed (Figure 3). We determined that gene clusters tend to be conserved between species and that breaks in cluster synteny frequently indicate a cluster boundary. To the best of our knowledge, no gene cluster prediction algorithm or research group has used genomic comparisons between species for large-scale cluster predictions. We used the Sybil viewer, which displays alignments of orthologous genes across multiple species in their genomic context, to manually examine potential boundaries and to compare synteny between clusters of different species and/or strains (Figure 1) and the adjacent syntenic regions outside each predicted cluster. The genome sequence is available for two strains each of A. fumigatus (Af293 and A1163) and A. niger (CBS513.88 and ATCC 1015), which allowed us to consider cluster synteny, which approached 100%, between these strains in addition to the orthology between Aspergillus species.
AspGD displays and provides sequence resources for 15 Aspergillus genomes and related species. A given genome is typically particularly closely related to that of one or two of the other species; the A. fumigatus genome best matches that of Neosartorya fischeri (see Sybil syntenic genomic context in Additional file 3), A. niger best matches A. acidus and A. brasiliensis (Additional file 4) and A. oryzae best matches A. flavus (Additional file 5). Unlike A. fumigatus, A. niger and A. oryzae, A. nidulans lacks such a closely related species in AspGD with sufficient synteny to enable routine use of cluster orthology in boundary determination. Therefore, we used other criteria such as published gene expression patterns, increases in intergenic distance and changes from secondary metabolism-related gene annotations to non-secondary metabolism-related gene annotations (described below) for making these predictions in A. nidulans (Figure 1). The numbers of manually predicted gene clusters in each of these additional species, determined by observing breaks in gene cluster synteny (see Methods), are summarized in Table 9.
In some cases, the functional annotation of the putative gene cluster members was informative in predicting cluster boundaries, especially for A. nidulans, which often lacked cluster synteny with other species present in AspGD. In addition to genes encoding the core backbone enzymes, clusters typically include one or more acyl transferase, oxidoreductase, hydrolase, cytochrome P450, transmembrane transporter and a transcription factor. We manually inspected each cluster and the genomic region surrounding it; changes in functional annotations from typical secondary metabolism annotations to annotations atypical of secondary metabolic processes were frequently observed upon traversing a cluster boundary (Additional files 2, 3, 4, 5) and this was used as an additional criterion for boundary prediction, especially in cases where inter- or intra-species clustering or published gene expression data were not available. In some instances, genes with functional annotations unrelated to secondary metabolism are embedded within a cluster. For example, A. nidulans bglD (AN7915) encodes a glucosidase present in the F9775 biosynthetic gene cluster (Additional file 2). In a cclAΔ strain background in which histone 3 lysine 4 methylation is impaired, the expression of cryptic secondary metabolite clusters, such as F9775, is activated. The activation of bglD expression was observed along with other genes in the F9775 cluster and based on this pattern of coregulation, bglD is included as a member of this cluster. It is unclear, however, whether bglD actually plays a role in F9775 biosynthesis. The gene encoding translation elongation factor 1 gamma, stcT, is a member of the ST gene cluster (stc) of A. nidulans. Its inclusion in the stc cluster was based on its pattern of coregulation with 24 other genes, some of which have experimentally determined roles in A. nidulans ST biosynthesis, or are orthologous to A. parasiticus proteins involved in AF production, for which ST is a precursor. We also observed a gene, AN2546, that is expressed, and is predicted to encode a glycosylphosphatidylinositol (GPI)-anchored protein, located in the emericellamide cluster (Additional file 2); however, an AN2546 deletion strain still produces emericellamide, thus its inclusion in the cluster is based on its genomic location and expression pattern rather than function. These examples indicate that some genes are located within clusters and yet may not contribute to secondary metabolite production. The frequency and significance of unrelated genes that have become incorporated into a secondary metabolism gene cluster remain unclear; experimental verification is needed to further assess these. In cases where the cluster synteny data were compelling, cluster synteny was given higher precedence than functional annotation in the delineation of the cluster boundaries.
Increases in the distance between predicted boundary genes and the gene directly adjacent to a boundary (which we refer to as intergenic distance) were frequently observed. An example with a large intergenic distance at the right boundary is shown in the A. fumigatus gliotoxin (gli) cluster (Figure 3). However, we found that more subtle increases in intergenic distance were only somewhat reliable when compared to boundaries with experimental evidence. We therefore only based a cluster boundary prediction on an increase in intergenic distance in a small number of cases where no other data were available (Table 9).
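As an illustration of the intergenic distance criterion, the following Python sketch flags unusually large gaps between neighboring genes. The threefold-over-median threshold, the function names and the coordinates are assumptions made for this example; the text above does not state a numeric cutoff, and it notes that subtle increases were only somewhat reliable, so such a flag would be at most a prompt for manual inspection.

```python
from statistics import median

def intergenic_gaps(genes):
    """genes: list of (name, start, end) sorted by start on one chromosome.
    Returns the gap in bp between each gene and its right-hand neighbor."""
    return [max(0, genes[i + 1][1] - genes[i][2]) for i in range(len(genes) - 1)]

def flag_large_gaps(genes, fold=3.0):
    """Return (left_gene, right_gene, gap) where the gap is at least `fold`
    times the median gap in the region. `fold` is a hypothetical threshold."""
    gaps = intergenic_gaps(genes)
    typical = median(gaps)
    return [
        (genes[i][0], genes[i + 1][0], gap)
        for i, gap in enumerate(gaps)
        if typical > 0 and gap >= fold * typical
    ]

# Made-up coordinates: g3 is followed by a much larger gap than its neighbors
region = [("g1", 0, 1000), ("g2", 1500, 2500), ("g3", 3000, 4000), ("g4", 9000, 10000)]
candidates = flag_large_gaps(region)
```

Here the only flagged position lies between g3 and g4, which would make g3 a candidate right boundary gene in this toy region.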
AspGD provides high-quality manual and computational gene structure and function annotations for A. nidulans, A. fumigatus, A. niger and A. oryzae, along with sequence analysis and visualization resources for these and additional Aspergilli and related species. Among fungal databases, AspGD is the only resource performing comprehensive manual literature curation for Aspergillus species. AspGD contains curated data covering the entire corpus of experimental literature for A. nidulans, A. fumigatus, A. niger and A. oryzae, with phenotype and GO annotations for every gene described in the literature for these species, including those related to secondary metabolism. The direct, manual curation of genes from the literature forms the basis for the computational annotations at AspGD. This information, collected in a centralized, freely accessible database, provides an indispensable source of scientific information for researchers.
During the course of curation, we identified gaps in the set of GO terms that were available in the Biological Process branch of the ontology. To improve the GO annotations for secondary metabolite biosynthetic genes, we added new, more specific BP terms to the GO and used these new terms for direct annotation of Aspergillus genes. These terms include the specific secondary metabolite in each GO term name. Because ‘secondary metabolic process’ (GO:0019748) and ‘regulation of secondary metabolite biosynthetic process’ (GO:0043455) map to different branches in the GO hierarchy, complete annotation of transcriptional regulators of secondary metabolite biosynthetic gene clusters, such as laeA, requires an additional annotation to the regulatory term that we also added for each secondary metabolite.
GO annotations facilitate predictions of gene function across multiple species and, as part of this project, we used orthology relationships between experimentally characterized A. nidulans, A. fumigatus, A. niger and A. oryzae genes to provide orthology-based GO predictions for the unannotated secondary metabolism-related genes in AspGD. The prediction and complete cataloging of these candidate secondary metabolism-related genes will facilitate future experimental studies and, ultimately, the identification of all secondary metabolites and the corresponding secondary metabolism genes in Aspergillus and other species.
The SMURF and antiSMASH algorithms are efficient at predicting gene clusters on the basis of the presence of certain canonical backbone enzymes; however, disparities between boundaries predicted by these methods became obvious when the clusters predicted by each method were aligned. While there was an extensive overlap between the two sets of identified clusters, in most cases the cluster boundaries predicted by SMURF and antiSMASH were different, requiring manual refinement.
The data analysis of Andersen et al. used a clustering matrix to identify superclusters, defined as clusters with similar expression, independent of chromosomal location, that are predicted to participate in cross-chemistry between clusters to synthesize a single secondary metabolite. They identified seven superclusters of A. nidulans. Two known meroterpenoid clusters that exhibit cross-chemistry, and are located on separate chromosomes, are the austinol (aus) clusters involved in the synthesis of austinol and dehydroaustinol [31, 37]. The biosynthesis of prenyl xanthones in A. nidulans is dependent on three separate gene clusters. This was apparent because the mdpG gene cluster was shown to be required for the synthesis of the anthraquinone emodin, monodictyphenone and related compounds. Emodin and monodictyphenone are precursors of prenyl xanthones and the mdpG cluster lacked a prenyltransferase, required for prenyl xanthone synthesis. A search of the A. nidulans genome for prenyltransferases that may participate in prenyl xanthone synthesis predicts seven prenyltransferases. Two strains (ΔxptA and ΔxptB) with mutated prenyltransferase genes at chromosomal locations distant from the mdpG cluster have been described as being defective in prenyl xanthone synthesis. Therefore, while a total of 266 unique clusters were identified in our analysis, published data indicate that some of these clusters may function as superclusters that display cross-chemistry synthesis of a single secondary metabolite or group of related secondary metabolites [16, 31, 36].
Our manual annotation of secondary metabolite gene clusters in four Aspergillus species complements the computational prediction methods for identifying fungal secondary metabolites and the genes responsible for their biosynthesis. Implicit in our interspecies cluster synteny analysis is the prediction of secondary metabolite gene clusters orthologous to those in our curated species. For example, A. nidulans gene clusters most closely matched those in A. versicolor, thus identifying several new predicted A. versicolor gene clusters by orthology and interspecies cluster synteny with the predicted A. nidulans clusters (Additional file 2).
These new curated data, based on both computational analysis and manual evaluation of the Aspergillus genomes, provide researchers with a comprehensive set of annotated secondary metabolite gene clusters and a comprehensive functional annotation of the secondary metabolite gene products within AspGD. We anticipate that these new data will promote research in this important and complex area of Aspergillus biology.
Generation of new GO terms
The Gene Ontology Consortium requires that any compounds within BP term names in the GO be cataloged in the Chemical Entities of Biological Interest (ChEBI) database (http://www.ebi.ac.uk/chebi/). To enable the creation of the new GO terms, we first requested and were assigned ChEBI identifiers for all secondary metabolites recorded in AspGD. Once ChEBI term identifiers were assigned, the relevant GO terms were requested from the GO Consortium through TermGenie (http://go.termgenie.org/) for biosynthetic process, metabolic process and catabolic process terms for each new secondary metabolic process term and regulation of secondary metabolic process term (Additional file 1).
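The pattern of term names requested for each compound can be sketched in a few lines of Python. The six-name layout below (three process terms plus a 'regulation of' term for each) is an assumption inferred from the description above and from names such as 'emodin biosynthetic process'; the actual set generated through TermGenie may differ, and go_term_names is a hypothetical helper, not part of any GO tooling.

```python
def go_term_names(compound):
    """Build candidate BP term names for one ChEBI-registered compound.
    The layout is an illustrative assumption, not the TermGenie template."""
    processes = [
        f"{compound} {kind} process"
        for kind in ("metabolic", "biosynthetic", "catabolic")
    ]
    regulation = [f"regulation of {name}" for name in processes]
    return processes + regulation

names = go_term_names("emodin")
```

For the input "emodin" the list includes 'emodin biosynthetic process', the term name used for mdpG in the Results.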
Orthologous protein predictions
Jaccard-clustering, which groups together highly similar proteins within a genome of interest, was used to make ortholog predictions between the Aspergillus species and is described in detail at http://sybil.sourceforge.net/documentation.html#jaccard. Briefly, the first step of this algorithm identifies highly similar proteins within each genome of interest. The resulting groups (“clusters”) from multiple genomes are themselves grouped in the second step to form orthologous groups (“Jaccard Orthologous Clusters”). The corresponding genes can subsequently be analyzed in their genomic context to visually identify conserved synteny blocks that are displayed in the Sybil genome viewer (aspgd.broadinstitute.org). The ortholog predictions for all AspGD species are available for download at http://www.aspergillusgenome.org/download/homology/orthologs/. Orthologous protein predictions between Saccharomyces cerevisiae, Schizosaccharomyces pombe and the Aspergillus protein sets were made by pair-wise comparisons using the InParanoid software. InParanoid was chosen based on its compatibility with the existing ortholog analysis pipeline at AspGD and its comparable accuracy relative to alternative methods. Stringent cutoffs were used: BLOSUM80 and an InParanoid score of 100% (parameters: -F "m S" -M BLOSUM80). The data from this comparison are available for download at http://www.aspergillusgenome.org/download/homology/.
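The first step of the grouping described above can be illustrated with a small sketch. This is not the Sybil implementation: the input (a map from each protein to the set of proteins it hits in a similarity search), the 0.6 cutoff and the function names are all assumptions chosen for illustration. Proteins whose neighbor sets overlap with a Jaccard coefficient at or above the cutoff are linked, and the linked groups are reported as clusters.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard coefficient |a & b| / |a | b| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def jaccard_clusters(neighbors, cutoff=0.6):
    """Group proteins whose neighbor sets overlap at or above `cutoff`.

    `neighbors` maps each protein ID to the set of protein IDs it hits
    (hypothetical input). Each protein is counted as its own neighbor.
    Returns the clusters as a sorted list of sorted ID lists.
    """
    # Union-find structure over protein IDs
    parent = {p: p for p in neighbors}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path compression
            p = parent[p]
        return p

    for p, q in combinations(sorted(neighbors), 2):
        set_p = neighbors[p] | {p}
        set_q = neighbors[q] | {q}
        if jaccard(set_p, set_q) >= cutoff:
            parent[find(p)] = find(q)  # link the two groups

    groups = {}
    for p in neighbors:
        groups.setdefault(find(p), []).append(p)
    return sorted(sorted(g) for g in groups.values())
```

In the second step, the same linking idea would be applied across genomes to the per-genome clusters rather than to individual proteins.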
Orthology- and domain-based GO transfer
To augment the annotations for all genes, including secondary metabolism related genes, we used manual and domain-based GO annotations to annotate the predicted orthologs that lacked direct experimental characterization. Ortholog predictions for A. nidulans, A. fumigatus, A. niger and A. oryzae were made based on the characterized proteins of S. cerevisiae, S. pombe and the other Aspergillus species in AspGD. Candidate GO annotations to be used as the basis for these inferences are limited to those with experimental evidence, that is, with evidence codes of IDA (Inferred from Direct Assay), IPI (Inferred from Physical Interaction), IGI (Inferred from Genetic Interaction) or IMP (Inferred from Mutant Phenotype). Annotations that are themselves predicted in S. cerevisiae, S. pombe or in Aspergillus, either based on sequence similarity or by some other methods, are excluded from this group to avoid transitive propagation of predictions. Also excluded from the predicted annotation set are annotations that are redundant with existing, manually curated annotations or those that assign a related but less specific GO term. The orthology-based GO assignments are given the evidence code IEA (Inferred from Electronic Annotation) and displayed with the source species and name of the gene from which they were derived, along with a hyperlink to the appropriate gene page at AspGD, SGD or PomBase. The new annotations that have been manually assigned or electronically transferred from S. cerevisiae and S. pombe to A. nidulans, A. fumigatus, A. niger and A. oryzae, and between the Aspergillus species are summarized in Table 3.
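The filtering rules in this paragraph can be summarized in a short sketch. It is an illustration only, not the AspGD pipeline: the function name, the tuple layout and the `ancestors` lookup are hypothetical, and the GO identifiers used with it would come from the real ontology.

```python
# Evidence codes accepted as experimental, as listed in the text
EXPERIMENTAL_CODES = {"IDA", "IPI", "IGI", "IMP"}

def transfer_annotations(source_annotations, existing_terms, ancestors):
    """Select annotations to propagate to an uncharacterized ortholog.

    source_annotations: list of (go_id, evidence_code, source_gene)
        tuples taken from the characterized ortholog.
    existing_terms: set of GO IDs already manually curated on the target.
    ancestors: dict mapping a GO ID to the set of its less specific
        (ancestor) GO IDs; a hypothetical precomputed lookup.
    Returns a list of (go_id, "IEA", source_gene) tuples.
    """
    # A candidate is redundant if it equals an existing term or is a
    # less specific ancestor of one.
    covered = set(existing_terms)
    for term in existing_terms:
        covered |= ancestors.get(term, set())

    transferred = []
    for go_id, code, source_gene in source_annotations:
        if code not in EXPERIMENTAL_CODES:
            continue  # predicted annotations are never propagated further
        if go_id in covered:
            continue  # redundant with manual curation
        transferred.append((go_id, "IEA", source_gene))
    return transferred
```

Keeping the source gene in each returned tuple mirrors the requirement that transferred annotations be displayed with the gene from which they were derived.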
Domain-based GO transfers were assigned a lower precedence than orthology-based transfers. IprScan predicts InterPro domains based on protein sequences. The Interpro2go mapping file (http://www.ebi.ac.uk/interpro) was used to map GO annotations to genes with the corresponding domain predictions. A domain-based GO prediction was made only if it was not redundant with an existing manually curated or orthology-based GO term, or one of its parental terms, that was already assigned to an orthologous protein.
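A minimal sketch of the domain-based step follows. It assumes that each mapping line has the layout `InterPro:<ID> <name> > GO:<term name> ; GO:<id>` with `!` marking comment lines; the function names and the `ancestors` lookup are hypothetical and do not describe the actual AspGD code.

```python
def parse_interpro2go(lines):
    """Parse mapping lines into {InterPro ID: set of GO IDs}.

    Assumed layout: 'InterPro:<ID> <name> > GO:<term name> ; GO:<id>'.
    Lines starting with '!' are treated as comments.
    """
    mapping = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("!"):
            continue
        left, _, right = line.partition(" > ")
        if not right:
            continue  # not a mapping line
        ipr_id = left.split()[0].split(":", 1)[1]
        go_id = right.rsplit(";", 1)[-1].strip()
        mapping.setdefault(ipr_id, set()).add(go_id)
    return mapping

def domain_based_terms(domains, mapping, assigned, ancestors):
    """Return GO IDs predicted from domains, skipping redundant ones.

    assigned: GO IDs already given by manual curation or orthology.
    ancestors: dict of GO ID -> set of its parental GO IDs (hypothetical).
    """
    covered = set(assigned)
    for term in assigned:
        covered |= ancestors.get(term, set())
    predicted = set()
    for ipr_id in domains:
        predicted |= mapping.get(ipr_id, set()) - covered
    return sorted(predicted)
```

Because the higher-precedence terms are passed in as `assigned`, a domain-based term is emitted only when neither manual curation nor orthology already covers it.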
Finally, descriptions for genes lacking manual or GO-based annotations were constructed from the manual GO terms assigned to characterized orthologs. GO annotations were included with the following precedence: BP, followed by MF, and then CC. For genes that lacked experimental characterization and characterized orthologs, but had functionally characterized InterPro domains, descriptions were generated from the domain-based GO annotations. The same precedence rules applied as to the descriptions generated using orthology-based GO information. For genes that lacked experimental characterization and characterized orthologs, and without functionally characterized InterPro domains, but had uncharacterized orthologs, the descriptions simply list the orthology relationship because no inferred GO information was available.
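The fallback order for generating descriptions can be sketched as follows. The phrase templates ("Ortholog(s) have", "Has domain(s) with predicted", "Ortholog of") and the function name are assumptions for illustration; only the precedence of sources and of GO aspects is taken from the text.

```python
# GO aspects in the order used to build a description
ASPECT_ORDER = ("BP", "MF", "CC")

def build_description(ortholog_go, domain_go, uncharacterized_orthologs):
    """Build a gene description for a gene lacking manual annotation.

    ortholog_go, domain_go: dicts mapping an aspect ("BP", "MF", "CC")
        to a list of GO term names from characterized orthologs or from
        InterPro domains, respectively.
    uncharacterized_orthologs: list of ortholog names used as a last
        resort when no GO information can be inferred.
    """
    sources = (
        (ortholog_go, "Ortholog(s) have"),
        (domain_go, "Has domain(s) with predicted"),
    )
    for go_terms, label in sources:
        parts = []
        for aspect in ASPECT_ORDER:
            parts.extend(go_terms.get(aspect, []))
        if parts:
            return f"{label} {', '.join(parts)}"
    if uncharacterized_orthologs:
        return "Ortholog of " + ", ".join(uncharacterized_orthologs)
    return ""
```

For example, a gene with no characterized orthologs but one annotated domain falls through to the second source, and a gene with neither is described only by its orthology relationship.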
Secondary metabolic gene cluster analysis and annotation
The pre-computed results file (smurf_output_precomputed_08.13.08.zip) was downloaded from the SMURF website (http://jcvi.org/smurf/index.php). Version 1.2.1 of the antiSMASH program was downloaded from http://antismash.secondarymetabolites.org/ and run locally on the chromosome and/or contig sequences of A. nidulans FGSC A4, A. fumigatus Af293, A. niger CBS 513.88 and A. oryzae RIB40. Details of the parameters the antiSMASH program uses to predict boundaries are described in Medema et al. 2011 and those for SMURF are described in Khaldi et al. 2010. The secondary metabolic gene clusters predicted by these programs were manually analyzed and annotated using functional data available for each gene in AspGD. Cluster membership was determined based on physical proximity of candidate genes to cluster backbone genes. Adjacent genes were added to the cluster if they had functional annotations common to known secondary metabolism genes. In cases where backbone genes had Jaccard orthologs in other species (see above), we required orthology between all other cluster members. Confirmation of orthology between clusters was facilitated by use of the Sybil multiple genome browser, which can be used to evaluate synteny between species. We visually evaluated synteny by examining whether a gene that was putatively in a cluster had orthologs in the other species; where a gene in the species in which the cluster was identified no longer had adjacent orthologs in the other species, we inferred a break in synteny. Cluster boundaries were also determined by changes in common functional annotation, or by an increase in intergenic distances. tRNAs and other non-coding RNAs were excluded from the cluster boundary analysis. Annotated images of the orthologous gene clusters are included in Additional files 2, 3, 4, 5.
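Boundary calls in this study were made manually, but two of the criteria (functional annotation of adjacent genes and intergenic distance) can be expressed as a simple heuristic. The sketch below is an illustration under stated assumptions, not the procedure used here: the `Gene` record, the `sm_like` flag and the 3000 bp gap limit are hypothetical, and the synteny check is omitted.

```python
from collections import namedtuple

# sm_like: gene has a functional annotation common to known secondary
# metabolism genes (hypothetical flag set during curation)
Gene = namedtuple("Gene", "name start end sm_like")

def extend_cluster(genes, backbone_index, max_gap=3000):
    """Extend a cluster outward from its backbone gene.

    genes: protein-coding genes on one chromosome, sorted by start
        (tRNAs and other non-coding RNAs already removed).
    max_gap: largest intergenic distance (bp) tolerated inside a
        cluster; an assumed value, not taken from the study.
    Returns the names of the genes placed in the cluster.
    """
    lo = hi = backbone_index
    # Walk upstream while neighbors look like cluster members
    while lo > 0:
        candidate, inner = genes[lo - 1], genes[lo]
        if not candidate.sm_like or inner.start - candidate.end > max_gap:
            break
        lo -= 1
    # Walk downstream under the same two criteria
    while hi < len(genes) - 1:
        inner, candidate = genes[hi], genes[hi + 1]
        if not candidate.sm_like or candidate.start - inner.end > max_gap:
            break
        hi += 1
    return [g.name for g in genes[lo:hi + 1]]
```

A heuristic of this kind would only propose candidate boundaries; the manual comparison against orthologous clusters in other species remains the deciding step.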
Bhetariya PJ, Madan T, Basir SF, Varma A, Usha SP: Allergens/Antigens, toxins and polyketides of important Aspergillus species. Indian J Clin Biochem. 2011, 26: 104-119. 10.1007/s12291-011-0131-5.
Rohlfs M, Albert M, Keller NP, Kempken F: Secondary chemicals protect mould from fungivory. Biol Lett. 2007, 3: 523-525. 10.1098/rsbl.2007.0338.
MacCabe AP, van Liempt H, Palissa H, Unkles SE, Riach MB, Pfeifer E, von Döhren H, Kinghorn JR: Delta-(L-alpha-aminoadipyl)-L-cysteinyl-D-valine synthetase from Aspergillus nidulans. Molecular characterization of the acvA gene encoding the first enzyme of the penicillin biosynthetic pathway. J Biol Chem. 1991, 266: 12646-12654.
MacCabe AP, Riach MB, Unkles SE, Kinghorn JR: The Aspergillus nidulans npeA locus consists of three contiguous genes required for penicillin biosynthesis. EMBO J. 1990, 9: 279-287.
Ramón D, Carramolino L, Patiño C, Sánchez F, Peñalva MA: Cloning and characterization of the isopenicillin N synthetase gene mediating the formation of the beta-lactam ring in Aspergillus nidulans. Gene. 1987, 57: 171-181. 10.1016/0378-1119(87)90120-X.
Yu JH, Leonard TJ: Sterigmatocystin biosynthesis in Aspergillus nidulans requires a novel type I polyketide synthase. J Bacteriol. 1995, 177: 4792-4800.
Keller NP, Segner S, Bhatnagar D, Adams TH: stcS, a putative P-450 monooxygenase, is required for the conversion of versicolorin A to sterigmatocystin in Aspergillus nidulans. Appl Environ Microbiol. 1995, 61: 3628-3632.
Kelkar HS, Keller NP, Adams TH: Aspergillus nidulans stcP encodes an O-methyltransferase that is required for sterigmatocystin biosynthesis. Appl Environ Microbiol. 1996, 62: 4296-4298.
Butchko RA, Adams TH, Keller NP: Aspergillus nidulans mutants defective in stc gene cluster regulation. Genetics. 1999, 153: 715-720.
Kelkar HS, Skloss TW, Haw JF, Keller NP, Adams TH: Aspergillus nidulans stcL encodes a putative cytochrome P-450 monooxygenase required for bisfuran desaturation during aflatoxin/sterigmatocystin biosynthesis. J Biol Chem. 1997, 272: 1589-1594. 10.1074/jbc.272.3.1589.
Luque MI, Rodríguez A, Andrade MJ, Martín A, Córdoba JJ: Development of a PCR protocol to detect aflatoxigenic molds in food products. J Food Prot. 2012, 75: 85-89. 10.4315/0362-028X.JFP-11-268.
Kupfahl C, Michalka A, Lass-Flörl C, Fischer G, Haase G, Ruppert T, Geginat G, Hof H: Gliotoxin production by clinical and environmental Aspergillus fumigatus strains. Int J Med Microbiol. 2008, 298: 319-327. 10.1016/j.ijmm.2007.04.006.
Lewis RE, Wiederhold NP, Lionakis MS, Prince RA, Kontoyiannis DP: Frequency and species distribution of gliotoxin-producing Aspergillus isolates recovered from patients at a tertiary-care cancer center. J Clin Microbiol. 2005, 43: 6120-6122. 10.1128/JCM.43.12.6120-6122.2005.
Morton CO, Bouzani M, Loeffler J, Rogers TR: Direct interaction studies between Aspergillus fumigatus and human immune cells; what have we learned about pathogenicity and host immunity?. Front Microbiol. 2012, 3: 413-
Scharf DH, Heinekamp T, Remme N, Hortschansky P, Brakhage AA, Hertweck C: Biosynthesis and function of gliotoxin in Aspergillus fumigatus. Appl Microbiol Biotechnol. 2012, 93: 467-472. 10.1007/s00253-011-3689-1.
Andersen MR, Nielsen JB, Klitgaard A, Petersen LM, Zachariasen M, Hansen TJ, Blicher LH, Gotfredsen CH, Larsen TO, Nielsen KF, Mortensen UH: Accurate prediction of secondary metabolite gene clusters in filamentous fungi. Proc Natl Acad Sci USA. 2013, 110: E99-E107. 10.1073/pnas.1205532110.
Sanchez JF, Somoza AD, Keller NP, Wang CC: Advances in Aspergillus secondary metabolite research in the post-genomic era. Nat Prod Rep. 2012, 29: 351-371. 10.1039/c2np00084a.
Bouhired S, Weber M, Kempf-Sontag A, Keller NP, Hoffmeister D: Accurate prediction of the Aspergillus nidulans terrequinone gene cluster boundaries using the transcriptional regulator LaeA. Fungal Genet Biol. 2007, 44: 1134-1145. 10.1016/j.fgb.2006.12.010.
Perrin RM, Federova ND, Bok JW, Cramer RA, Wortman JR, Kim HS, Nierman WC, Keller NP: Transcriptional regulation of chemical diversity in Aspergillus fumigatus by LaeA. PLoS Pathog. 2007, 3: 523-525.
Palmer JM, Keller NP: Secondary metabolism in fungi: does chromosomal location matter?. Curr Opin Microbiol. 2010, 13: 431-436. 10.1016/j.mib.2010.04.008.
Lim FY, Hou Y, Chen Y, Oh JH, Lee I, Bugni TS, Keller NP: Genome-based cluster deletion reveals an endocrocin biosynthetic pathway in Aspergillus fumigatus. Appl Environ Microbiol. 2012, 78: 4117-4125. 10.1128/AEM.07710-11.
Chiang YM, Szewczyk E, Nayak T, Davidson AD, Sanchez JF, Lo HC, Ho WY, Simityan H, Kuo E, Praseuth A, Watanabe K, Oakley BR, Wang CC: Molecular genetic mining of the Aspergillus secondary metabolome: discovery of the emericellamide biosynthetic pathway. Chem Biol. 2008, 15: 527-532. 10.1016/j.chembiol.2008.05.010.
Ahuja M, Chiang YM, Chang SL, Praseuth MB, Entwistle R, Sanchez JF, Lo HC, Yeh HH, Oakley BR, Wang CC: Illuminating the diversity of aromatic polyketide synthases in Aspergillus nidulans. J Am Chem Soc. 2012, 134: 8212-8221. 10.1021/ja3016395.
Nakazawa T, Ishiuchi K, Praseuth A, Noguchi H, Hotta K, Watanabe K: Overexpressing transcriptional regulator in Aspergillus oryzae activates a silent biosynthetic pathway to produce a novel polyketide. ChemBioChem. 2012, 13: 855-861. 10.1002/cbic.201200107.
Arnaud MB, Chibucos MC, Costanzo MC, Crabtree J, Inglis DO, Lotia A, Orvis J, Shah P, Skrzypek MS, Binkley G, Miyasato SR, Wortman JR, Sherlock G: The Aspergillus Genome Database, a curated comparative genomics resource for gene, protein and sequence information for the Aspergillus research community. Nucleic Acids Res. 2010, 38: D420-427. 10.1093/nar/gkp751.
Arnaud MB, Cerqueira GC, Inglis DO, Skrzypek MS, Binkley J, Chibucos MC, Crabtree J, Howarth C, Orvis J, Shah P, Wymore F, Binkley G, Miyasato SR, Simison M, Sherlock G, Wortman JR: The Aspergillus Genome Database (AspGD): recent developments in comprehensive multispecies curation, comparative genomics and community resources. Nucleic Acids Res. 2012, 40: D653-659. 10.1093/nar/gkr875.
The Gene Ontology Consortium: Gene Ontology Annotations and Resources. Nucleic Acids Res. 2012, 41: D530-535.
Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, Richter J, Rubin GM, Blake JA, Bult C, Dolan M, Drabkin H, Eppig JT, Hill DP, Ni L, Ringwald M, Balakrishnan R, Cherry JM, Christie KR, Costanzo MC, Dwight SS, Engel S, Fisk DG, Hirschman JE, Hong EL, Nash RS, Sethuraman A, Theesfeld CL, Botstein D, Dolinski K, Feierbach B, Berardini T, Mundodi S, Rhee SY, Apweiler R, Barrell D, Camon E, Dimmer E, Lee V, Chisholm R, Gaudet P, Kibbe W, Kishore R, Schwarz EM, Sternberg P, Gwinn M, Hannick L, Wortman J, Berriman M, Wood V, de la Cruz N, Tonellato P, Jaiswal P, Seigfried T, White R, Gene Ontology Consortium: The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 2004, 32: D258-261. 10.1093/nar/gkh036.
Khodiyar VK, Hill DP, Howe D, Berardini TZ, Tweedie S, Talmud PJ, Breckenridge R, Bhattarcharya S, Riley P, Scambler P, Lovering RC: The representation of heart development in the gene ontology. Dev Biol. 2011, 354: 9-17. 10.1016/j.ydbio.2011.03.011.
Szewczyk E, Chiang YM, Oakley CE, Davidson AD, Wang CC, Oakley BR: Identification and characterization of the asperthecin gene cluster of Aspergillus nidulans. Appl Environ Microbiol. 2008, 74: 7607-7612. 10.1128/AEM.01743-08.
Lo HC, Entwistle R, Guo CJ, Ahuja M, Szewczyk E, Hung JH, Chiang YM, Oakley BR, Wang CC: Two separate gene clusters encode the biosynthetic pathway for the meroterpenoids austinol and dehydroaustinol in Aspergillus nidulans. J Am Chem Soc. 2012, 134: 4709-4720. 10.1021/ja209809t.
Márquez-Fernández O, Trigos A, Ramos-Balderas JL, Viniegra-González G, Deising HB, Aguirre J: Phosphopantetheinyl transferase CfwA/NpgA is required for Aspergillus nidulans secondary metabolism and asexual development. Eukaryot Cell. 2007, 6: 710-720. 10.1128/EC.00362-06.
Ames BD, Haynes SW, Gao X, Evans BS, Kelleher NL, Tang Y, Walsh CT: Complexity generation in fungal peptidyl alkaloid biosynthesis: oxidation of fumiquinazoline A to the heptacyclic hemiaminal fumiquinazoline C by the flavoenzyme Af12070 from Aspergillus fumigatus. Biochemistry. 2011, 50: 8756-8769. 10.1021/bi201302w.
Sanchez JF, Chiang YM, Szewczyk E, Davidson AD, Ahuja M, Elizabeth Oakley C, Woo Bok J, Keller N, Oakley BR, Wang CC: Molecular genetic analysis of the orsellinic acid/F9775 gene cluster of Aspergillus nidulans. Mol Biosyst. 2010, 6: 587-593. 10.1039/b904541d.
Maiya S, Grundmann A, Li X, Li SM, Turner G: Identification of a hybrid PKS/NRPS required for pseurotin A biosynthesis in the human pathogen Aspergillus fumigatus. ChemBioChem. 2007, 8: 1736-1743. 10.1002/cbic.200700202.
Sanchez JF, Entwistle R, Hung JH, Yaegashi J, Jain S, Chiang YM, Wang CC, Oakley BR: Genome-based deletion analysis reveals the prenyl xanthone biosynthesis pathway in Aspergillus nidulans. J Am Chem Soc. 2011, 133: 4010-4017. 10.1021/ja1096682.
Nielsen ML, Nielsen JB, Rank C, Klejnstrup ML, Holm DK, Brogaard KH, Hansen BG, Frisvad JC, Larsen TO, Mortensen UH: A genome-wide polyketide synthase deletion library uncovers novel genetic links to polyketides and meroterpenoids in Aspergillus nidulans. FEMS Microbiol Lett. 2011, 321: 157-166. 10.1111/j.1574-6968.2011.02327.x.
Khaldi N, Seifuddin FT, Turner G, Haft D, Nierman WC, Wolfe KH, Fedorova ND: SMURF: Genomic mapping of fungal secondary metabolite clusters. Fungal Genet Biol. 2010, 47: 736-741. 10.1016/j.fgb.2010.06.003.
Medema MH, Blin K, Cimermancic P, de Jager V, Zakrzewski P, Fischbach MA, Weber T, Takano E, Breitling R: antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences. Nucleic Acids Res. 2011, 39: W339-346. 10.1093/nar/gkr466.
Chiang YM, Szewczyk E, Davidson AD, Keller N, Oakley BR, Wang CC: A gene cluster containing two fungal polyketide synthases encodes the biosynthetic pathway for a polyketide, asperfuranone, in Aspergillus nidulans. J Am Chem Soc. 2009, 13: 2965-2970.
Bergmann S, Schümann J, Scherlach K, Lange C, Brakhage AA, Hertweck C: Genomics-driven discovery of PKS-NRPS hybrid metabolites from Aspergillus nidulans. Nat Chem Biol. 2007, 3: 213-217. 10.1038/nchembio869.
Gerke J, Bayram O, Feussner K, Landesfeind M, Shelest E, Feussner I, Braus GH: Breaking the silence: protein stabilization uncovers silenced biosynthetic gene clusters in the fungus Aspergillus nidulans. Appl Environ Microbiol. 2012, 78: 8234-8244. 10.1128/AEM.01808-12.
Bergmann S, Funk AN, Scherlach K, Schroeckh V, Shelest E, Horn U, Hertweck C, Brakhage AA: Activation of a silent fungal polyketide biosynthesis pathway through regulatory cross talk with a cryptic nonribosomal peptide synthetase gene cluster. Appl Environ Microbiol. 2010, 76: 8143-8149. 10.1128/AEM.00683-10.
Chiang YM, Szewczyk E, Davidson AD, Entwistle R, Keller NP, Wang CC, Oakley BR: Characterization of the Aspergillus nidulans monodictyphenone gene cluster. Appl Environ Microbiol. 2010, 76: 2067-2074. 10.1128/AEM.02187-09.
Martin J: Clusters of genes for the biosynthesis of antibiotics: regulatory genes and overproduction of pharmaceuticals. J Ind Microbiol. 1992, 9: 73-90. 10.1007/BF01569737.
Brown DW, Yu JH, Kelkar HS, Fernandes M, Nesbitt TC, Keller NP, Adams TH, Leonard TJ: Twenty-five coregulated transcripts define a sterigmatocystin gene cluster in Aspergillus nidulans. Proc Natl Acad Sci USA. 1996, 93: 1418-1422. 10.1073/pnas.93.4.1418.
Bok JW, Hoffmeister D, Maggio-Hall LA, Murillo R, Glasner JD, Keller NP: Genomic mining for Aspergillus natural products. Chem Biol. 2006, 13: 31-37. 10.1016/j.chembiol.2005.10.008.
Robinson SL, Panaccione DG: Chemotypic and genotypic diversity in the ergot alkaloid pathway of Aspergillus fumigatus. Mycologia. 2012, 104: 804-812. 10.3852/11-310.
Maiya S, Grundmann A, Li SM, Turner G: The fumitremorgin gene cluster of Aspergillus fumigatus: identification of a gene encoding brevianamide F synthetase. ChemBioChem. 2006, 7: 1062-1069. 10.1002/cbic.200600003.
Gardiner DM, Howlett BJ: Bioinformatic and expression analysis of the putative gliotoxin biosynthetic gene cluster of Aspergillus fumigatus. FEMS Microbiol Lett. 2005, 248: 241-248. 10.1016/j.femsle.2005.05.046.
Crabtree J, Angiuoli SV, Wortman JR, White OR: Sybil: methods and software for multiple genome comparison and visualization. Meth Mol Biol. 2007, 408: 93-108. 10.1007/978-1-59745-547-3_6.
Bok JW, Chiang YM, Szewczyk E, Reyes-Dominguez Y, Davidson AD, Sanchez JF, Lo HC, Watanabe K, Strauss J, Oakley BR, Wang CC, Keller NP: Chromatin-level regulation of biosynthetic gene clusters. Nat Chem Biol. 2009, 5: 462-464. 10.1038/nchembio.177.
de Groot PW, Brandt BW, Horiuchi H, Ram AF, de Koster CG, Klis FM: Comprehensive genomic analysis of cell wall genes in Aspergillus nidulans. Fungal Genet Biol. 2009, 46: S72-81. 10.1016/j.fgb.2008.07.022.
Remm M, Storm CE, Sonnhammer EL: Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J Mol Biol. 2001, 314: 1041-1052. 10.1006/jmbi.2000.5197.
Altenhoff AM, Dessimoz C: Phylogenetic and Functional Assessment of Orthologs Inference Projects and Methods. PLoS Comput Biol. 2009, 5: e1000262-10.1371/journal.pcbi.1000262.
Zdobnov EM, Apweiler R: InterProScan–an integration platform for the signature-recognition methods in InterPro. Bioinformatics. 2001, 17: 847-848. 10.1093/bioinformatics/17.9.847.
The authors would like to thank Gail Binkley for the AspGD Oracle Database administration, Stuart Miyasato and Matt Simison for the AspGD database software and hardware maintenance and the editors at ChEBI and the GO Consortium. We would also like to thank Vinita Joardar at JCVI for providing an updated set of A. oryzae secondary metabolite gene cluster predictions. This work was supported by the National Institute of Allergy and Infectious Diseases at the US National Institutes of Health [R01 AI077599 to GS and JW].
The authors declare that they have no competing interests.
DOI, MBA and MSS designed the project, DOI wrote the manuscript, GS, JRW, MBA and MSS edited the manuscript, DOI and MSS analyzed the data, DOI and MSS annotated the data, JB, GC, PS and FW performed bioinformatics analysis of the data. All authors read and approved the final manuscript.