- Research
- Open Access
Isolation, complete genome sequencing and in silico genome mining of Burkholderia for secondary metabolites
BMC Microbiology volume 22, Article number: 323 (2022)
Abstract
In recent years, Burkholderia species have emerged as an increasingly attractive source of natural products (NPs). Genome mining suggests that Burkholderia genomes contain many natural product biosynthetic gene clusters (BGCs), which are new targets for drug discovery. To expand the collection of Burkholderia isolates, strain S-53 was isolated from soil samples collected in a mountain area in Changde, P.R. China, and was assigned to the genus Burkholderia by comparative genetic analysis. The complete genome of Burkholderia strain S-53 is 8.2 Mbp in size with an average G + C content of 66.35%. Its taxonomy was characterized with both 16S rRNA-based and whole genome-based phylogenetic trees. In silico bioinformatic prediction revealed a total of 15 NP BGCs, some of which may encode unknown products. The availability of these BGCs is expected to speed up the identification of new secondary metabolites from Burkholderia and to help clarify how BGC regulation works.
Introduction
The prevalence of drug-resistant pathogens is a serious problem that affects human life and agriculture. The World Health Organization (WHO) estimates ten million deaths by 2050 if multi-drug resistant (MDR) infections are not appropriately managed [1, 2]. Resistance has been found against all major antibiotic classes, and the number of candidates for novel antibiotics is dwindling. Hence, screening for novel antibacterial compounds is critical for new drug discovery [3].
Microbial natural products are an important source for drug discovery because of their structural diversity, and they make up more than 75% of antibiotics [4, 5]. A major drawback of traditional natural product discovery pipelines is their 99.99% rediscovery rate [6]. However, the last decade has seen a revival of natural product discovery, fueled by advances in analytical chemistry, bioinformatics, and whole genome sequencing [7].
Microbial genome sequencing has revealed that microbial genomes harbor a large reservoir of cryptic BGCs with considerable capacity to produce secondary metabolites. The availability of whole genome sequences and synthetic biology-inspired tools and approaches makes it possible to use these BGCs to develop new chemicals with new structures, new activities and new targets [8].
Modern natural product discovery relies to a large extent on microbial genome sequencing and computational mining for BGCs. The next stages include selecting unique BGCs, then cloning and expressing them in an optimal heterologous host or activating silent BGCs in situ. Compared with traditional screening methods, this pipeline (genome mining of NPs) spends less time on dereplication and streamlines NP discovery through advanced computational, microbiological and synthetic biology approaches.
Most members of Burkholderia are well known as pathogens of their hosts (plants or humans), and 44 members of this genus have now been identified [9]. In recent years, many Burkholderia species have been found to excrete a range of secondary metabolites, including antibacterial, anticancer, herbicidal, and insecticidal chemicals that can act as bioremediation, biocontrol and plant growth promotion agents [10, 11].
More recently, the growing body of Burkholderia genome sequences has revealed a vast reservoir of NPs, such as non-ribosomal peptides (NRPs) and polyketides (PKs), with various pharmacological functions [12]. Many silent BGCs in Burkholderia genomes remain unexplored as potential drug development targets. Using genome mining approaches, many compounds, such as bolagladins/glidochelins, gladiofungins, thailandepsins/burkholdacs and romidepsin (FK228), have been discovered from Burkholderia [13].
Because of restrictive growth conditions, only a limited number of Burkholderia species have been isolated and identified as harboring NP BGCs or as NP producers. Thus, the isolation of more Burkholderia species from various environments and high-quality sequencing of Burkholderia genomes are still necessary for multi-omics research, which aids in understanding BGC regulation and in rationally designing biosynthetic pathways of NPs [14, 15].
The purpose of this research was to determine the potential of Burkholderia strain S-53, obtained from a small mountain area, which showed the fastest growth among three Burkholderia isolates. Its genome was sequenced and analyzed for the presence of putative NP BGCs. Our data revealed that this strain contains a substantial number of BGCs, indicating its potential to produce new chemicals with biological activity.
Materials and methods
Isolation and characterization of Burkholderia
We collected soil samples from a small mountain (location: Tiesi Gang in Zoushi Town, 29.12755 N, 111.564903 E) in Changde City, Hunan Province, P.R. China, using sterilized spoons. Soil samples were pretreated by drying at room temperature and then soaked in PBS buffer (10 mL PBS/g soil). Pretreated samples were serially diluted in PBS and plated onto solid CYMG medium (8 g/L casein peptone, 4 g/L yeast extract, 4.06 g/L MgCl2·2H2O and 10 mL/L 50% glycerol), then cultivated at 28 ℃ for 2 days. Whitish colonies were analyzed by colony PCR for 16S rRNA gene amplification with the universal primers 27F (5′-AGAGTTTGATCCTGGCTCAG-3′) and 1492R (5′-TACGACTTAACCCCAATCGC-3′) under standard PCR conditions (95 ℃ for 5 min, then 30 cycles of 94 ℃ for 1 min, 55–58 ℃ for 1 min and 72 ℃ for 90 s), followed by sequencing at Sangon Biotech (Shanghai) to pick out Burkholderia species. Morphological features of S-53 were recorded during cultivation on CYMG agar plates, and molecular taxonomic approaches (the TrueBac™ ID system and the Type Strain Genome Server, TYGS) were used to characterize the resultant isolates.
Measurement of the growth curve of S-53
S-53 was inoculated into microwell plates (400 μL CYMG broth per well), with 15 wells serving as parallel replicates, and cultivated at 30 ℃ for 30 h. During cultivation, the OD600 of each well was recorded at 1 h intervals. Taking the OD600 of each well at 0 h as its blank control, the difference between the OD600 at each time point (n h) and the OD600 at 0 h (OD600/n-h − OD600/0-h) was calculated to represent the growth of S-53. The growth curve of S-53 was obtained by plotting the average OD600/n-h − OD600/0-h values on the Y-axis against time (h) on the X-axis.
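The blank correction and averaging described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: the function name `growth_curve` and the example readings are hypothetical, and only three wells and four time points are shown instead of the 15 wells and 30 h used in the study.

```python
from statistics import mean


def growth_curve(od600_by_well):
    """Return the mean blank-corrected OD600 at each time point.

    od600_by_well: one list per well, holding that well's OD600
    readings at 0 h, 1 h, 2 h, ... (hypothetical input layout).
    """
    if not od600_by_well:
        raise ValueError("at least one well is required")
    n_points = len(od600_by_well[0])
    if any(len(well) != n_points for well in od600_by_well):
        raise ValueError("all wells must have the same number of readings")

    # Subtract each well's own 0 h reading (its blank) from every reading.
    corrected = [[od - well[0] for od in well] for well in od600_by_well]

    # Average the corrected values across parallel wells per time point.
    return [mean(values) for values in zip(*corrected)]


# Illustrative readings only (three wells, 0-3 h); not data from the study.
example_wells = [
    [0.10, 0.15, 0.30, 0.60],
    [0.12, 0.16, 0.33, 0.61],
    [0.11, 0.17, 0.30, 0.62],
]
curve = growth_curve(example_wells)
for hour, value in enumerate(curve):
    print(f"{hour} h: {value:.3f}")
```

Subtracting each well's own 0 h value, rather than a shared blank, follows the description in the text and removes well-to-well differences in the starting reading before averaging.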
Extraction of high molecular weight genomic DNA
Burkholderia strain S-53 was inoculated into 50 mL of CYMG liquid medium with glass beads (3 ± 0.3 mm diameter) in a 250 mL baffled flask and cultured for 24 h at 30 °C on an orbital shaker at 200 rpm. To extract genomic DNA (gDNA), 50 mL of culture was collected during the exponential growth phase, and the cells were washed twice with an equal volume of 10 mM EDTA and then incubated with lysozyme (10 mg/mL) for 45 min at 37 °C. gDNA was extracted using the TIANamp Bacterial DNA kit (Tiangen Biochemical Technology (Beijing) Co., Ltd) following the manufacturer's instructions for Gram-negative bacteria. The quality and quantity of the extracted gDNA were determined by 1% agarose gel electrophoresis and with a NanoDrop spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA).
De novo genome sequencing, assembly and annotation
To obtain high-quality sequence data, gDNA of S-53 was submitted to GENEWIZ Biotechnology Co., Ltd (Tianjin, China) for genome sequencing with two methods:
- For Illumina sequencing, DNA was first fragmented to around 500 bp, end-repaired to produce blunt ends, and then modified with an "A" base at the 3' end so that the fragments could be ligated to adapters carrying a "T" base at the 3' end. The ligation product of the target fragments was recovered, PCR was used to amplify the DNA fragments with adapters at both ends, and the qualified library was finally used for cluster preparation and sequencing.
- For PacBio sequencing, 5–10 μg of genomic DNA was sheared into 10–15 kb fragments using a g-TUBE device. The library was then constructed using the SMRTbell® Express Template Preparation Kit 2.0. The PCR products obtained using library DNA as templates were cleaned up and validated using an Agilent 2100 Bioanalyzer.
The qualified libraries were then sequenced either in paired-end 150 bp mode (PE150) on the Illumina HiSeq X Ten/NovaSeq/MGI2000 system or on the Sequel II sequencing platform.
The sequenced reads were assembled using HGAP4/Falcon of WGS-Assembler 8.2 [16,17,18,19,20,21], and the assembly was then corrected with Pilon using the Illumina data, or with Quiver.
Coding genes were predicted using the Prodigal [22]/Augustus [23] gene-finding software, and transfer RNAs (tRNAs) were detected using tRNAscan-SE [24] with default parameter settings. rRNAs were identified using Barrnap, and other RNAs were identified with the Rfam database. The coding genes were annotated by BLAST searches against the National Center for Biotechnology Information (NCBI) NR database (screening conditions are shown in Table 1).
The GO [25] (Gene Ontology) and KEGG [26] (Kyoto Encyclopedia of Genes and Genomes) databases were used to analyze gene functions and annotate pathways. The COG/KOG [14] (Clusters of Orthologous Groups) database was used for phylogenetic classification of proteins.
Phylogenetic analysis
Two methods were used for phylogenetic analysis of S-53:
(i) Whole genome-based taxonomic analysis was conducted using the Genome BLAST Distance Phylogeny approach (GBDP) by uploading genome sequence data to the Type (Strain) Genome Server (TYGS), a free bioinformatics platform accessible at https://tygs.dsmz.de [27].
(ii) A phylogenetic tree was constructed based on the 16S rRNA gene sequence of the Burkholderia strain S-53 and those extracted from the list of hits from the EzBioCloud 16S database [28]. Evolutionary trees were established with maximum-likelihood methods [29] in the MEGA X package [30]. The confidence of the tree topologies was assessed by 100 bootstrap replicates.
Whole genome sequences for bacterial identification
Bacterial identification based on whole genome sequences was performed with TrueBac™ ID, a cloud-based service [31] that combines multiple methods to determine the identity of bacterial isolates.
Comparative genomic studies/whole genome relatedness
For a whole genome-based taxonomic analysis, the genome sequence data were uploaded to the Type (Strain) Genome Server (TYGS), a free bioinformatics platform accessible at https://tygs.dsmz.de (accessed 28 December 2021). The Genome BLAST Distance Phylogeny approach (GBDP) was used to calculate dDDH (digital DNA–DNA hybridization) values and construct minimum evolution trees using TYGS [32, 27]. MEGA-X [30] was used to visualize GBDP trees. The ANI/AAI-Matrix calculator was used to calculate the average nucleotide identity (ANI) [33, 34]. The average amino acid identity (AAI) and average nucleotide identity (ANI) matrices of all conserved genes in the core genome were computed by the BLAST algorithm and visualized as heat maps for a more in-depth qualitative comparison between the genomes.
Using EzBioCloud, the average nucleotide identity (ANI) of the assembled genome nucleotide files was calculated against the whole genome sequences of the strains used for 16S rRNA sequence analysis [35]. This method computes nucleotide identity through pairwise sequence alignment, yielding an overall average similarity of the genomes that is independent of sequence length.
CGView (http://cgview.ca/) was used to generate a graphical representation of the BLAST comparison between the available genomes and the genome of Burkholderia strain S-53.
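The idea of an average identity that does not depend on sequence length can be illustrated with a deliberately simplified Python sketch. It assumes that pairs of genome fragments have already been aligned (gaps written as "-"), and it only averages the per-fragment identities; it is not the EzBioCloud or OrthoANI implementation, which also handles genome fragmentation, homology search and alignment filtering. The function names and toy sequences are hypothetical.

```python
def fragment_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def average_nucleotide_identity(aligned_fragments):
    """Mean of per-fragment identities for a list of aligned fragment pairs."""
    identities = [fragment_identity(a, b) for a, b in aligned_fragments]
    if not identities:
        raise ValueError("no aligned fragments supplied")
    return sum(identities) / len(identities)


# Toy aligned fragment pairs (hypothetical, for illustration only).
pairs = [
    ("ACGTACGT", "ACGTACGT"),
    ("ACGTAC-T", "ACGAACGT"),
]
ani = average_nucleotide_identity(pairs)
print(f"ANI over {len(pairs)} fragments: {ani:.2f}%")
```

Because each fragment contributes a percentage rather than a raw match count, the resulting average is a relative measure that can be compared between genome pairs of different sizes.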
Secondary metabolite biosynthetic gene cluster prediction
As the main approach for finding and annotating BGC genes across the genome, antiSMASH 6 [36], with its ClusterBlast, ActiveSiteFinder, Cluster Pfam analysis and SubClusterBlast options, was used together with PRISM 4 and BAGEL 4 to discover secondary metabolite BGCs in the genome of S-53.
In particular, BAGEL 4 was used to mine BGCs for RiPPs and bacteriocins, whereas PRISM 4 was used for structural prediction of secondary metabolites [37]. Several resources were used for BGC annotation, including hidden Markov models (HMMs) [38], the BLAST algorithm [39], Pfam [40], GenBank [41], UniProtKB [42], BACTIBASE [43], CAMPR3 [44] and the MIBiG data repository [45]. In addition, NaPDoS [46] was used to look for KS and C domains in these genomic sequences.
Results
Morphological and microscopic examination and phylogenetic analysis of 16S rRNA
To isolate more Burkholderia species from the soil samples, we incubated a series of isolates on CYMG medium at 30 ℃, followed by colony PCR amplification of the 16S rRNA gene. 16S rRNA-based phylogenetic analysis then identified 3 isolates as Burkholderia, representing different species; S-53 shared the highest 16S rDNA identity (99.93%) with the type strain of Burkholderia stabilis (NCBI Blastn). The S-53 colonies on CYMG medium were recorded (Fig. 1).
The partial 16S rDNA gene sequence of strain S-53, 1337 bp in length, was deposited in the GenBank nucleotide database under accession number OM019084.
Among the three strains we isolated, S-53 grew more rapidly (reaching stationary phase in well under 18 h) than the other two (longer than 18 h) (Fig. 1b). Because a high growth rate is an important feature for Burkholderia species used to express NPs, we chose S-53 for genome sequencing.
Genomic features of Burkholderia strain S-53
The genome of Burkholderia strain S-53 is 8.254 Mbp in length and comprises 7239 protein-encoding genes, 63 tRNA genes, 18 rRNA genes and 72 ncRNA genes (Table 1 and Table 2).
Figure 2 shows the circular chromosome map of the S-53 genome sequence, generated with the CGView server (http://cgview.ca/), a web-based tool for comparative genomics analysis of circular genomes [47].
Schematic representation of the circular chromosome of Burkholderia strain S-53, created with the CGView server (http://cgview.ca/). Circle 1 (outermost) displays the 3 contigs, circle 2 displays the GC content plot and circle 3 (innermost) displays the GC skew. Rulers on the inside and outside of the chromosome map indicate genome size
The genome sequence of Burkholderia strain S-53 has been deposited in GenBank under accession numbers CP090482-CP090484.
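For readers unfamiliar with the two inner tracks of such a map, the quantities they plot can be sketched as follows. GC content is the percentage of G and C bases, and GC skew is conventionally defined as (G − C)/(G + C), both computed in windows along the sequence. The snippet is a minimal illustration; the window size, step and toy sequence are hypothetical and are not the settings used by the CGView server.

```python
def gc_content(seq):
    """Percentage of G and C among the unambiguous bases A, C, G and T."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    total = g + c + seq.count("A") + seq.count("T")
    return 100.0 * (g + c) / total if total else 0.0


def gc_skew(seq):
    """GC skew, (G - C) / (G + C); returns 0.0 when no G or C is present."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def sliding_windows(seq, window, step):
    """Yield (start, subsequence) for each full window along the sequence."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]


# Toy sequence and window settings (hypothetical, for illustration only).
toy_sequence = "GGGCGCATATGCGCCCATGGGC"
for start, chunk in sliding_windows(toy_sequence, window=8, step=4):
    print(start, f"GC={gc_content(chunk):.1f}%", f"skew={gc_skew(chunk):+.2f}")
```

Applied to the whole assembly instead of windows, `gc_content` corresponds to the genome-wide G + C percentage reported for S-53.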
Bacterial strain identification by whole genome sequence and comparative genome analysis of S-53
Using the TrueBac™ ID system [31] for bacterial identification based on the whole genome sequence, strain S-53 was identified as Burkholderia pyrrocinia (Table 3).
Further, we performed a comparative genome analysis of Burkholderia S-53 (Table 4). The pairwise comparisons of Burkholderia strain S-53 were obtained from TYGS [27] (https://tygs.dsmz.de/), a platform for genome-based taxonomy that supports the description of new genera, species and subspecies.
We also used TrueBac™ ID [31] for genome-wide alignment and found that Burkholderia strain S-53 has the highest similarity to Burkholderia pyrrocinia and Burkholderia stabilis (Table 5). Its taxonomic ranks are Bacteria, Proteobacteria, Betaproteobacteria, Burkholderiales, Burkholderiaceae and Burkholderia.
Phylogenetic analysis via GBDP method
The phylogenetic tree of Burkholderia strain S-53 was built from its whole genome sequence using the Genome BLAST Distance Phylogeny (GBDP) method and the tree builder service; FastME was used to infer the tree from GBDP intergenomic distances derived from complete proteomes.
The GBDP phylogenetic tree constructed from 16S rRNA sequences indicated that S-53 is similar to Burkholderia pyrrocinia DSM10685K (Fig. 3a). In contrast, the GBDP phylogenetic tree constructed from whole genome sequences indicated that S-53 is similar to B. stabilis ATCCBAA-67 (Fig. 3b).
GBDP phylogenetic trees of S-53. The Genome BLAST Distance Phylogeny (GBDP) method was used to estimate the phylogeny. GBDP intergenomic distances computed from complete proteomes were used to estimate the tree using FastME. Using 16S rRNA sequences (a) or whole genome sequences (b), GBDP phylogenetic trees were constructed
Overall, the 16S rRNA-based GBDP phylogenetic tree, the whole genome alignment and the comparative genome analysis suggested that S-53 is Burkholderia pyrrocinia, whereas the whole genome-based GBDP phylogenetic tree supported B. stabilis. Combining these analyses, we concluded that it is closer to Burkholderia pyrrocinia.
In addition, a phylogenetic tree was constructed from sequences in the EzBioCloud 16S database by the maximum-likelihood method in MEGA X with 100 bootstrap replicates (Fig. 4). According to this tree, S-53 is close to Burkholderia stabilis ATCC BAA-67 and Burkholderia pyrrocinia DSM 10685.
Evolutionary analysis of S-53 using the Maximum Likelihood method. The percentage of trees in which the associated taxa clustered together is shown next to the branches. Initial trees for the heuristic search were obtained by applying the Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using the Maximum Composite Likelihood (MCL) approach, and the topology with the best log likelihood value was chosen. Branch lengths are measured in the number of substitutions per site. The analysis involved 51 nucleotide sequences. All positions containing gaps and missing data were eliminated (complete deletion option). There were a total of 1279 positions in the final dataset. Evolutionary analyses were conducted in MEGA X
Prediction of NP BGCs in S-53 genome
Using antiSMASH 6.0 [36], BAGEL 4 [48] and PRISM 4 [49], we found numerous BGCs for different secondary metabolites in the genome of Burkholderia strain S-53.
Prediction with antiSMASH 6.0 revealed 15 BGCs (Table 6 and Fig. 5). The major BGC types include those for NRPs (2 BGCs), terpene (4 BGCs) and hybrid (3 BGCs) (Fig. 5).
Among the 15 BGCs, five (clusters 3, 5, 8, 10 and 13; regions 1.3, 2.1, 3.1, 3.3 and 3.6) were more than 50% identical to known BGCs. The other BGCs showed only a low degree of similarity to previously identified BGCs, implying that Burkholderia sp. S-53 has significant potential for the production of novel NPs.
Moreover, BAGEL analysis of the S-53 genome identified an additional 2 different clusters for bacteriocins and RiPPs (Table 7).
In addition, the PRISM algorithm (https://prism.adapsyn.com/results/4c5c8259bfef7b827d3c7b9cdc95df6c) was used to predict the structures of genetically encoded natural products from the Burkholderia sp. S-53 genome.
Figure 6 shows the compounds predicted for a total of 10 clusters, including 3 for NRPs, 2 for PKs, 1 for Class II/III bacteriocin, 1 for aryl polyene and 1 for acyl homoserine lactone.
Discussion and conclusion
The high potential of Burkholderia to produce bioactive NPs has been reported in a growing number of publications over the past decade. Moreover, their rapid growth rate and low fermentation cost make them potential hosts for heterologous expression of NP BGCs, aided by the establishment of genetic manipulation systems [50].
In this work, a genomic investigation of microbes isolated from an underexplored mountain habitat found several strains of Burkholderia, among which S-53 attracted our attention for its comparatively fast growth (16–18 h to enter stationary phase, compared with 24 h for Burkholderia species in general), a critical feature if it is later developed into a host for expressing NP BGCs.
We identified it using different methods and analyzed its evolution. Because modern bacterial taxonomy uses genome sequence data to delineate taxa, species identification based on genome sequence data is generally considered more accurate and persuasive. Thus, although the different molecular identification methods gave slightly different results, our data indicate that S-53 is Burkholderia pyrrocinia.
Other strains of Burkholderia pyrrocinia have been isolated from different habitats, such as Burkholderia pyrrocinia JK-SH007, a plant growth-promoting bacterium from the plant rhizosphere [51]. Burkholderia pyrrocinia, along with Burkholderia cenocepacia and Burkholderia ambifaria, belongs to the Burkholderia cepacia complex (BCC), whose species are most frequently associated with the roots of crop plants [52].
Taxonomy and species identification in Burkholderia remain quite challenging. Although a 16S rDNA similarity of 98–100% is often used as a “common standard” for bacterial identification at the species level, it is not applicable to the classification of Burkholderia species [53], especially those of the BCC group. Whole genome sequence-based taxonomic analysis can therefore give comparatively more reliable results when combined with other molecular methods.
Genomics-based bottom-up techniques have been developed to reveal previously undiscovered natural product biosynthesis pathways [54]. Here, whole genome sequencing and bioinformatic analyses of Burkholderia strain S-53 revealed many secondary metabolite biosynthetic gene clusters. Moreover, bioinformatic analysis showed that more than two-thirds of the BGCs in S-53 are not related to recognized clusters (Table 6).
These data support S-53 as a good candidate for identifying new NPs. Further research is needed to improve, isolate and identify new bioactive natural products from this strain and to investigate its potential as a chassis for expressing new NPs.
Availability of data and materials
The partial 16S rDNA gene sequence and the genome sequence of strain S-53 were deposited in the GenBank nucleotide database under accession numbers OM019084 and CP090482-CP090484, respectively.
References
Antimicrobial resistance: tackling a crisis for the health and wealth of nations. Review on Antimicrobial Resistance; 2014.
Toner E, Adalja A, Gronvall GK, Cicero A, Inglesby TV. Antimicrobial resistance is a global health emergency. Health Secur. 2015;13(3):153–5.
Genilloud O. The re-emerging role of microbial natural products in antibiotic discovery. Antonie Van Leeuwenhoek. 2014;106(1):173–88.
Hutchings MI, Truman AW, Wilkinson B. Antibiotics: past, present and future. Curr Opin Microbiol. 2019;51:72–80.
Katz L, Baltz RH. Natural product discovery: past, present, and future. J Ind Microbiol Biotechnol. 2016;43(2–3):155–76.
Firn RD, Jones CG. An explanation of secondary product ‘redundancy’. In: Phytochemical diversity and redundancy in ecological interactions. Springer; 1996. p. 295–312.
Galanie S, Entwistle D, Lalonde J. Engineering biosynthetic enzymes for industrial natural product synthesis. Nat Prod Rep. 2020;37(8):1122–43.
Alam K, Hao J, Zhang Y, Li A. Synthetic biology-inspired strategies and tools for engineering of microbial natural product biosynthetic pathways. Biotechnol Adv. 2021:107759.
Schoch CL, et al. NCBI Taxonomy: a comprehensive update on curation, resources and tools. Database. 2020;2020.
Depoorter E, Bull MJ, Peeters C, Coenye T, Vandamme P, Mahenthiralingam E. Burkholderia: an update on taxonomy and biotechnological potential as antibiotic producers. Appl Microbiol Biotechnol. 2016;100(12):5215–29.
Kunakom S, Eustáquio AS. Burkholderia as a source of natural products. J Nat Prod. 2019;82(7):2018–37.
Alam K, et al. In silico genome mining of potential novel biosynthetic gene clusters for drug discovery from Burkholderia bacteria. Comput Biol Med. 2022;140: 105046.
Liu X, Cheng Y-Q. Genome-guided discovery of diverse natural products from Burkholderia sp. J Ind Microbiol Biotechnol. 2014;41(2):275–84.
Hwang S, et al. Primary transcriptome and translatome analysis determines transcriptional and translational regulatory elements encoded in the Streptomyces clavuligerus genome. Nucleic Acids Res. 2019;47(12):6114–29.
Li Y, Zhang C, Liu C, Ju J, Ma J. Genome sequencing of Streptomyces atratus SCSIOZH16 and activation production of nocardamine via metabolic engineering. Front Microbiol. 2018;9:1269.
Myers EW, et al. A whole-genome assembly of Drosophila. Science. 2000;287(5461):2196–204.
Venter JC, et al. The sequence of the human genome. Science. 2001;291(5507):1304–51.
Istrail S, et al. Whole-genome shotgun assembly and comparison of human genome assemblies. Proc Natl Acad Sci. 2004;101(7):1916–21.
Levy S, et al. The diploid genome sequence of an individual human. PLoS Biol. 2007;5(10): e254.
Goldberg SMD, et al. A Sanger/pyrosequencing hybrid approach for the generation of high-quality draft assemblies of marine microbial genomes. Proc Natl Acad Sci. 2006;103(30):11240–5.
Berlin K, Koren S, Chin C-S, Drake JP, Landolin JM, Phillippy AM. Assembling large genomes with single-molecule sequencing and locality-sensitive hashing. Nat Biotechnol. 2015;33(6):623–30.
Delcher AL, Bratke KA, Powers EC, Salzberg SL. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007;23(6):673–9.
Stanke M, Schöffmann O, Morgenstern B, Waack S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics. 2006;7(1):1–11.
Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997;25(5):955–64.
Gene Ontology Consortium. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. 2004;32(suppl_1):D258–61.
Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1):27–30.
Meier-Kolthoff JP, Göker M. TYGS is an automated high-throughput platform for state-of-the-art genome-based taxonomy. Nat Commun. 2019;10(1):1–10.
Yoon S-H, et al. Introducing EzBioCloud: a taxonomically united database of 16S rRNA gene sequences and whole-genome assemblies. Int J Syst Evol Microbiol. 2017;67(5):1613.
Felsenstein J. Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol. 1981;17(6):368–76.
Kumar S, Stecher G, Li M, Knyaz C, Tamura K. MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol Biol Evol. 2018;35(6):1547.
Ha S-M, et al. Application of the whole genome-based bacterial identification system, TrueBac ID, using clinical isolates that were not identified with three matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) systems. Ann Lab Med. 2019;39(6):530–6.
Camacho C, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10(1):1–9.
Lee I, Kim YO, Park S-C, Chun J. OrthoANI: an improved algorithm and software for calculating average nucleotide identity. Int J Syst Evol Microbiol. 2016;66(2):1100–3.
Rodriguez-R LM, Konstantinidis KT. The enveomics collection: a toolbox for specialized analyses of microbial genomes and metagenomes. PeerJ Preprints. 2016.
Yoon S-H, Ha S-M, Lim J, Kwon S, Chun J. A large-scale evaluation of algorithms to calculate average nucleotide identity. Antonie Van Leeuwenhoek. 2017;110(10):1281–6.
Blin K, et al. antiSMASH 6.0: improving cluster detection and comparison capabilities. Nucleic Acids Res. 2021:1.
Machado H, Sonnenschein EC, Melchiorsen J, Gram L. Genome mining reveals unlocked bioactive potential of marine Gram-negative bacteria. BMC Genomics. 2015;16(1):1–12.
Churchill GA. Stochastic models for heterogeneous DNA sequences. Bull Math Biol. 1989;51(1):79–94.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990;215(3):403–10.
Finn RD, et al. Pfam: the protein families database. Nucleic Acids Res. 2014;42(D1):D222–30.
Benson DA, et al. GenBank. Nucleic Acids Res. 2013;41(D1):D36–42.
UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Res. 2015;43(D1):D204–12.
Hammami R, Zouhir A, Le Lay C, Ben Hamida J, Fliss I. BACTIBASE second release: a database and tool platform for bacteriocin characterization. BMC Microbiol. 2010;10(1):1–5.
Waghu FH, Barai RS, Gurung P, Idicula-Thomas S. CAMPR3: a database on sequences, structures and signatures of antimicrobial peptides. Nucleic Acids Res. 2016;44(D1):D1094–7.
Medema MH, et al. Minimum information about a biosynthetic gene cluster. Nat Chem Biol. 2015;11(9):625–31.
Ziemert N, Podell S, Penn K, Badger JH, Allen E, Jensen PR. The natural product domain seeker NaPDoS: a phylogeny based bioinformatic tool to classify secondary metabolite gene diversity. PLoS ONE. 2012;7(3): e34064.
Grant JR, Stothard P. The CGView Server: a comparative genomics tool for circular genomes. Nucleic Acids Res. 2008;36(suppl_2):W181–4.
van Heel AJ, de Jong A, Song C, Viel JH, Kok J, Kuipers OP. BAGEL4: a user-friendly web server to thoroughly mine RiPPs and bacteriocins. Nucleic Acids Res. 2018;46(W1):W278–81.
Skinnider MA, et al. Comprehensive prediction of secondary metabolite structure and biological activity from microbial genome sequences. Nat Commun. 2020;11(1):1–9. https://doi.org/10.1038/s41467-020-19986-1.
Liu J, et al. Rational construction of genome-reduced Burkholderiales chassis facilitates efficient heterologous production of natural products from proteobacteria. Nat Commun. 2021;12(1):1–16.
Liu W-H, et al. Indole-3-acetic acid in Burkholderia pyrrocinia JK-SH007: enzymatic identification of the indole-3-acetamide synthesis pathway. Front Microbiol. 2019:2559.
Alisi C, et al. Metabolic profiling of Burkholderia cenocepacia, Burkholderia ambifaria, and Burkholderia pyrrocinia isolates from maize rhizosphere. Microb Ecol. 2005;50(3):385–95.
Sfeir MM. Burkholderia cepacia complex infections: more complex than the bacterium name suggest. J Infect. 2018;77(3):166–70.
Winter JM, Behnken S, Hertweck C. Genomics-inspired discovery of natural products. Curr Opin Chem Biol. 2011;15(1):22–31.
Acknowledgements
Not applicable
Funding
This study was supported by the National Key R&D Program of China (2018YFA0900400), National Natural Science Foundation of China (32270088 and 32170038), the Open Project Program of the State Key Laboratory of Bio-based Material and Green Papermaking (KF201825) and the 111 Project (B16030).
Author information
Contributions
AL conceived the concept, acquired funding, supervised the work, and validated the results. KA, YMZ and XL conducted all experiments, analyzed the data, and wrote the original draft of the manuscript. KA, KG, JH and LZ handled the software; SI, MMI and GL conducted validation; YZ conducted formal analysis; YZ and RL contributed to visualization, writing and data analysis. All authors read and approved the manuscript.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Alam, K., Zhao, Y., Lu, X. et al. Isolation, complete genome sequencing and in silico genome mining of Burkholderia for secondary metabolites. BMC Microbiol 22, 323 (2022). https://doi.org/10.1186/s12866-022-02692-x
DOI: https://doi.org/10.1186/s12866-022-02692-x
Keywords
- Burkholderia
- Natural products
- Genome mining
- Biosynthetic gene cluster