Marine bacterial communities in the Upper Gulf of Thailand assessed by Illumina next-generation sequencing platform

Background The total bacterial community plays an important role in aquatic ecosystems. In this study, bacterial communities and diversity along the shores of the Upper Gulf of Thailand were characterized for the first time. The association between bacterial communities and types of land use was also evaluated. Results The bacterial communities and diversity of seawater in the Upper Gulf of Thailand, with regard to types of land use, were revealed for the first time by using Illumina next-generation sequencing. A total of 4953 OTUs were observed across all samples, of which 554 OTUs were common to all sampling sites. The bacterial communities in the sampling sites were significantly different from each other. The run-off water from three types of land use significantly affected the community richness and diversity of marine bacteria. Aquaculture sites contained the highest levels of community richness and diversity, followed by mangrove forests and tourist sites. Seawater physicochemical parameters, including salinity, turbidity, TSS, total N, and BOD5, were significantly different when grouped by land use. The bacterial communities were mainly determined by salinity, total N, and total P. The species richness estimators and OTUs were positively correlated with turbidity. The top ten most abundant phyla and genera as well as the distribution of bacterial classes were characterized. The Proteobacteria constituted the largest proportion in all sampling sites, ranging between 67.31 and 78.80%. The numbers of Marinobacterium, Neptuniibacter, Synechococcus, Candidatus Thiobios, hgcI clade (Actinobacteria), and Candidatus Pelagibacter were significantly different when grouped by land use. Conclusions Type of land use significantly affected bacterial communities and diversity along the Upper Gulf of Thailand. Turbidity was the most influential parameter affecting the variation in bacterial community composition. Salinity, total N, and P were among the important factors that shaped the bacterial communities. In addition, the variations of bacterial communities from site to site were greater than those within a site. The Proteobacteria, Bacteroidetes, Actinobacteria, Cyanobacteria, Verrucomicrobia, Euryarchaeota, Planctomycetes, Firmicutes, Deep Sea DHVEG-6, and Marinimicrobia were the most abundant and common phyla distributed across the Upper Gulf of Thailand.

urban areas [3]; it is therefore important for natural resources, the environment, and public health. Coastal seawater of the Upper Gulf of Thailand is utilized for environmental preservation, coral conservation, conservation of natural resources, aquaculture, fishery, water sports, recreation, transportation, and industry. Despite this importance, its total microbiota has not been investigated. The total bacterial community, which plays an important role in aquatic ecosystems, should be considered as a rigorous criterion for water quality to promote sustainable development. It is important to evaluate changes in the microbial community structure in aquatic systems because the microbial community is the foundation of biogeochemical cycles and pollutant biodegradation [4]. In particular, there is growing interest in the role of marine microorganisms that inhabit extreme habitats in biogeochemical processes, pollution, and health. Thermophiles, halophiles, alkalophiles, psychrophiles, piezophiles, and polyextremophiles have been isolated from marine environments. Marine environments represent the richest source of new genes, enzymes, and natural products [5]. Marine bacterial community structure is affected by several factors such as inorganic nutrient concentration [6], N [7,8], P [7], change in season, adjacent habitat [9], depth [8,10], oxygen [10], protist predation pressure [11], salinity [7,8,12], dominance of algae, particulate organic carbon, Si(OH)4 [8], human disturbance, and sand mining activity [13].
To explore the total bacterial community in the environment, high-throughput next-generation sequencing (NGS) of the taxonomically informative 16S rRNA gene provides the most powerful approach because it enables the classification of individual reads to specific taxa [14]. Recent advances in NGS have not only enabled finer characterization of bacterial genomes but also provided deeper taxonomic identification of complex microbiomes [15]. The NGS approach has been employed to survey microbial communities from several marine environments such as the Gulf of Mexico [16,17], the Georgetown Coast, Malaysia [18], Malipo Beach, South Korea [19], the Canadian Arctic archipelago [8], and the South Sea, Korea [13].
In this study, we investigated 1) the marine bacterial communities at nine sites along the shores of the Upper Gulf of Thailand by using Illumina NGS of the V4 variable region of the 16S rRNA gene; 2) the association between bacterial community structures and three types of land use, namely mangrove forests, tourist sites, and aquaculture sites; 3) the effect of seawater physicochemical parameters on the abundance of specific taxa; and 4) the correlation between seawater physicochemical parameters and types of land use.

Seawater physicochemical parameters
Nine sampling sites in seven provinces along the shores of the Upper Gulf of Thailand, over a distance of approximately 769.97 km, are shown in Table 1 and Fig. 1. Seawater parameters including temperature, pH, salinity, turbidity, total suspended solids (TSS), total N, total P, and five-day biochemical oxygen demand (BOD5) at the nine sampling sites are shown in Table 2. Temperatures and pH values of seawater samples among the nine sampling sites ranged from 27°C to 31°C and from 6.7 to 7.5, respectively. Salinity (presented as % NaCl) was 4.0 at all sites except sites F (aquaculture site at Donhoylhod) and G (aquaculture site at Bangtaboon Bay), where it was 3.0 and 1.0, respectively. Turbidity, TSS, total N, and total P of all sites ranged from 2.32 ± 0.03 to 102.00 ± 1.00 nephelometric turbidity units (NTUs), 22.00 ± 0.80 to 177.66 ± 5.50 mg/l, 0.13 ± 0.00 to 1.30 ± 0.13 mg/l, and 0.03 ± 0.00 to 0.10 ± 0.01 mg/l, respectively. Site G (aquaculture site at Bangtaboon Bay) had the highest turbidity, TSS, total N, and total P, which were significantly different from those of the other sites. In contrast, site A (mangrove forest at Black Sand Beach) had the lowest turbidity, significantly lower than that of every other site. Sites A (mangrove forest at Black Sand Beach), B (mangrove forest at Kungkrabaen Bay), C (tourist site at Suanson Beach), and I (tourist site at Wanakorn Beach) shared the lowest TSS rank, which was significantly different from that of the other sites. Site C (tourist site at Suanson Beach) contained the lowest amount of total N, which was not significantly different from that of site D (tourist site at Pattaya Beach). Sites A (mangrove forest at Black Sand Beach) and E (aquaculture site at Angsila old market) contained the lowest amounts of total P, which were not significantly different from those of sites B and C. BOD5 values of all sites ranged between 0.90 ± 0.00 and 3.76 ± 0.23 mg/l. Site B (mangrove forest at Kungkrabaen Bay) had the highest BOD5 value, differing significantly from that of the other sites. In contrast, sites F (aquaculture site at Donhoylhod) and I (tourist site at Wanakorn Beach) had the lowest BOD5 values.
The results in Table 3 show that air temperature, pH, and total P were not significantly different when grouped by land use (P = 0.27, 0.35, and 0.13, respectively). In contrast, seawater temperature, % NaCl, turbidity, TSS, total N, and BOD5 were significantly different when grouped by land use (P = 0.00, 0.01, 0.00, 0.00, 0.00, and 0.01, respectively). Turbidity, TSS, and total N were highest at aquaculture sites, differing significantly from those of the other types of land use. The BOD5 value of mangrove forests was highest, differing significantly from that of tourist sites. Aquaculture sites had the lowest % NaCl values, which were significantly different from those of the other types of land use.

Sequence analyses and diversity indices
A total of 2,478,774 raw reads were obtained from 27 DNA samples (3 replicates/sampling site). After tag merging and quality control, a total of 2,425,463 clean tags (97.85% of raw reads) were obtained. Potential chimera tags were then removed with the UCHIME algorithm, resulting in a total of 2,177,667 taxon tags. Tags with ≥97% similarity were grouped into the same operational taxonomic units (OTUs). A total of 4953 OTUs were observed from all samples, with a mean Good's coverage of 99.00 ± 0.00%. ACE (abundance-based coverage estimator) and Chao1, which represent richness, and the Shannon-Weaver and Simpson indices, which indicate diversity, were analyzed (Table 2). When measured by ACE and Chao1, the samples were significantly different when grouped by land use (P = 0.02 and 0.01) (Table 4). Aquaculture sites contained the highest community richness, followed by mangrove forests and tourist sites, respectively. The indices of community diversity assessed by Shannon and Simpson also showed that the samples were significantly different when grouped by land use (P = 0.00 and 0.00). The bacterial community diversity of aquaculture sites was significantly the highest, followed by mangrove forests and tourist sites, respectively (Table 4).
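To make the richness and diversity measures above concrete, the following is a minimal sketch of how Chao1, Shannon, Simpson, and Good's coverage can be computed from the per-OTU read counts of one sample. It assumes the bias-corrected form of Chao1, a base-2 Shannon index, and Simpson expressed as 1 − Σp²; the software used in this study may apply different conventions, and the function names and toy counts are illustrative only.

```python
import math


def chao1(counts):
    """Bias-corrected Chao1 richness estimate (assumed form)."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


def shannon(counts, base=2):
    """Shannon index; the logarithm base is an assumption."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


def simpson(counts):
    """Simpson diversity expressed as 1 - sum(p_i^2)."""
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)


def goods_coverage(counts):
    """Good's coverage: 1 - (singleton OTUs / total reads)."""
    total = sum(counts)
    singletons = sum(1 for c in counts if c == 1)
    return 1 - singletons / total


# Hypothetical toy sample: reads assigned to seven OTUs.
toy_counts = [10, 5, 2, 2, 1, 1, 1]
print(chao1(toy_counts), shannon(toy_counts), simpson(toy_counts), goods_coverage(toy_counts))
```

A sample dominated by singletons yields a Chao1 value well above the observed OTU count and a low Good's coverage, which is why coverage close to 99% suggests that sequencing depth captured most of the community.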
The bacterial richness (ACE and Chao1) of site F (aquaculture site at Donhoylhod) was highest, followed by sites A (mangrove forest at Black Sand Beach) and E (aquaculture site at Angsila old market), respectively. The bacterial richness of site I (tourist site at Wanakorn Beach) was lowest. Higher Shannon-Weaver and Simpson indices

Illumina NGS and bacterial community structure
Rarefaction analysis was used to standardize and compare taxon richness among samples and to identify whether the samples were randomly selected. According to the rarefaction curves of samples (Additional file 1: Figure S1), all of the samples were randomly collected. Moreover, aquaculture sites exhibited the steepest rarefaction curves, indicating the highest taxon richness, while tourist sites exhibited the most gradual curves. As shown in a Flower display (Additional file 2: Figure S2), 554 OTUs were common to all sampling sites. Site G (aquaculture site at Bangtaboon Bay) had the highest number of unique OTUs (259 OTUs), followed by sites A (mangrove forest at Black Sand Beach) and F (aquaculture site at Donhoylhod), respectively, whereas sites H (mangrove forest at Pranburi forest park) and I (tourist site at Wanakorn Beach) had the equally lowest number of unique OTUs (58 OTUs). The top ten most abundant phyla showed different patterns among the nine sampling sites, as depicted in Fig. 2. The distribution of bacterial classes in each sampling site is shown in Fig. 3. The colors in the heat map indicate the relative abundance of the community, varying from deep blue (low relative abundance) to dark brown (high relative abundance). The most abundant classes in each site are represented as dark-brown squares in the heat map. In site A (mangrove forest at Black Sand Beach), the Holophagae, Anaerolineae, and Chloroplast were more abundant than others. The Bacilli, OM190, Sphingobacteriia, Acidimicrobiia, and Verrucomicrobiae were the predominant classes in site B (mangrove forest at Kungkrabaen Bay).
The greatest abundance of the JdFBHP3 was found in site C (tourist site at Suanson Beach). The Mollicutes, γ-Proteobacteria, and Opitutae were the predominant classes in site D (tourist site at Pattaya Beach). Site E (aquaculture site at Angsila old market) had the greatest abundance of the γ-Proteobacteria. Site F (aquaculture site at Donhoylhod) harbored high numbers of the Bacteroidia, Epsilonproteobacteria, Nitriliruptoria, and Clostridia. Site G (aquaculture site at Bangtaboon Bay) had several highly abundant classes such as the Chloroflexia, Spartobacteria, Thermoleophilia, Chlorobia, and Planctomycetacia. The unidentified Marinimicrobia and unidentified Cyanobacteria were the most abundant classes in sites H (mangrove forest at Pranburi forest park) and I (tourist site at Wanakorn Beach), respectively.
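The rarefaction curves mentioned at the start of this subsection can be illustrated with a short sketch. It computes the expected number of OTUs observed when a fixed number of reads is drawn without replacement from a sample (the classical analytical rarefaction formula), rather than the repeated random subsampling that sequencing pipelines often use. The function name and the toy counts are hypothetical.

```python
from math import comb


def expected_richness(counts, depth):
    """Expected number of OTUs seen when `depth` reads are drawn
    without replacement from a sample with the given per-OTU counts."""
    total = sum(counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must lie between 0 and the total read count")
    all_draws = comb(total, depth)
    # For each OTU, 1 minus the probability that none of its reads is drawn.
    return sum(1 - comb(total - c, depth) / all_draws for c in counts if c > 0)


# Hypothetical toy sample; a curve is the expected richness at increasing depths.
toy_counts = [30, 12, 5, 2, 1]
curve = [(d, expected_richness(toy_counts, d)) for d in range(0, sum(toy_counts) + 1, 10)]
print(curve)
```

A curve that is still climbing steeply at the full sequencing depth indicates that additional reads would continue to reveal new OTUs, whereas a curve that flattens indicates that richness has been largely sampled.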
The top ten most abundant genera present in each sampling site are shown in Additional file 3: Table S1. The highest numbers of the genus Marinobacterium were present in four sites including E (aquaculture site at Angsila old market), H (mangrove forest at Pranburi  Fig. 4 revealed a significant clustering of samples by sampling site, and this separation was supported by analysis of molecular variance (AMOVA) (P < 0.001). Moreover, the variations in community composition among groups and within groups were evaluated by analysis of similarity (ANOSIM) and multi-response permutation procedure (MRPP). The results of both methods indicate that there were significant differences when comparing microbiota by sampling site (P < 0.05), and that the between-group variations were larger than the within-group variations (r = 1). The unweighted-pair group method with arithmetic mean (UPGMA) dendrogram of the relative abundance at the phylum level depicted in Fig. 2 was divided into four clusters. The first cluster, which contained sites A (mangrove forest at Black Sand Beach), C (tourist site at Suanson Beach), and D (tourist site at Pattaya Beach), was closer to the second cluster, which contained sites B (mangrove forest at Kungkrabaen Bay) and E (aquaculture site at Angsila old market). The third cluster was composed of sites H (mangrove forest at Pranburi forest park) and I (tourist site at Wanakorn Beach). The last cluster, containing sites F (aquaculture site at Donhoylhod) and G (aquaculture site at Bangtaboon Bay), was more separated from the other clusters.
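As an illustration of how an ANOSIM statistic of 1 arises, the sketch below computes the ANOSIM R value from a set of abundance profiles. It assumes Bray-Curtis dissimilarity, which is a common choice but is not stated in this section, and it omits the permutation test that yields the P value. All function names and the toy profiles are hypothetical.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles."""
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def anosim_r(profiles, groups):
    """ANOSIM R = (mean between-group rank - mean within-group rank) / (M / 2),
    where M is the number of sample pairs."""
    pairs = list(combinations(range(len(profiles)), 2))
    ranks = average_ranks([bray_curtis(profiles[i], profiles[j]) for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    return (sum(between) / len(between) - sum(within) / len(within)) / (len(pairs) / 2)


# Hypothetical toy data: two well-separated groups of two samples each.
profiles = [[10, 0], [9, 1], [0, 10], [1, 9]]
groups = ["x", "x", "y", "y"]
print(anosim_r(profiles, groups))
```

The statistic reaches 1 only when every between-group dissimilarity exceeds every within-group dissimilarity, and it falls toward 0 when group membership carries no information about community composition.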

Effect of environmental factors on the bacterial communities
The effect of seawater physicochemical parameters on the bacterial communities was analyzed. The results show that members of the α-Proteobacteria and Flavobacteriia were positively associated with % NaCl (Spearman's r = 0.434, P = 0.24; r = 0.63, P = 0.06) and negatively associated with TSS (r = −0.68, P = 0.04; r = −0.76, P = 0.01). The γ-Proteobacteria were also positively associated with % NaCl (r = 0.39, P = 0.30) and negatively associated with total P (r = −0.38, P = 0.30). The members of β-Proteobacteria, unidentified Proteobacteria, and Actinobacteria were positively associated with total N (r = 0.88, P = 0.00; r = 0.88, P = 0.00; r = 0.94, P = 0.00) and negatively associated with % NaCl (r = −0.73, P = 0.02; r = −0.63, Acidimicrobiia were positively associated with total N (r = 0.25, P = 0.50) and negatively associated with seawater temperature (r = −0.52, P = 0.15). The δ-Proteobacteria were positively associated with BOD5 (r = 0.51, P = 0.15) and negatively associated with total N (r = −0.59, P = 0.09). Moreover, we also found that the numbers of Marinobacterium, Neptuniibacter, Synechococcus, Candidatus Thiobios, hgcI clade (Actinobacteria), and Candidatus Pelagibacter were significantly different when grouped by land use (P = 0.00, 0.02, 0.02, 0.00, 0.03, and 0.00, respectively). The numbers of Synechococcus and Candidatus Pelagibacter were significantly higher in tourist sites than in the other types of land use, whereas Candidatus Thiobios was the only genus whose number was significantly higher in aquaculture sites than in the other types of land use (Additional file 4: Table S2).
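The Spearman coefficients reported above are rank correlations: each variable is converted to ranks, and the Pearson correlation of those ranks is taken. The following is a minimal sketch of that calculation with tied values given their mean rank. It does not compute P values, and the function names and input values are hypothetical rather than measurements from this study.

```python
import math


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical values: a physicochemical parameter and a class abundance per site.
parameter = [4.0, 4.0, 3.0, 1.0, 4.0]
abundance = [0.30, 0.28, 0.20, 0.05, 0.33]
print(spearman(parameter, abundance))
```

Because only ranks enter the calculation, the coefficient captures any monotonic relationship and is insensitive to the scale of either variable, which suits parameters such as % NaCl that take only a few distinct values across sites.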

Discussion
The marine environment is one of the most extensive habitats for microorganisms, covering more than two-thirds of the surface of the earth [20]. Marine bacteria play important roles in energy and matter fluxes in the sea [7]. Normal cell counts of more than 10^5 cells/ml in surface seawater support the prediction that the oceans harbor 3.6 × 10^29 microbial cells with a total cellular carbon content of approximately 3 × 10^17 g [21]; an understanding of the marine bacterial distribution and diversity is therefore essential. Although some studies have investigated the microbial distribution and diversity with regard to environmental and geographical conditions [6,22,23], the marine bacterial distribution and diversity regarding types of land use in the Gulf of Thailand have never been reported. In this study, we investigated for the first time the communities and diversity of bacteria associated with seawater collected from three different types of land use, namely mangrove forests, tourist sites, and aquaculture sites, over a distance of approximately 769.97 km along the shores of the Upper Gulf of Thailand. The results show that run-off water from each type of land use significantly affected the community richness and diversity of marine bacteria. Aquaculture sites contained the highest levels of community richness and diversity, followed by mangrove forests and tourist sites. The maximum richness and diversity in aquaculture sites possibly resulted from aquaculture activities such as feeding, which increases the numbers of aquatic animals that are effective feeders and promote high levels of aquatic bacteria released from their feces and body fluids [24], and the addition of a readily accessible C source, which significantly increases the bacterial biomass [25]. Mangrove forests are complex and dynamic ecosystems that are highly variable in several physicochemical conditions including salinity, flooding, light, temperature, and nutrients, which promote bacterial diversity.
It was reported that mangrove species were the main factors influencing their rhizosphere bacterial communities [26]. The run-off water in tourist sites may come from various sources such as swimmers, trash disposal from tourists, domestic wastewater, and illegal discharge from recreation boats [24]. Hamilton et al. [27] reported that pollutants from anthropogenic-influenced sources conveyed diverse bacteria into beaches and seawater.
Moreover, when determining the effect of environmental factors on the bacterial communities, we found that species richness estimators and OTUs were positively correlated with turbidity. Aquaculture sites had the highest average values of turbidity, TSS, and total N, which were significantly different from those of the other types of land use. It can be concluded that a higher seawater turbidity level contributed to higher levels of species richness and OTUs. The positive and negative correlations between bacterial communities at the class level and environmental factors were analyzed. The results show that the α-Proteobacteria, γ-Proteobacteria, and Flavobacteriia were positively associated with % NaCl. The Cyanobacteria were positively associated with total P. The members of β-Proteobacteria, unidentified Proteobacteria, Acidimicrobiia, and Actinobacteria were positively associated with total N. These results indicate that salinity, total N, and total P were among the main factors shaping the bacterial communities of nearshore seawater in the Upper Gulf of Thailand, whereas pH and seawater temperature were not likely to affect the bacterial communities.
Our findings agree with those of Suh et al. [7], who studied the seasonal dynamics of the marine microbial community in the South Sea of Korea and found that salinity, N, and P contents contributed substantially to the spatial distribution of bacterial community composition. Salinity showed a marked correlation with the spatial distribution of the Flavobacteriia, while the α-Proteobacteria were greatly affected by N and dissolved oxygen. Likewise, the γ-Proteobacteria in seawater of Mallorca Island in Spain were positively correlated with salinity [12]. Inorganic nutrients were reported to strongly affect the bacterial community structures of seawater from the Mediterranean Sea, France, under eutrophication conditions [28] and of natural seawater across Japan [6].
To more clearly study the similarity among bacterial communities in different sampling sites, UPGMA analysis was applied to display the integration and the relative abundance of each phylum in each site. The result shows that the relative abundance of bacterial phyla in seawater of sites F (aquaculture site at Donhoylhod) and G (aquaculture site at Bangtaboon Bay) was different from that of other sites. This may be affected by the physicochemical factors that shaped the bacterial communities of those two sites which ranked first and second in turbidity, TSS, and total N. Moreover, the proportions of the Proteobacteria, Actinobacteria, Verrucomicrobia, Euryarchaeota, and Deep Sea DHVEG-6 in both sites were more similar to each other than to other sites.
In this study, a total of 4953 OTUs were observed from all samples. Indeed, the number of bacterial OTUs was not necessarily correlated with location. The numbers of bacterial OTUs in samples varied greatly depending on several physicochemical and environmental factors [29,30]. Other studies revealed that the seawater collected from Gosung Bay (the South Sea of Korea) and Mallorca Island in Spain had only 900 OTUs [7] and 965 OTUs [12], respectively. Considerably more OTUs have been observed in marine sediments. Totals of 6039 OTUs, 6059 OTUs, and 5700 to 7600 OTUs were obtained from marine sediments around the Kaichu-Doro Causeway in Okinawa, Japan [31], marine sediments in Yam O Wan Bay, Hong Kong [32], and marine sediments from Jeju Island, South Korea [20], respectively. Moreover, this study found that the proportions of the Proteobacteria were highest in all sampling sites, followed by the Bacteroidetes and Actinobacteria. This result corresponds with that of Suh et al. [7], who reported that the Proteobacteria was the dominant phylum in seawater from Gosung Bay, South Korea, followed by the Bacteroidetes and Actinobacteria. Similarly, most of the bacterial sequence reads in marine sediments from Jeju Island, South Korea, were also associated with the Proteobacteria and Bacteroidetes, followed by the Actinobacteria, Acidobacteria, and Firmicutes [20].

Conclusions
This is the first report of the bacterial communities and diversity associated with seawater along the Upper Gulf of Thailand, categorized into three types of land use: mangrove forests, tourist sites, and aquaculture sites. The run-off water from each type of land use significantly affected the community richness and diversity. The highest community richness and diversity were obtained from aquaculture sites, followed by mangrove forests and tourist sites. Turbidity was the most influential parameter affecting the variation in bacterial community composition. Salinity, total N, and P were among the important factors that shaped the bacterial communities in near-shore seawater from the Upper Gulf of Thailand, whereas pH and seawater temperature affected the bacterial communities less. In addition, the variations of bacterial communities from site to site were greater than those within a site. The Proteobacteria, Bacteroidetes, Actinobacteria, Cyanobacteria, Verrucomicrobia, Euryarchaeota, Planctomycetes, Firmicutes, Deep Sea DHVEG-6, and Marinimicrobia were the most abundant and common phyla distributed across the Upper Gulf of Thailand.

Sample collection and determination of seawater parameters
Seawater was sampled on 10th November and 1st December 2018 at nine sites in seven provinces along the shores of the Upper Gulf of Thailand, over a distance of approximately 769.97 km (Table 1 and Fig. 1). Sampling sites were selected based on types of land use that were presumably influenced by different run-off conditions. Three sites (A, B, and H) were mangrove forests in Black Sand Beach, Kungkrabaen Bay, and Pranburi forest park, respectively. Three tourist sites (C, D, and I) were Suanson Beach, Pattaya Beach, and Wanakorn Beach, respectively. Three aquaculture sites (E, F, and G) were Angsila old market, Donhoylhod, and Bangtaboon Bay, respectively. All sampling sites were in public areas thus no specific permission was required for seawater collection. At each sampling site, near-surface seawater (12 L), approximately 2 m from the shoreline, was collected in triplicate and stored on ice during within-a-day transit.
Air and seawater temperatures were measured at each sampling site at the time of seawater collection. Seawater samples were analyzed for physicochemical parameters. pH, salinity, and turbidity (in NTU) were measured within 48 h by using a pH meter (Metrohm 827 pH lab), a hand refractometer (Atago N-1E), and a turbidimeter (HACH 2100P), respectively. BOD5, total N, total P, and TSS contents were analyzed according to the American Public Health Association [33] by using the azide modification method, macro-Kjeldahl method, sulfuric acid-nitric acid digestion method, and drying at 103-105°C, respectively.

Illumina NGS
Seawater samples were prefiltered through sterile Whatman no. 2 filter papers to remove suspended particles, and the filtrate was subsequently filtered through 0.2 μm sterile cellulose nitrate membrane filters (Sartorius Stedim Biotech, Göttingen, Germany) [4]. DNA was extracted from seawater samples using the E.Z.N.A.® Water DNA kit (Omega Bio-tek, Inc., Norcross, GA, USA), according to the manufacturer's instructions. The V4 variable region of the 16S rRNA gene was amplified by using the 515F and 806R specific primer set with barcodes [34,35]. PCR reactions were carried out with Phusion® High-Fidelity PCR Master Mix (NEB, Ipswich, MA, USA). The PCR products were purified using a Qiagen gel extraction kit (Qiagen, Inc., Valencia, CA, USA). The libraries were generated with the TruSeq® DNA PCR-Free sample preparation kit (Illumina, Inc., San Diego, CA, USA) and analyzed on the HiSeq2500 PE250 sequencing system (Illumina, Inc., San Diego, CA, USA), according to the manufacturer's instructions. Negative controls (sterile water) were carried through amplification and sequencing. Data were returned as fastq files and deposited in the Sequence Read Archive of the National Center for Biotechnology Information under BioProject accession number PRJNA530863 (SRA: SRP190963).

Data processing and bioinformatic analyses
Paired-end reads were merged by using the FLASH program (V1.2.7) [36]. Quality filtering on the raw tags was performed to obtain the high-quality clean tags according to the QIIME software (V1.7.0) [37,38]. The tags were compared with the reference database using the UCHIME algorithm to detect chimera sequences. Chimera sequences were removed to obtain the effective tags [39,40]. For OTU clustering and species annotation, sequence analysis was performed with all effective tags by using the Uparse software (V7.0.1001). Sequences with ≥97% similarity were assigned to the same OTUs. The Mothur software (V1.36.1) [41] was used to align each representative sequence against the SSU rRNA database of SILVA [42] for species annotation at each taxonomic level [43]. The phylogenetic relationship of all OTUs derived from representative sequences was analyzed by using the MUSCLE program (V3.8.31) [44].
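The idea of grouping sequences into OTUs at ≥97% similarity can be sketched as a greedy, abundance-sorted clustering. The example below is a deliberately simplified illustration and not the Uparse algorithm: it assumes equal-length, pre-aligned sequences, measures identity as the fraction of matching positions, and performs no chimera filtering. The function names and toy sequences are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions; 0.0 for empty or unequal-length
    sequences (a simplifying assumption, no alignment is performed)."""
    if not a or len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seq_counts, threshold=0.97):
    """Assign each unique sequence, most abundant first, to the first
    centroid it matches at or above the threshold; otherwise the
    sequence becomes a new centroid. Returns reads per centroid."""
    clusters = {}
    ordered = sorted(seq_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    for seq, count in ordered:
        for centroid in clusters:
            if identity(seq, centroid) >= threshold:
                clusters[centroid] += count
                break
        else:
            clusters[seq] = count
    return clusters


# Hypothetical toy tags of length 100 with their read counts.
base = "A" * 100
near = "A" * 98 + "CC"      # 98% identical to base, joins its OTU
far = "A" * 90 + "C" * 10   # 90% identical to base, forms a new OTU
print(greedy_cluster({base: 50, near: 5, far: 20}))
```

Processing the most abundant sequences first means that centroids tend to be the sequences least likely to be sequencing errors, which is the rationale commonly given for abundance-sorted greedy approaches.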

Statistical analyses
Alpha diversity, including community richness (Chao1 and ACE estimators), community diversity (Shannon and Simpson indices), and an index of sequencing depth (Good's coverage), as well as rarefaction data, were calculated by using the QIIME software (V1.7.0) and displayed by the R software (V2.15.3). PCoA was performed to obtain principal coordinates and visualize complex, multidimensional data, which were then displayed by the WGCNA, stat, and ggplot2 packages in the R software (V2.15.3). The UPGMA clustering was performed as a type of hierarchical clustering method to interpret the distance matrix using average linkage and was conducted by using the QIIME software (V1.7.0). The nonparametric method, ANOSIM, was conducted to determine whether the bacterial community structures differed significantly among groups and within groups. MRPP was calculated with the R software (V2.15.3). AMOVA was performed using the Mothur software (V1.36.1). Seawater parameters, alpha diversity indices, and physicochemical parameters of each type of land use were subjected to an analysis of variance (ANOVA) with Tukey's test. Spearman rank correlation was used to analyze the effect of seawater physicochemical parameters on the bacterial communities. ANOVA and Spearman rank correlations