Protein expression
The coding region of the target gene encoding the transcriptional regulator YipR (XC_2801) from X. campestris pv. campestris was sub-cloned into the vector pMAL-c5x, which enables the expression of a protein fused with both a 6xHis tag (C-terminal) and a maltose binding protein (MBP) tag (N-terminal). The N-terminal MBP domain improves the solubility of the expressed protein, and the His-tag allows standard large-scale protein purification by Ni2+-affinity chromatography using an automated system.
A 1-ml overnight culture was used to inoculate 50 ml of fresh LB medium supplemented with 50 μg/ml ampicillin in a 250-ml culture flask. This flask was incubated with shaking (200 rpm) at 37 °C overnight (~16 h). A 20-ml aliquot of the overnight culture was used to inoculate 1 L of fresh LB medium supplemented with 50 μg/ml ampicillin in a 2.5-L culture flask, which was incubated with shaking (200 rpm) at 37 °C until the culture reached OD600 = 0.4–0.6 (~3 h). Expression was induced by adding 60 μl of 0.5 M IPTG to a final concentration of 0.3 mM IPTG. Shaking was continued at 18 °C overnight (~16 h). Cells were harvested by centrifugation at 4000 rpm and 4 °C for 30 min, and the supernatant was discarded. The cell pellets can be stored indefinitely at −80 °C or used directly for protein purification.
Protein purification by affinity chromatography
The cell pellets were re-suspended in 50 ml lysis buffer (100 mM Tris-HCl [pH 8], 20 mM imidazole, 500 mM NaCl, 1 mM TCEP-HCl (Tris(2-carboxyethyl)phosphine hydrochloride), 2% (v/v) glycerol) supplemented with 1 ml lysozyme (50 mg/ml), 50 μl DNase I (5 mg/ml) and one tablet of protease inhibitor. Bacterial cells were lysed with a microfluidizer or French press at ~20,000 psi. Lysis was considered complete when the cloudy cell suspension became translucent. The lysate was centrifuged for 30 min at 16,000 rpm at 4 °C. The soluble protein fraction (supernatant) was transferred to a fresh 50 ml centrifuge tube, filtered through a 0.22 μm filter and kept on ice. Affinity chromatography purification was performed using a HisTrap™ FF column (5 ml) on the ÄKTA protein purification system. The column was first washed with Wash buffer 1 (100 mM Tris-HCl [pH 8], 20 mM imidazole, 2 M NaCl, 2% glycerol, 1 mM TCEP-HCl, 0.1 mM AEBSF (4-(2-Aminoethyl)benzenesulfonyl fluoride hydrochloride)) to remove nonspecifically bound DNA. The column was then washed with Wash buffer 2 (100 mM Tris-HCl [pH 8], 20 mM imidazole, 50 mM NaCl, 2% glycerol, 1 mM TCEP-HCl, 0.1 mM AEBSF). Elution was carried out with Elution buffer 1 (100 mM Tris-HCl [pH 8], 500 mM imidazole, 500 mM NaCl, 2% glycerol, 1 mM TCEP-HCl, 0.1 mM AEBSF) using a linear gradient with a set target concentration of Elution buffer 1 of 50%. Protein-containing fractions were run on a 12% polyacrylamide gel. Protein bands were visualized by incubating the gel with InstantBlue stain for 5–10 min, and the protein-containing fractions were pooled. The protein sample was stored at 4 °C.
Protein purification by size exclusion chromatography
The protein sample was transferred into a 20 ml ultrafiltration spin column (10,000 MWCO) and centrifuged at 4000 rpm at 4 °C until the final volume reached approximately 5 ml. Size exclusion chromatography purification was performed using a HiLoad 16/600 Superdex 75 prep grade column on the ÄKTA protein purification system with Binding buffer A (20 mM Tris-HCl [pH 8], 50 mM KCl, 2% glycerol, 1 mM TCEP-HCl, 1 mM EDTA). Protein-containing fractions were run on a 12% polyacrylamide gel. Protein bands were visualized by incubating the gel with InstantBlue stain for 5–10 min. Protein-containing fractions were pooled together and the concentration was determined using a protein assay kit (BioRad DC protein assay kit).
Bind-n-seq: barcodes assignment and equilibration reactions
Barcodes were assigned to each test condition as shown in Additional file 4: Table S3. A primer extension PCR master mix for the randomized oligos was prepared for 15 reactions (25 μl per reaction): 52.5 μl of H2O, 15 μl of 10 μM Primer 1 (Additional file 5: Table S4) and 187.5 μl of Taq DNA polymerase master mix (2×). A volume of 17 μl of the master mix was added to each PCR tube or well of a PCR microplate. A volume of 8 μl of 10 μM Bind-n-seq 93 mer (Additional file 5: Table S4) was then added to each PCR reaction. The PCR was run on a thermal cycler using the following program: [95 °C for 2 min] × 1, [63 °C for 1 min] × 1, [72 °C for 4 min] × 1, followed by storage at 4 °C.
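The master mix volumes above can be checked with a short calculation. The following Python sketch is illustrative only: the function name and the dictionary layout are assumptions introduced here, not part of the original protocol. It confirms that the listed components give 17 μl of master mix per reaction and derives the final primer and oligo concentrations in the 25 μl reaction.

```python
def per_reaction_volume(components_ul, n_reactions):
    """Return the master mix volume (μl) dispensed into each reaction."""
    return sum(components_ul.values()) / n_reactions

N_REACTIONS = 15

# Volumes (μl) as listed in the protocol text for 15 reactions.
primer_extension_mix = {
    "H2O": 52.5,
    "Primer 1 (10 uM)": 15.0,
    "Taq DNA polymerase master mix (2x)": 187.5,
}

mix_per_rxn = per_reaction_volume(primer_extension_mix, N_REACTIONS)
oligo_per_rxn = 8.0  # μl of 10 μM Bind-n-seq 93 mer added per reaction
total_per_rxn = mix_per_rxn + oligo_per_rxn

# Final concentrations from C1 * V1 = C2 * V2.
primer_per_rxn = primer_extension_mix["Primer 1 (10 uM)"] / N_REACTIONS
primer_final_uM = 10.0 * primer_per_rxn / total_per_rxn
oligo_final_uM = 10.0 * oligo_per_rxn / total_per_rxn

print(f"master mix per reaction: {mix_per_rxn:.1f} ul")
print(f"total reaction volume:   {total_per_rxn:.1f} ul")
print(f"Primer 1 final:          {primer_final_uM:.2f} uM")
print(f"93 mer final:            {oligo_final_uM:.2f} uM")
```

The same function can be reused for the qPCR and library amplification master mixes by substituting their component volumes.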
Bind-n-seq: binding reactions
For the binding reactions, 20 × Binding buffer A (without KCl) was prepared as follows: 400 mM Tris-HCl, 20 mM TCEP-HCl, 40% glycerol, 20 mM EDTA, and H2O to bring the final volume to 100 ml. A master mix of Binding buffer B was prepared for 12 reactions as follows: 30 μl of 20 × Binding buffer A (without KCl), 6 μl of 1 M MgCl2, 60 μl of 10% BSA and 24 μl of H2O. The KCl salt solutions were prepared as shown in Additional file 6: Table S5. Highly purified proteins were diluted to a concentration of 40 μM in Binding buffer A. A volume of 10 μl of Binding buffer B was added to the oligo mixture (25 μl) described above. Protein (5 μl) and salt solution (10 μl) were then added to the reaction tubes as shown in Additional file 7: Table S6 to give a total volume of 50 μl. The reaction tubes were incubated at room temperature for 2 h.
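The composition of each 50 μl binding reaction follows from the volumes given above. The Python sketch below is a hedged illustration (the helper name final_conc and the variable names are assumptions made here). It ignores the buffer components carried over with the protein dilution and the 25 μl oligo mixture, and it does not compute KCl because the salt solutions are defined in Table S5.

```python
REACTION_VOLUME_UL = 50.0
N_REACTIONS = 12

def final_conc(stock_conc, volume_ul, total_ul=REACTION_VOLUME_UL):
    """Final concentration after dilution, in the same unit as stock_conc."""
    return stock_conc * volume_ul / total_ul

# Binding buffer B master mix for 12 reactions (μl), from the protocol text.
buffer_b_mix = {
    "20x Binding buffer A (no KCl)": 30.0,
    "1 M MgCl2": 6.0,
    "10% BSA": 60.0,
    "H2O": 24.0,
}
per_rxn = {name: vol / N_REACTIONS for name, vol in buffer_b_mix.items()}
buffer_b_per_rxn = sum(per_rxn.values())

# Oligo mixture + Binding buffer B + protein + salt solution.
total_ul = 25.0 + buffer_b_per_rxn + 5.0 + 10.0

buffer_a_fold = final_conc(20.0, per_rxn["20x Binding buffer A (no KCl)"])
mgcl2_mM = final_conc(1000.0, per_rxn["1 M MgCl2"])
bsa_percent = final_conc(10.0, per_rxn["10% BSA"])
protein_uM = final_conc(40.0, 5.0)

print(f"Binding buffer B per reaction: {buffer_b_per_rxn:.1f} ul")
print(f"total reaction volume: {total_ul:.1f} ul")
print(f"Binding buffer A: {buffer_a_fold:.1f}x, MgCl2: {mgcl2_mM:.0f} mM")
print(f"BSA: {bsa_percent:.1f}%, protein: {protein_uM:.1f} uM")
```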
Bind-n-seq: enrichment reactions
Bind-n-seq wash buffers were prepared at different KCl concentrations, as described in Additional file 8: Table S7. A sterile 1.5 ml microcentrifuge tube was prepared for each binding reaction condition. A volume of 100 μl of the amylose resin slurry (≈ 50 μl packed resin after spinning down) was added to each microcentrifuge tube, which was then centrifuged for 1 min at 14,000 rpm at room temperature. The supernatant was carefully removed without disturbing the resin. A volume of 1 ml H2O was added to the amylose resin and vortexed for 30 s. These H2O washes were repeated three times. A volume of 1 ml of Bind-n-seq wash buffer (Additional file 8: Table S7) with the specific KCl concentration was then added to the corresponding tubes to equilibrate the resin. The tubes were centrifuged for 1 min at 14,000 rpm at room temperature. The supernatant was carefully removed without disturbing the resin. This wash was repeated using Bind-n-seq wash buffer. A volume of 50 μl of protein-DNA reaction was added to the equilibrated resin and incubated at room temperature for 30 min (the solution was gently mixed every 10 min). The tubes were centrifuged for 1 min at 14,000 rpm at room temperature and the supernatant was removed without disturbing the resin. Again, a 1 ml volume of Bind-n-seq wash buffer with the specific KCl concentration was added to the corresponding tubes to remove the unbound nucleotides. These tubes were incubated for 10 min at room temperature and then centrifuged at 14,000 rpm at room temperature for 1 min. The wash step was repeated twice with Bind-n-seq wash buffer. After the washes, a volume of 50 μl of Bind-n-seq elution buffer (10 mM maltose in 1 ml EB buffer (QIAquick PCR purification kit, Qiagen)) was added to the reaction tubes to elute bound nucleotides, and the tubes were incubated for 10 min at room temperature. After incubation, the tubes were centrifuged for 1 min at 14,000 rpm at room temperature.
The supernatant was transferred to a new microcentrifuge tube and stored at −20 °C for up to 2 weeks (or used immediately for library amplification).
Bind-n-seq: library amplification
A qPCR master mix was prepared for 15 reactions to assess enrichment of the recovered DNA (20 μl per reaction): 120 μl of H2O, 15 μl of 10 μM Primer 2 & 3 (Additional file 5: Table S4) and 150 μl of qPCR master mix (2×). A volume of 19 μl of the master mix was added to each PCR tube. One μl of enriched DNA was added to each PCR tube. PCR tubes were loaded into the real-time thermal cycler and run with the following PCR program: [95 °C for 5 min] × 1, [63 °C for 5 s, 72 °C for 10 s] × 39, melting curve at 50–90 °C for 5 s per degree. Reactions were analysed for the number of cycles required to achieve a saturated fluorescence signal. This number of cycles was recorded and used as a guide for the subsequent touchdown PCR amplification reactions to prepare sufficient DNA for Illumina sequencing.
A master mix was prepared to generate the sequencing library for 15 reactions (50 μl per reaction) as follows: 300 μl of H2O, 37.5 μl of 10 μM Primer 2 & 3 (Additional file 5: Table S4) and 375 μl of Taq DNA polymerase master mix (2×). A volume of 47.5 μl of the master mix plus 2.5 μl of enriched DNA was added to each PCR tube. These tubes were moved to the thermocycler and the following PCR program was used: [95 °C for 4 min] × 1, [95 °C for 30 s, 60 °C down 0.5 °C per cycle at 10 s, 72 °C for 4 min] × 10, [95 °C for 30 s, 45 °C for 30 s, 72 °C for 4 min] × 9, followed by storage at 4 °C. The PCR products were purified using the QIAquick PCR purification kit (Qiagen). The recovered DNA was quantified with the Qubit dsDNA high sensitivity assay kit (Life Technologies). One hundred ng of DNA from each enrichment reaction was pooled into one 1.5 ml microcentrifuge tube and the total volume was reduced to approximately 50 μl with a vacuum concentrator.
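Two parts of this step are simple arithmetic: the annealing temperatures of the touchdown phase and the volume of each purified library needed to contribute 100 ng to the pool. The Python sketch below illustrates both. It assumes that the first touchdown cycle anneals at 60 °C and that each later cycle is 0.5 °C lower. The library names and Qubit concentrations are hypothetical values for illustration, not measured data.

```python
def touchdown_annealing_temps(start_c=60.0, step_c=0.5, cycles=10):
    """Annealing temperature (°C) for each cycle of the touchdown phase."""
    return [start_c - step_c * cycle for cycle in range(cycles)]

def volume_for_mass(target_ng, conc_ng_per_ul):
    """Volume (μl) of a library that contains target_ng of DNA."""
    if conc_ng_per_ul <= 0:
        raise ValueError("concentration must be positive")
    return target_ng / conc_ng_per_ul

temps = touchdown_annealing_temps()
print("touchdown annealing temperatures:", temps)

# Hypothetical Qubit readings (ng/μl); replace with measured values.
qubit_ng_per_ul = {"library_A": 20.0, "library_B": 12.5, "library_C": 40.0}

pool_volumes_ul = {
    name: volume_for_mass(100.0, conc) for name, conc in qubit_ng_per_ul.items()
}
for name, volume in pool_volumes_ul.items():
    print(f"{name}: pool {volume:.1f} ul")
print(f"pooled volume before concentration: {sum(pool_volumes_ul.values()):.1f} ul")
```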
Bind-n-seq: sequencing
The resulting pooled library was diluted to 2 nM with NaOH and 10 μl was transferred into 990 μl of Hybridization Buffer (HT1) (Illumina) to give a final concentration of 20 pM. A volume of 600 μl of the diluted library pool was spiked with 10% PhiX control v3 and placed on ice before loading into the Illumina MiSeq cartridge following the manufacturer’s instructions. The MiSeq Reagent Kit v3 (150 cycles) sequencing chemistry was utilised with run metrics of 150 cycles for each single end read using MiSeq Control Software 2.4.1.3 and Real-Time Analysis (RTA) 1.18.54.
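The loading concentration stated above follows from a single dilution step. As a minimal sketch (the function name is an assumption introduced here), the calculation can be written as:

```python
def diluted_concentration(stock_conc, stock_volume_ul, diluent_volume_ul):
    """Concentration after mixing stock with diluent (same unit as stock_conc)."""
    total_ul = stock_volume_ul + diluent_volume_ul
    return stock_conc * stock_volume_ul / total_ul

# 10 μl of the 2 nM (2000 pM) library into 990 μl of HT1 buffer.
final_pM = diluted_concentration(2000.0, 10.0, 990.0)
dilution_factor = 2000.0 / final_pM

print(f"final library concentration: {final_pM:.1f} pM")
print(f"dilution factor: {dilution_factor:.0f}x")
```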
Data analysis
For data analysis, a new directory was created on the computer hard disk and used as the working directory for the downstream analysis. The input sequencing file containing high quality sequences was placed into this directory (note that the input dataset should be in compressed fastq.gz format). Other required files were downloaded from https://anshiqi19840918.wixsite.com/ngsfilelinks/others and saved to the same location as the sequencing file: background.txt (random 21mers that act as the default background for a MERMADE run) and Bind-n-seq 13-barcodes.csv (a comma-separated list of the possible 3 nt long barcodes), which can be edited in Excel to add meaningful names for specific libraries against the barcodes.
Installation of MERMADE
The original MERMADE package was Dockerized so that it can be run on diverse operating systems, including Windows. More information can be found at https://anshiqi19840918.wixsite.com/ngsfilelinks/others (for commands for running on macOS, please see Additional file 9). First, the latest version of Docker Desktop for Windows was downloaded and installed following the instructions at https://hub.docker.com/editions/community/docker-ce-desktop-windows. In a terminal window, the working directory was selected with the command cd directoryname. The Dockerized MERMADE image was then pulled and installed using the following command:
docker pull pfcarrier/docker_mermade
The following command was then used to deploy the container:
docker run -v "directory path of the container":/work -it pfcarrier/docker_mermade bash
The prompt in the terminal window should change to: /work#, which indicates the software has been installed successfully.
Sequencing data analysis using MERMADE
In the working directory, MERMADE was run with the command:
rm -rf databasename.db wdir;run_mermade.pl -o databasename.db -d wdir -b background.txt -v TGATCGGAAG sequencing.fastq.gz barcode.csv
where databasename is the name of the database file; sequencing.fastq.gz is the name of the sequence file; barcode.csv is the name of the edited barcode.csv file with user library names (Note there are other optional parameters that can be further optimized by the user, but in general running the application with default setting is recommended).
An analysis report was generated using the reporter.pl script, which was executed with the command:
reporter.pl <databasename.db> <# of motifs> <output dir> <barcodes>
Filtering and processing the results from MERMADE
Results from MERMADE were processed by filtering out low complexity patterns and those seed sequences with an enrichment below 2.5-fold over background and fewer than 500 foreground reads. We applied an R script to select the final list of sequences that were submitted to the Regulatory Sequence Analysis Tools (RSAT) Prokaryotes. This script used the “.html” output generated by MERMADE and then identified 1) all the unique motifs; 2) shorter unique motifs that might be contained in longer ones; and 3) longer unique motifs (please note that there are other software/applications available to search given motifs). RStudio can be downloaded and installed from https://www.rstudio.com/ and the ExtractMotifs zip file can be downloaded from https://anshiqi19840918.wixsite.com/ngsfilelinks/others. These files were unzipped and saved to the computer hard disk. A .txt file containing the barcodes of interest was used (please note that the format of the file should be one barcode per line). In RStudio, the required packages were installed and loaded with the commands:
install.packages("plyr")
library("plyr")
install.packages("dplyr")
library("dplyr")
install.packages("stringi")
library("stringi")
install.packages("htmltab")
library("htmltab")
install.packages("stringr")
library("stringr")
install.packages("devtools")
library("devtools")
source("https://bioconductor.org/biocLite.R")
biocLite("Biostrings")
source("https://bioconductor.org/biocLite.R")
biocLite("DECIPHER")
The ExtractMotifs package was installed and run with the commands:
install.packages("PathTo/ExtractMotifs_0.1.0.tar.gz", repos = NULL, type = "source")
library("ExtractMotifs")
x <- ExtractMotifs("path_to_html_file", Ratio_Threshold, Foreground, "path_to_Barcode_List")
The output from this command was three “.csv” files that were saved into the current R working directory and one HTML file that opened automatically when the analysis was completed (please note that it was important to check the current working directory using the command getwd()). The list named BC_selected_Longest_Seqs.csv was used for genome-scale DNA pattern searching using Regulatory Sequence Analysis Tools (RSAT) Prokaryotes. The RSAT Prokaryotes genome-scale DNA-pattern search is available at: http://embnet.ccg.unam.mx/rsat/genome-scale-dna-pattern_form.cgi. In this case, the organism selected for the search was X. campestris pv. campestris sequenced strain 8004, and the sequences from the list were entered as query pattern(s) (please note that the parameters of the RSAT genome-scale DNA-pattern search can be optimised for more specific searches if required; for example, the search region can be narrowed to within 200 bp upstream of annotated ORFs, and the option to allow overlap with upstream ORFs can be disabled).
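The filtering criteria described in this section (an enrichment threshold, a minimum number of foreground reads, and the separation of unique motifs into shorter motifs contained in longer ones and the remaining longest motifs) can be illustrated with a short Python sketch. This is not the ExtractMotifs implementation: the function names, the record layout and the example motifs are hypothetical, and the low complexity filter is omitted.

```python
def filter_motifs(records, ratio_threshold=2.5, min_foreground=500):
    """Keep unique motifs that pass the enrichment and read-count cut-offs."""
    kept = []
    for motif, ratio, foreground in records:
        passes = ratio >= ratio_threshold and foreground >= min_foreground
        if passes and motif not in kept:
            kept.append(motif)
    return kept

def split_by_containment(motifs):
    """Separate motifs contained in a longer motif from the remaining ones."""
    contained = [
        m for m in motifs
        if any(m != other and m in other for other in motifs)
    ]
    longest = [m for m in motifs if m not in contained]
    return contained, longest

# Hypothetical records: (motif, enrichment over background, foreground reads).
records = [
    ("TTGACA", 3.1, 820),
    ("ATTGACAG", 4.0, 1500),
    ("GGCCTA", 2.0, 900),    # enrichment below 2.5-fold
    ("CCATGG", 5.2, 120),    # fewer than 500 foreground reads
    ("TTGACA", 3.1, 820),    # duplicate entry
    ("ACGTTGCA", 2.8, 640),
]

unique_motifs = filter_motifs(records)
contained, longest = split_by_containment(unique_motifs)

print("unique motifs:   ", unique_motifs)
print("contained motifs:", contained)
print("longest motifs:  ", longest)
```

In this sketch the longest motifs correspond conceptually to the list submitted to RSAT, although the actual selection was made by the ExtractMotifs package.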