Rapid identification and mapping of insertion sequences in Escherichia coli genomes using vectorette PCR

Background: Insertion sequences (IS) are small DNA segments capable of transposing within and between prokaryotic genomes, often causing insertional mutations and chromosomal rearrangements. Although several methods are available for locating ISs in microbial genomes, they are either labor-intensive or inefficient. Here, we use vectorette PCR to identify and map the genomic positions of the eight insertion sequences (IS1, 2, 3, 4, 5, 30, 150, and 186) found in E. coli strain CGSC6300, a close relative of MG1655 whose genome has been sequenced.
Results: Genomic DNA from strain CGSC6300 was digested with the four-base cutter Rsa I and the resulting restriction fragments were ligated to vectorette units. Using IS-specific primers directed outward from the extreme ends of each IS and a vectorette primer, flanking DNA fragments were amplified from all but one of the 37 IS elements identified in the genomic sequence of MG1655. Purification and sequencing of the PCR products confirmed that they are IS-associated flanking DNA fragments corresponding to the known IS locations in the MG1655 genome. Seven additional insertions were found in strain CGSC6300, indicating that very closely related isolates of the same laboratory strain (the K12 isolate) may differ in their IS complement. Two other E. coli K12 derivatives, TD2 and TD10, were also analyzed by vectorette PCR. They share 36 of the MG1655 IS sites and carry 16 and 18 additional insertions, respectively.
Conclusion: This study shows that vectorette PCR is a swift, efficient, reliable method for typing microbial strains and identifying and mapping IS insertion sites present in microbial genomes. Unlike Southern hybridization and inverse PCR, our approach involves only one genomic digest and one ligation step. Vectorette PCR is then used to simultaneously amplify all IS elements of a given type, making it a rapid and sensitive means to survey IS elements in genomes. The ability to rapidly identify the IS complements of microbial genomes should facilitate subtyping closely related pathogens during disease outbreaks.


Background
Insertion sequences (IS) are small DNA segments capable of transposing within and between prokaryotic genomes and episomes, often causing insertional mutations and chromosomal rearrangements [1]. Identifying and mapping IS elements in microbial genomes is essential to understand their evolutionary significance [2-5]. So rapidly can IS elements move that even closely related laboratory strains commonly differ in the positions of their IS sequences [6,7]. A swift means to identify IS insertions might therefore allow isolates from specific disease outbreaks to be distinguished from other closely related strains.
Several methods have been used to identify the number and locations of IS elements in bacterial genomes, including Southern hybridization [3] and the inverse polymerase chain reaction (iPCR) [4,8,9]. Southern hybridization is rather time-consuming and requires additional procedures for localizing ISs. Inverse PCR, a commonly used PCR method for recovering unknown flanking sequences of a known target sequence, uses a library of circularized chromosomal DNA fragments as template and two outward primers located at each end of the known fragment for amplification [8]. However, when a target sequence has multiple genomic locations, the variously sized DNA circles formed are difficult to amplify simultaneously. Also, the length of each restriction DNA fragment containing a target sequence must be determined by Southern hybridization followed by sub-genomic fractionation before intramolecular ligation and PCR amplification [4,8,9]. These difficulties render Southern hybridization and iPCR impractical as techniques for quickly surveying repetitive elements in genomes.
Vectorette PCR (vPCR) [10,11] is another method used to amplify unknown sequences flanking a characterized DNA fragment. It involves cutting genomic DNA with a restriction enzyme, ligating vectorettes to the ends, and amplifying the flanking sequences of a known sequence using primers derived from the known sequence along with a vectorette primer (Fig. 1). This technique has found many applications, including sequencing cosmid insert termini [10], identifying telomeres [12] and microsatellite sequences [13], mapping deletions, insertions, and translocations [14,15], and determining the 5' and 3' ends of mRNAs [16]. Here, we explore the efficiency of vPCR with regard to identifying and mapping IS elements in microbial genomes. We show that multiple copies of an IS are readily amplified using an IS-specific primer in combination with a vectorette primer, and that their genomic locations are readily identified from the flanking DNA sequences.

Results and discussion
The IS insertions of CGSC6300
We used E. coli strain CGSC6300, a close relative of the sequenced strain MG1655, as a reference against which to test the efficiency and reliability of vPCR in detecting IS copies. IS insertion sites were identified by sequencing flanking DNA fragments amplified using outward IS-specific primers in combination with the vectorette primer. Based on the whole genome sequence of strain MG1655 [17], there are 37 IS elements, including 7 copies of IS1, 6 copies of IS2, 5 copies of IS3, 1 copy of IS4, 11 copies of IS5, 3 copies of IS30, 1 copy of IS150, and 3 copies of IS186. Our results for each IS in CGSC6300 are summarized in Table 1 and described as follows:

IS1
Eight and 6 PCR bands, obtained with primers IS1-A and IS1-B respectively, were observed on ethidium bromide-stained agarose gels (Fig. 2). All 7 IS1 insertion sites in the sequenced genome of MG1655 [17] were successfully identified by isolating and sequencing these fragments. Sequences obtained from both flanking sequences were used to locate 2 IS1 elements (IS1-5 and IS1-6). The remaining 5 IS1 locations were identified from single flanking sequences. Three additional IS1 elements (IS1-a in b0240, IS1-c in b1786, and IS1-f in b2635) were also found in CGSC6300.

IS4
One IS4 was located based on flanking sequences amplified from both sides. No additional IS4 insertions were found.

IS30
The three known IS30 insertions in MG1655/CGSC6300 were identified based on flanking sequences amplified from both sides, and an additional insertion was identified in b2156.

IS150
The one known IS150 insertion was identified; no additional IS150 insertions were found.

IS186
The three known IS186 insertions were identified based on flanking sequences amplified from both sides (Fig. 2).

Additional IS copies in laboratory strains
Several IS elements located in CGSC6300 are not found in the genomic sequence of MG1655 (Table 1). Lyophilized CGSC6300 was obtained from the E. coli Genetic Stock Center, Yale University, and is stored in our laboratory in 15% glycerol at -80°C. It seems likely that the additional IS transpositions arose after separation from the sequenced MG1655, but prior to arrival in our laboratory, probably during storage on agar slants at room temperature, a condition known to promote IS mobilization [6,7]. These results emphasize that the IS complement of each strain should be characterized prior to experimentation.
Two other E. coli K12 derivatives, TD2 and TD10, contain 16 and 18 additional IS insertions (Table 1), respectively. The two additional insertions found in TD10 are the IS3a insert between b0314 and b0315 and the IS1-b insert associated with a deletion between b0319 and b0326. Originally, TD2 and TD10 were constructed by P1 transduction of different lac operons into the ∆lac of K12 derivative strain DD320 [18]. The IS insertion differences between these two strains probably arose when sequences flanking the lac operon were cotransduced during strain construction.

Reliability of technique
Theoretically, the number of flanking DNA fragments amplified with each IS-specific primer should equal the number of copies of each IS element in the genome. Also, the location of each IS copy should be identifiable from the two flanking DNA sequences. However, some copies of IS elements 1, 2, 3 and 5 were initially located by a single flanking sequence only. DNA fragments not recovered may have been masked by fragments of similar size, amplified from other genomic copies of the IS element. This is evidenced by bands in ethidium-stained agarose gels appearing broader and/or staining more intensely (see Fig. 2). While these bands produce clearly readable sequence in the ISs themselves, their flanking sequences are unreadable or show a high, noisy background, indicating the presence of multiple fragments of similar size (data not shown). In cases where flanking sequences were readable, we located one of the fragments, presumably the one that was amplified most efficiently.
Despite missing fragments, vectorette PCR provides a reliable estimate of the copy number of elements in a genome. Let the number of copies of the $i$th IS element be $n_i$, and the numbers of unidentified A-side and B-side flanking sequences be $u_{A,i}$ and $u_{B,i}$. Then the probability that an IS copy is not identified is simply the product of the probabilities of not obtaining the A-side and the B-side sequences, $p_i = (u_{A,i}/n_i)(u_{B,i}/n_i)$. The expected number, $x$, of missing copies is determined by summing over all $n_i$ copies of each of the $j = 8$ elements in MG1655, $x = \sum_{i=1}^{j} n_i p_i$. Our data provide an estimate of $x = 2.54$ expected missing copies. In fact, only 1 copy was missed entirely. Even when digested by just a single four-cutter restriction enzyme, vectorette methodology is highly reliable with small error rates: 6.8% expected and 2.7% realized.
The actual error rates are even smaller. Our analysis is restricted to the 37 ISs found in the genomic sequence of MG1655; the 7 additional ISs in CGSC6300 were not used in the calculations even though they may serve to mask fragments and thereby increase the expected and observed error rates.
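The direct estimate described above can be sketched in a few lines. This is a minimal illustration that assumes the A-side and B-side failures are independent, as the product rule in the text implies; the function name and the example counts are hypothetical and are not the values from Table 1.

```python
def direct_estimate_missing_copies(elements):
    """Expected number of IS copies missed entirely.

    `elements` is a list of (n, u_a, u_b) tuples, one per IS type:
      n   = genomic copy number of the IS type
      u_a = number of A-side flanking fragments not identified
      u_b = number of B-side flanking fragments not identified
    A copy is missed only if both of its flanks are missed, so the
    per-copy probability is (u_a / n) * (u_b / n).
    """
    total = 0.0
    for n, u_a, u_b in elements:
        p_missed = (u_a / n) * (u_b / n)
        total += n * p_missed
    return total


# Hypothetical counts for three IS types (NOT the data from Table 1).
example_counts = [(4, 1, 2), (2, 0, 1), (5, 2, 2)]
x_example = direct_estimate_missing_copies(example_counts)
print(f"expected missing copies: {x_example:.2f}")
```

Dividing such an estimate by the total copy number gives the expected error rate, in the same way that 2.54 expected missing copies out of 37 corresponds to the expected rate quoted above.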
Determining the reliability of the technique when there are many more than 11 copies of an IS element in a genome requires estimating m, the maximum number of amplified fragments likely to be resolved per lane by agarose minigel electrophoresis. Only a small portion of the resolving power of an agarose gel is actually used because approximately 98% (approximately because the calculation assumes equal base frequencies) of amplified fragments produced by a 4-base cutter restriction enzyme are less than 1 kb (excluding the IS and the vectorette). Hence, m is less than the maximum number of fragments physically capable of being resolved by agarose minigel electrophoresis.
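The 98% figure can be reproduced with a simple geometric model. The sketch below assumes equal base frequencies and independent positions, so that a 4-base recognition site begins at any given position with probability (1/4)^4 = 1/256; the function name is illustrative.

```python
def fraction_shorter_than(length_bp, site_len=4):
    """Probability that a restriction fragment is shorter than length_bp.

    Assumes equal base frequencies and independent positions, so the
    distance to the next site is geometrically distributed.
    """
    p_site = 0.25 ** site_len  # chance that a site starts at a given position
    return 1.0 - (1.0 - p_site) ** length_bp


frac = fraction_shorter_than(1000)
print(f"{frac:.3f}")  # prints 0.980
```

Under the same assumptions, the mean fragment length for a four-base cutter is 256 bp, which is why most amplified fragments crowd into the lower part of the gel.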
Consider m as the number of discrete positions that an amplified fragment might occupy. The probability that a particular position is not occupied given $n_i$ copies of an IS element $i$ is $(1 - 1/m)^{n_i}$. The expected number of unoccupied positions is $m(1 - 1/m)^{n_i}$ and the expected number of occupied positions (i.e. bands visualized) is $m[1 - (1 - 1/m)^{n_i}]$. We use $f_i$, the number of amplified fragments identified by sequencing, as an estimate of the number of occupied positions. Nonlinear regression of fragments identified, $f_i$, against the number of known genomic copies, $n_i$, yields an estimate of m = 11.64 ± 1.79 (Fig. 3A). As a practical matter, no more than a dozen amplified fragments are ever likely to be resolved by agarose minigel electrophoresis when a four-cutter restriction enzyme is used to digest genomic DNA.
Summing the expectations for missing A-side and B-side fragments (i.e. amplified fragments not identified by sequencing) for the j = 8 species of IS elements in MG1655 yields an expected total that is slightly larger than the 17 known masked fragments from MG1655 (each marked with an asterisk in Table 1). The corresponding probability that an IS copy is not identified is denoted $p'_i$, where the prime designates that this expectation is based on an ability to resolve a maximum of m = 12 fragments per lane. The expected number of missing IS copies, $x'$, is only slightly larger than the direct estimate x = 2.54. We conclude that the model provides a robust fit.
Figure 3. Estimation of IS flanking DNA likely to be resolved and missed. A. The maximum number of fragments likely to be resolved, m, can be estimated by plotting the number of bands observed against the genomic copy number. Only a finite number of bands can be visualized on a gel. Consequently, the likelihood that two amplified fragments comigrate increases with the number of IS copies in the genome. B. The number of amplified flanking sequences likely to be missed rapidly increases when 10 or more bands are visualized. Genomic digests with a single restriction enzyme should be restricted to IS elements with fewer than 10 copies per genome. Genomes with more than 10 copies of an IS element should be screened using high resolution agarose gels and/or using a second restriction enzyme to allow all IS copies to be identified.
A plot of the expected number of missing copies against $n_i$ (Fig. 3B) reveals that the number of missing fragments increases rapidly with the number of genomic copies. With $n_i = 20$, 5 copies (25%) are expected to remain undetected, and even with $n_i = 10$, 1 copy (10%) is expected to be overlooked. To avoid underestimating the number of copies of a highly repeated element, we recommend digesting genomic DNA with a different restriction enzyme and repeating vPCR and sequencing. By using another four-base cutter restriction enzyme, Bst UI, we identified all flanking sequences not recovered with the enzyme Rsa I for IS1, IS2, IS3, and IS5, as shown in Fig. 4 for IS2. A larger, temperature-controlled, high resolution agarose gel electrophoresis apparatus, available in some laboratories, would also improve the resolution of the technique.
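The occupancy argument can be checked numerically. The sketch below is one reading of the model, not a reproduction of the authors' calculation: it assumes that fragments fall independently into m equally likely gel positions, that only one fragment per occupied position is recovered, and that a copy is missed when both its A-side and B-side fragments are masked. The function names are illustrative.

```python
def expected_bands(n, m):
    """Expected number of occupied gel positions for n fragments, m positions."""
    return m * (1.0 - (1.0 - 1.0 / m) ** n)


def expected_masked_fragments(n, m):
    """Expected number of fragments (per side) hidden behind another fragment."""
    return n - expected_bands(n, m)


def expected_missing_copies(n, m):
    """Expected number of IS copies with both flanks masked.

    Assumes the A-side and B-side lanes are independent, so a copy is
    missed with probability (masked fraction) squared.
    """
    if n == 0:
        return 0.0
    p_side = expected_masked_fragments(n, m) / n
    return n * p_side ** 2


for copies in (5, 10, 20):
    missing = expected_missing_copies(copies, m=12)
    print(copies, round(missing, 2))
```

With m = 12, this reading gives roughly 1 missing copy for 10 genomic copies and roughly 5 for 20, in line with the figures quoted in the text.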

Applications
It is apparent that IS complements differ among the very closely related laboratory E. coli K12 derivatives MG1655, CGSC6300, TD2 and TD10. The rapidity with which these differences have evolved suggests that ISs may play important roles in experimental evolution. Indeed, adaptation by E. coli to novel laboratory environments is often characterized by IS element mobilization [4,19-22]. vPCR can provide researchers in this area with a comprehensive view of genomic reorganization during laboratory evolution. Using this method, we characterized IS elements in 40 isolates that evolved from TD2 and TD10 in chemostats and found a number of IS-mediated gene deletions, duplications and transpositions (unpublished data).
Surveys of natural isolates of E. coli reveal that the numbers and locations of IS elements differ widely among closely related strains, suggesting a brisk turnover of IS elements within and among host lineages [6,23-25].
Comparisons of E. coli genomic sequences confirm that IS elements are commonly associated with chromosomal rearrangements within lineages [17,26,27]. The ability to rapidly and accurately determine the IS complement of the genomes of natural isolates is not only desirable from a population genetic standpoint; vPCR might also facilitate rapid typing, during epidemiological outbreaks, of pathogens otherwise indistinguishable from related strains. In this regard it is worth noting that IS sequences are highly conserved compared with most E. coli housekeeping genes [28]. This conservation greatly aids the use of vPCR to type strains because only 1 pair of primers is needed for each type of IS element.

Conclusions
This study shows that vPCR is a swift, efficient, reliable method for typing microbial strains and identifying and mapping IS insertion sites present in microbial genomes. Flanking DNA sequences from 36 of the 37 IS elements in the E. coli strain MG1655 were recovered by vPCR and confirmed by DNA sequencing. Unlike Southern hybridization and iPCR, our approach involves only one genomic digest and one ligation step. Vectorette PCR is then used to simultaneously amplify all IS elements of a given type, making vPCR a rapid and sensitive means to survey IS elements in genomes.

Strains
Three derivatives of the K12 isolate were used in this study. Strain CGSC6300, obtained from the E. coli Genetic Stock Center, Yale University, was used as a control because it is closely related to MG1655, whose entire genome has been sequenced [17]. TD2 and TD10 (derivatives of DD320, itself a K12 derivative) are routinely used in our experiments in molecular evolution [29].

DNA isolation
Genomic DNA was isolated from overnight cultures in LB medium using the DNeasy DNA isolation kit (Qiagen, Valencia, CA, USA).

Vectorette unit
The vectorette unit was made using the protocol of the Botstein lab http://genome-www.stanford.edu/group/botlab/protocols/vectorette.html [30]. The two anchor bubble primers were synthesized by the Advanced Genetic Analysis Center (AGAC) at the University of Minnesota, St. Paul. To anneal bubble primers, 4 µM of each primer (in ddH2O) were combined in a total volume of 100 µl. The mixture was incubated at 65°C for 5 minutes, and then MgCl2 was added to a final concentration of 1-2 mM before cooling down to room temperature.

DNA digestion and ligation of vectorette units
Genomic DNA from each strain was digested using the restriction enzyme Rsa I to produce small, blunt-ended fragments (Fig. 1). The enzyme is a four-base cutter and has 0 to 3 restriction sites within the open reading frames (ORFs) of the eight insertion sequences (IS1, IS2, IS3, IS4, IS5, IS30, IS150, IS186), but does not cut at the extreme ends of each ORF. This allows for the design of outward primers to amplify the IS flanking sequence on both sides (see below). Digestion was carried out at 37°C overnight in a 50 µl reaction containing 1× NEBuffer (No. 1), 0.5 µg DNA and 10 units of Rsa I. After digestion, 2 µl of anchor bubble unit, 1 µl of 10 mM ATP and 1 unit of T4 DNA ligase (New England Biolabs, Beverly, MA) were added and the reaction was incubated for 5 cycles of 20°C for one hour followed by 37°C for 30 min.
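The digestion step can be mimicked in silico to preview which fragment sizes an enzyme would produce around a known IS. The sketch below is a simplified illustration: it relies on Rsa I recognizing GTAC and cutting between T and A to leave blunt ends, treats the DNA as a linear top-strand string, and uses a made-up toy sequence rather than any real genomic region. The function name is hypothetical.

```python
def digest_blunt(seq, site="GTAC", cut_offset=2):
    """Cut a linear sequence at every occurrence of a blunt-cutting site.

    site       : recognition sequence (GTAC for Rsa I)
    cut_offset : number of bases of the site left on the upstream
                 fragment (2 for Rsa I, which cuts GT^AC)
    Returns the list of fragments in 5'-to-3' order.
    """
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # remainder after the last site
    return fragments


# Toy sequence for illustration only (not from the E. coli genome).
toy = "AAGTACTTTGTACCC"
pieces = digest_blunt(toy)
print(pieces)
print([len(p) for p in pieces])
```

The same function can be called with site="CGCG" and cut_offset=2 to model Bst UI, the second four-base cutter used in this study.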

Primers and PCR amplification
Outward primers (Table 2) from each end of the 8 IS sequences were designed and used for PCR amplification in combination with the vectorette primer (5' CGAATCGTAACCGTTCGTACGAGAATCGCT 3') (Fig. 1). The distance between an IS-specific primer position and the extreme end of the IS ORF ranged from 16 to 184 bp, which facilitated identifying IS-associated PCR products from DNA sequences. PCR reactions were carried out using the Qiagen Multiplex PCR kit (Qiagen, Valencia, CA, USA). Each reaction contained 1× Qiagen Multiplex PCR Master Mix, 0.2 µM each of outward IS primer and vectorette primer, and 2 ng of DNA template (Rsa I-digested DNA ligated with vectorettes). PCR cycling conditions were 95°C for 15 min, 35 cycles of 94°C for 30 s, 60°C for 90 s, 72°C for 2 min, and a final extension step at 72°C for 10 min. The amplified products were separated in 1.4% agarose gels, stained with ethidium bromide and visualized under UV light. DNA bands were excised and purified with the Qiagen DNA Gel Extraction Kit (Qiagen, Valencia, CA, USA).

DNA sequencing and analysis
DNA sequencing analysis was carried out on both DNA strands by the AGAC, University of Minnesota, using an IS-specific primer and the vectorette primer. DNA sequences were subjected to BLAST searches against the MG1655 genome sequence.
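The logic of the analysis step, trimming the IS-derived portion of a read and placing the remaining flank on the reference, can be illustrated with exact string matching. This is only a toy stand-in for the BLAST searches used in the study; the function names and all sequences below are hypothetical.

```python
def extract_flank(read, is_terminus):
    """Return the part of a read that lies beyond the IS terminal sequence.

    Returns None when the read does not contain the IS terminus, i.e.
    when the PCR product is probably not IS-associated.
    """
    idx = read.find(is_terminus)
    if idx == -1:
        return None
    return read[idx + len(is_terminus):]


def locate_flank(flank, genome):
    """Return the 0-based position of an exact match of flank in genome, or -1."""
    return genome.find(flank)


# Hypothetical sequences for illustration only.
toy_genome = "TTGACCGATAGGCTAACGT"
toy_is_end = "GGAAGGTGCG"
toy_read = "CC" + toy_is_end + "GATAGGCTA"

flank = extract_flank(toy_read, toy_is_end)
print(flank, locate_flank(flank, toy_genome))
```

A real analysis would tolerate sequencing errors and search both strands, which is what an alignment tool such as BLAST provides.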

Figure 1. Vectorette PCR for amplification of IS flanking sequences. The shadowed area represents the IS sequence. The solid lines indicate the flanking DNA sequences. ∇ indicates the restriction site. A and B are the outward IS-specific primers located at the ends of the IS. V is a vectorette primer. Panel labels: digestion with restriction enzyme; ligation to vectorette units; initial PCR with an outward IS primer (A or B); primer A (B) and V.

Figure 2. PCR amplification of IS flanking DNA from E. coli strains CGSC6300, TD2 and TD10. Results for IS1, 2, 3, 5 and 186 are shown. Genomic DNA was digested with Rsa I, ligated with vectorette units and amplified by vPCR. Each panel shows the PCR products generated by two outward IS-specific primers (arrows) of an IS in combination with the vectorette primer. Flanking DNA fragments from both sides of each IS location were amplified. The PCR products were excised, purified, sequenced and identified from the genome sequence of E. coli strain MG1655 [17]. A PCR fragment flanking a known IS site in MG1655 is indicated by the element's name followed by an identifying numeral; for example, IS1-1 is one of 7 IS1 elements in the MG1655 genome. Additional flanking DNAs not found in MG1655 are labeled with the b# of the gene in which the IS is located. PCR products were separated in 1.4% agarose gels and stained with ethidium bromide. Intense bands in the 100 bp ladder correspond to 500 and 1000 bp.
Figure 4. PCR amplification of IS2 flanking DNA from genomic DNA digested with Bst UI. Flanking DNA fragments IS2-3A and IS2-6A (left hand side) and IS2-2B (right hand side), masked by other amplified fragments when genomic DNA was digested with Rsa I (see Fig. 2), were recovered with Bst UI.

Table 2: Primers used for identification of ISs using vectorette PCR
a Primers are named after the insertion sequence with A and B designating each side. b Sequences obtained from MG1655 [17]. c Length of primer in base pairs.
BMC Microbiology 2004, 4:26 http://www.biomedcentral.com/1471-2180/4/26