Strain-specific differences in Neisseria gonorrhoeae associated with the phase variable gene repertoire

Background: There are several differences associated with the behaviour of the four main experimental Neisseria gonorrhoeae strains, FA1090, FA19, MS11, and F62. Although there are data concerning the gene complements of these strains, the reasons for the behavioural differences are currently unknown. Phase variation is a mechanism that occurs commonly within the Neisseria spp. and leads to switching of genes ON and OFF. This mechanism may provide a means for strains to express different combinations of genes, and differences in the strain-specific repertoire of phase variable genes may underlie the strain differences.

Results: By genome comparison of the four publicly available neisserial genomes, a revised list was created of 64 genes that have the potential to be phase variable in N. gonorrhoeae, excluding the opa and pilC genes. Amplification and sequencing of the repeat-containing regions of these genes allowed determination of the presence of the potentially unstable repeats and the ON/OFF expression state of these genes. Thirty-five of the 64 genes show differences in the composition or length of the repeats, of which 28 are likely to be associated with phase variation. Two genes were expressed differentially between strains causing disseminated infection and uncomplicated gonorrhoea. Further study of one of these in a range of clinical isolates showed that this association was an effect of the small sample size and is not maintained in a larger sample.

Conclusion: The results provide more evidence as to which genes identified through comparative genomics are indeed phase variable. The study indicates that there are large differences between these four N. gonorrhoeae strains in terms of gene expression during in vitro growth. It does not, however, identify any clear patterns by which previously reported behavioural differences can be correlated with the phase variable gene repertoire.


Background
Neisseria gonorrhoeae is the causative agent of the sexually transmitted disease gonorrhoea. In men this is typically associated with a purulent discharge from the urethra. However, in women, infection of the cervix is often asymptomatic, while gonococcal spread up the urinary tract or invasion across the epithelial layers can cause additional complications such as pelvic inflammatory disease (PID) and disseminated gonococcal infection (DGI). Depending on geographical location, between 0.1% and 3% of cases of uncomplicated gonorrhoea (UG) disseminate and cause DGI [1][2][3]. The mechanisms that allow some strains to invade and cause DGI are not well understood.
Many attempts have been made to correlate disease phenotype with the genetic characteristics of isolates from DGI and UG. Certain phenotypes have been associated with DGI, including the arginine, hypoxanthine, and uracil (AHU) auxotype, in which AHU- strains tend to be more invasive [4,5]. Serum resistance, and therefore the ability to invade successfully, is associated with porin serotype 1A, via its interaction with C4 binding protein [6] as well as a role in the invasion of epithelial cells [7]. There is also a correlation between the DGI phenotype and the presence of a combination of atlA and the traG variant, sac-4. These are thought to be a serum resistance locus and a cytotoxin that are located within the Gonococcal Genetic Island [8]. Addressing this issue clearly is difficult because it is possible that isolates capable of causing disseminated infection will be isolated from uncomplicated infections, and the converse may occur in hosts that are unable to produce a satisfactory immunological defence.
There are four experimental strains of N. gonorrhoeae commonly used in the laboratory. Strain FA1090, the only gonococcal strain for which a genome sequence is currently available (http://www.genome.ou.edu/gono.html), was isolated from the cervix of a patient with DGI. Strain FA19 was isolated from a patient with both UG and DGI. Strains F62 and MS11, both of which were isolated from patients with UG, are also frequently used. Several differences have been reported between these strains.
These include the ability to acquire iron from lactoferrin due to expression of lactoferrin binding protein (Lbp). A competitive advantage associated with this receptor in male urogenital tract infection has been reported; however, the absence of this gene from approximately half of gonococcal strains suggests that there may also be an advantage to its absence [9]. Other differences include an elevated ICAM-1 response generated in epithelial cell lines upon exposure to strain FA1090 as compared to strain MS11 [10]. One of the largest differences is the presence of the Gonococcal Genetic Island in strain MS11 [8] and strain FA19, but not the other two strains [11]. Microarray studies have revealed that there is little difference in the gene complements of these four strains. Indeed, these studies have shown that strain F62 contains all of the genes encoded by the FA1090 genome [11]. What comparative gene hybridization studies cannot tell us is whether there are small differences in the gene sequences, including frame-shifts or base substitutions, and whether there are differences in the phase variable gene repertoire.
Phase variation is a process employed by many bacteria to reversibly control gene expression. It is a switching mechanism that allows the gene to be reversibly turned ON or OFF. One difference between this and other regulatory mechanisms is that it does not occur in response to a stimulus but instead is mediated by changes to the DNA, which can occur during replication. There are many mechanisms for phase variation, including inversions and transposon movement (for reviews see [12,13]); however, the most common mechanism within the Neisseria spp. is a slippage mechanism in which the length of a simple sequence repeat changes during replication [14]. Within the coding sequence, changes in this repeat can cause frame-shifting and generation of truncated reading frames. Upstream of the coding sequences, changes in repeat length can also alter the expression level of the genes by changing the efficiency with which RNA polymerase or other transcription factors bind to promoter components.
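The frame-shifting effect of a repeat within a coding sequence can be illustrated with simple modular arithmetic. The sketch below is not part of the study's methodology; the function names and the notion of a reference "in-frame" copy number are assumptions introduced for illustration. It applies only to repeats inside a coding sequence; repeats in promoter regions modulate expression level rather than reading frame and are not modelled here.

```python
def frame_offset(unit_len, copies, in_frame_copies):
    """Net shift of the downstream reading frame, in nucleotides modulo 3.

    unit_len        -- length of the repeated motif in nt (1 for a homopolymer)
    copies          -- observed number of copies of the motif
    in_frame_copies -- a copy number known to leave the gene in frame
                       (hypothetical reference value supplied by the user)
    """
    return ((copies - in_frame_copies) * unit_len) % 3


def expression_state(unit_len, copies, in_frame_copies):
    """Return 'ON' if the reading frame is preserved, otherwise 'OFF'."""
    return "ON" if frame_offset(unit_len, copies, in_frame_copies) == 0 else "OFF"


# A homopolymeric tract that is in frame at 7 copies: losing one
# nucleotide shifts the frame, losing or gaining three restores it.
state_g7 = expression_state(1, 7, 7)
state_g6 = expression_state(1, 6, 7)
state_g10 = expression_state(1, 10, 7)
```

Because each copy of a tetramer shifts the frame by one nucleotide (4 mod 3 = 1), a tetranucleotide tract cycles through ON and OFF states with a period of three copies, just as a homopolymeric tract does.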
Due to the potential phase variation has to alter the expression profiles of a bacterium, this is one of the next logical areas to investigate to identify differences between the strains of N. gonorrhoeae that might account for differing behaviour. To achieve this a revised list of all of the potential phase variable genes, excluding opa and pilC genes, in N. gonorrhoeae was generated based upon the four publicly available neisserial genomes, Neisseria meningitidis serogroup A strain Z2491 [15], serogroup B strain MC58 [16], serogroup C strain FAM18 (available at http://www.sanger.ac.uk/Projects/N_meningitidis/seroC/seroC.shtml), and N. gonorrhoeae strain FA1090 (available at http://www.genome.ou.edu/gono.html). The presence of the potentially variable repeat and the expression status of each of these genes were then determined and compared between the four N. gonorrhoeae strains.

A revised list of potential phase variable genes in Neisseria gonorrhoeae
Potential phase variable genes were predicted through identification of simple sequence repeats within coding or promoter regions, the alteration in the length of which would alter the expression of the associated protein.
Genes that are common to N. gonorrhoeae and different strains of N. meningitidis tend to be highly conserved, with typical sequence identities of greater than 90%. There is sufficient similarity between them for differences in the length of repeats, with the potential to disrupt reading frames, to increase the quality of phase variable gene predictions. In the Neisseria spp. the phase variable repertoire has been previously defined through three-way genome comparison to identify potential phase variation associated simple sequence repeats [17]. Based on the repeat tract length, the presence of the gene, the presence of variations in the repeat tract length, and the location of the repeat, each of the previously identified potential phase variable genes was scored for the probability of being phase variable. Using the additional genome sequence of N. meningitidis strain FAM18, the predicted phase variable gene repertoire was re-assessed through four-way genome sequence comparison (data not presented), using the same method as previously described [17]. In this way, those potential phase variable genes that are common between the four genome sequences could be evaluated in light of variation in the presence and length of the repeat tract associated with the genes. This analysis identified 64 repeats in N. gonorrhoeae that had the potential to mediate phase variation (Table 1), excluding the opa and pilC genes. These were investigated in N. gonorrhoeae strains FA1090, F62, FA19, and MS11. The opa and pilC genes were not included in this study because these genes are established as phase variable, change very rapidly during culture, and do not have sufficiently distinct priming sites in the multiple alleles for amplification. Analysis of these genes would therefore not be possible, nor would it be possible to interpret the meaning of any observed changes.

Repeat length changes
Repeat length and composition were compared between the four gonococcal strains and differences were observed in 35 of the 64 (55%) repeats studied (Table 2), although no sequence was obtained for the repeat in XNG1511. The variation can be seen in two forms. The first is variation of the repeat length, or sequence composition, between the four strains of N. gonorrhoeae, as determined on the basis of the predominant population following sequencing; for example, in XNG0412 strain FA1090 contains a (G)7 repeat tract while strain FA19 contains a (G)6 repeat tract. The second is variation within a single culture. Although PCR can generate variation in repeat numbers, leading to mixed populations in sequencing, for templates of homopolymeric tracts of (C or G)11 or greater, when variation is seen in shorter homopolymeric tracts, or in repeats composed of longer repeated motifs, this can be taken to indicate that variation has occurred in vivo during culture. This has been established in a previous study in H. pylori using a similar amplification and sequencing strategy [18]. Observed changes in these repeat tracts provide additional evidence of phase variation, and are indicated by 'plate' in Table 2.
For example, XNG0080 in strain FA19 is identified as having a (C)10 tract in Table 2, but this is the predominant population and in fact the population contains (C)8-(C)10 tracts in this gene. 12 of the 36 variable repeats show this form of instability during culture.
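The reasoning above, in which a predominant tract length is reported and mixed populations are interpreted as variation during culture unless PCR slippage is a plausible explanation, can be sketched as follows. This is an illustrative sketch only: the function name, the returned fields, and the idea of supplying per-molecule copy numbers are assumptions, since the study read mixed populations from sequencing of PCR products rather than counting individual molecules. The (C or G)11 cut-off is taken from the text.

```python
from collections import Counter

# From the text: PCR can itself generate length variation for
# homopolymeric C or G tracts of 11 nt or more.
PCR_ARTEFACT_MIN_GC_TRACT = 11


def summarise_tract(unit, observed_copies):
    """Summarise the copy numbers observed for one repeat in one strain.

    unit            -- the repeated motif, e.g. "C" or "CAAG"
    observed_copies -- list of copy numbers seen in the population
                       (hypothetical per-molecule observations)
    """
    counts = Counter(observed_copies)
    predominant = counts.most_common(1)[0][0]
    mixed = len(counts) > 1
    is_gc_homopolymer = unit.upper() in ("G", "C")
    pcr_suspect = is_gc_homopolymer and predominant >= PCR_ARTEFACT_MIN_GC_TRACT
    return {
        "predominant": predominant,
        "range": (min(counts), max(counts)),
        "mixed": mixed,
        # Mixed populations count as evidence of variation during culture
        # only when PCR slippage is not a likely explanation.
        "culture_variation": mixed and not pcr_suspect,
    }


# Invented observations in the spirit of XNG0080 in strain FA19:
# predominantly (C)10 with (C)8 and (C)9 also present.
xng0080_fa19 = summarise_tract("C", [10, 10, 10, 9, 8])
```

For the invented XNG0080-like input, the summary reports a predominant tract of 10 within a range of 8 to 10 and, because the tract is shorter than 11 nt, attributes the mixture to variation during culture.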
The threshold at which variation is observed in repeat length for N. gonorrhoeae appears to be in close agreement with that used to identify genes that had the potential to be phase variable in Neisseria spp. Variation is seen in repeats of poly-G/C tracts ≥ 7 nucleotides (nt), and poly-A/T tracts of ≥ 9 nt, while tetramers, pentamers, and heptamers show variation between strains even at low copy numbers of the repeat (differences between strains are observed in most of these repeats with only 2-4 copies of the repeat present e.g. the (CCCAA) repeat in XNG0520). There were no differences observed in either of the dinucleotide repeat tracts. G/C repeats ≥ 9 nt, tetramers, and pentamers (of all observed sizes) are referred to as variable copy number repeats subsequently because they tend to be unstable during in vitro culture. Therefore, when a variable copy number repeat shows no variation in the sequence data the associated ON or OFF phenotype is likely to be adaptive for the in vitro culture conditions.

Table 1 (final row and footnotes, as recovered from the extracted text): (TTCC)3 -; L; Other; plsX; fatty acid-phospholipid synthesis protein.
† CDS designation numbers come from our own annotation of the N. gonorrhoeae strain FA1090 genome sequence.
‡ The known or potential phase variable repeat tract from N. gonorrhoeae strain FA1090. The equivalent sequence is identified for genes without a repeat tract that are phase variable candidates from the meningococci. Following the sequence a - or + signifies if the CDS is in-frame or frameshifted, respectively. Repeats in promoter or predicted promoter locations are indicated with a p. Degenerate indicates a gene with multiple frameshifts or in-frame termination codons.
* The predicted likelihood of phase variability. K indicates a known phase variable gene; S indicates a strong candidate for phase variation; M indicates a moderate candidate; L indicates a low candidate. See text for details of these thresholds.
†† XNG0470 is a homologue of XNG1014 and XNG1513, which have a (G)6 and are not frame-shifted.
** XNG1207a, XNG1577, and XNG1788 are homologues of each other.

Of the repeats previously experimentally determined to be phase variable (candidacy of 'K' in Table 1), nine of the ten genes show variation in the repeat, including fetA (XNG1989) and lgtA (XNG2045). Variation in the hmbR repeat is not seen as the tract is interrupted by base substitutions and indeed the gene is degenerate (i.e. contains multiple frameshifts and/or in-frame termination codons) in all four strains. hmbR was previously reported to be degenerate in N. gonorrhoeae strain MS11 [19]. In some genes predicted from sequence analysis (including many encoding hypothetical proteins), but not previously demonstrated to be phase variable, differences in repeat length provide new evidence for the phase variability of these genes. Furthermore, in some repeats mixed populations are observed in the sequence data, indicating that these genes are indeed being phase varied during in vitro culture, e.g. pglH (XNG0080). In Table 2 these are marked as 'plate' in the column marked 'Differences in Repeat'.

Repeat length stability in N. gonorrhoeae strain FA1090
The FA1090 strain used in this study has a different passage history from that used for the genome sequence, yet the copy numbers of the repeats observed in this study are highly consistent with those in the genome sequence. The predominant populations of the variable copy number repeats are mostly the same as the genome sequence. Some variation in the copy number of tetramers and pentamers is observed, but these tend to be more variable in repeat length amongst all strains. There is only one low copy number repeat that differs in length compared to the genome sequence. This is in the coding sequence for methionine aminopeptidase (XNG1868) and the repeat is indicated as being (G)7 in the genome sequence but for all strains in this study it was observed as (G)6, which is unlikely to be phase variable. Variation in expression of this gene is probably not favourable for cell survival, as it codes for an important protein involved in removing the N-terminal methionine from nascent peptides. It is thought therefore that this variation is probably due to an error in the genome sequence rather than phase variation, and is consistent with independent sequencing that observed a (C)6 repeat in strain FA1090 (L Snyder, unpublished).

Table 2 footnotes (as recovered from the extracted text): 'Genome' indicates that there is variation from the genome sequence but no observed differences between strains.
† This gene is not present in this strain as seen by microarray comparative genome hybridisation [11].
‡‡ The site-specific recombinase piv-Nm1B has been inserted into this gene in all four strains.
†† A large deletion is present in the N-terminus of this gene.
** Amplification and sequencing did not work.
*** Sequence analysis showed that the ORF present here had no homology to XNG0412 from strain FA1090.

Repeat length differences between the unrelated strains
Amongst the four strains, there are differences in copy number or composition of the repeat in 55% of the genes studied. Not all of these differences are necessarily associated with changes in expression. Table 2 shows the repeat length and expression status of the gene and it can be seen that some repeats are variable but still maintain an OFF phenotype, e.g. the type I restriction enzyme S (XNG0388). Other repeats are associated with genes that are in fact degenerate, e.g. hmbR (XNG1216), aspA (XNG0986), and porA (XNG0832), an association that had been previously described [14]. Not all of the observed strain differences are due to repeat variation; some genes are irreversibly switched ON or OFF. For lbpA (XNG0245), a gene that is highly variable in N. meningitidis, there is no evidence of phase variation in the gonococcus and what is seen instead is expression or absence due to a deletion in the N-terminus of the gene in both strains FA1090 and F62. In approximately 45% of isolates this deletion prevents expression of the Lbp receptor. It is proposed that there may be a selective advantage for nonexpression of this gene over strains with the ability to extract iron from lactoferrin [9].
The potential expression state of each gene was then examined with respect to variation in the repeat length and frameshifts in the gene. These core experimental strains have been passaged repeatedly under in vitro laboratory conditions. Therefore, the expression state of the phase variable repertoire in vitro may not reflect the phenotypes expressed during infection. The expression state of strain FA1090 has, however, shown considerable stability through different passage histories, and the phenotypes in the other strains may also be similarly stable and different. Indeed, theoretical models indicate that in the absence of selection that favours more fit phenotypes, phase varied phenotypes and their associated repeats will tend to be stable [20]. There will still be random changes between equally fit alternatives, but the net rate of change within the population will be substantially slower than the changes that occur in the presence of selective conditions. Observed differences may well account for at least some of the reported differences in strain behaviour, even if the phase variable repertoires are similar.
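The qualitative argument about stability in the absence of selection can be illustrated with a toy two-state model. This is an assumption-laden sketch and not the theoretical model of [20]: it assumes a single ON/OFF locus, a symmetric per-generation switching rate, an effectively infinite population, and a relative fitness for ON cells. All parameter values are invented for illustration.

```python
def on_fraction_after(generations, p0, switch_rate, fitness_on=1.0):
    """Fraction of ON cells after a number of generations.

    p0          -- starting fraction of ON cells
    switch_rate -- probability per generation of switching ON->OFF or OFF->ON
                   (assumed symmetric)
    fitness_on  -- fitness of ON cells relative to OFF cells (1.0 = neutral)
    """
    p = p0
    for _ in range(generations):
        # Selection step: ON cells are weighted by their relative fitness.
        mean_fitness = fitness_on * p + (1.0 - p)
        p_selected = fitness_on * p / mean_fitness
        # Switching step: a fraction of each class changes state.
        p = p_selected * (1.0 - switch_rate) + (1.0 - p_selected) * switch_rate
    return p


# Start from an all-OFF population and follow it for 100 generations.
neutral = on_fraction_after(100, p0=0.0, switch_rate=1e-3)
selected = on_fraction_after(100, p0=0.0, switch_rate=1e-3, fitness_on=1.5)
```

Under these illustrative values the neutral population remains mostly OFF after 100 generations, with ON cells below one tenth of the population, whereas the population in which ON is favoured becomes predominantly ON. This mirrors the point that random switching between equally fit alternatives changes the population composition far more slowly than switching under selection.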
The genes that contain repeats associated with differences between the strains are not evenly distributed among the different functional groups of genes investigated, with the genes for surface sugar biosynthesis (lipopolysaccharide (LPS) and pilin glycosylation) and restriction modification systems being the most variable in expression and repeat copy number.
It is known that there are four phase variable genes involved in LPS biosynthesis in N. gonorrhoeae, lgtA, lgtC, lgtD, and lgtG, and repeat length variation is observed in all these genes in this study. When these genes switch they can produce a diverse range of surface structures even within a single strain. In vivo certain LPS structures have been shown to provide a mechanism for serum resistance because they can bind CMP-NANA and this allows resistance to the antibody-mediated arm of the complement cascade [21].
The pilin glycosylation enzymes attach sugars to the pilus of the Neisseria spp. and the four studied here all show evidence of phase variation, with in vitro instability. Phase variation has previously been seen in pgtA/pglA (XNG1644), pglE (XNG0189), pglG (XNG0081) and pglH (XNG0080) [22][23][24]. The role of glycosylation is unclear, but as the pilus is a surface-exposed structure glycosylation may have a role in masking immunogenic parts of the pilus. Indeed it has been reported that glycosylation may inhibit complement-mediated lysis due to its ability to bind anti-gal antibodies [25]. It has been seen that galactosidase treatment of pili markedly reduces attachment of the pilus to host cells suggesting an importance of the galactose residues for attachment [26]. The role of phase variation in the genes that produce the saccharide is unclear, although it is possible that different sugar forms may be associated with different properties with respect to attachment and colonisation.
N. meningitidis contains four uptake systems for iron that are phase variable. These are the lactoferrin uptake system LbpAB, the haemoglobin uptake systems HmbR and HpuAB, and the siderophore receptor FetA. In the N. gonorrhoeae strain FA1090 genome sequence hmbR is degenerate due to a premature frameshift, and this study shows that the hmbR gene is not intact in any of the four gonococcal strains. The lbpA gene contains a deletion in strains FA1090 and F62; it is present and intact in strains FA19 and MS11, but is not phase variable. fetA and hpuA are both present and contain phase variable tracts that exhibit variation during in vitro culture.

Genes with unassigned functions
Thirteen of the 23 hypothetical genes show differences in their repeat lengths or composition. XNG0663 is a hypothetical protein that has limited homology to Curli-like repeat regions. These Curli-like repeats are found in Escherichia coli and are associated with adhesion to many different host cell proteins, including MHC class I [27] and fibronectin [28]. They are also seen in Salmonella enterica [29] and Porphyromonas gingivalis. XNG0663 does not contain the same variable repeat as the homologous gene in the meningococcus, as the poly-C repeat is interrupted by a T residue. It does, however, have a poly-T tract in which (T)7 is associated with expression, but it is unlikely that this would be phase variable due to its length. The gene therefore appears to be fixed ON in strains F62 and MS11 and OFF in strains FA1090 and FA19.
Due to its apparent association with strains that cause DGI rather than UG, the expression status of XNG0663 was determined in several clinical isolates of N. gonorrhoeae. In a study of 20 strains isolated from different types of infection it is apparent that the gene is present and turned ON in the majority of strains and that the suggested association with DGI is probably due to the initial small sample size. These results are shown in Table 3.

Conclusion
This study extends the number of N. gonorrhoeae phase variable genes for which there is direct or indirect evidence of repeat length changes from ten to 28, in addition to the pilC and opa genes. Based upon the four most commonly used experimental strains no clear association between the presence of phase variable versions of genes or their phenotypes and invasive potential could be identified, other than the previously described association with pgtA [30]. An apparent association with XNG0663 was discounted following analysis of a larger collection of strains.
Phase varied phenotypes appear to be relatively stable in in vitro culture. The differences in the expressed genes (rather than their ability to phase vary), and combinations of phase varied genes may still play a role in determining the outcome of infection. A recent study shows that during infection the colonisation fitness of H. pylori correlates strongly with a need to phase vary fewer genes [31]. It may be that the early behaviour and subsequent outcome of neisserial infection is also influenced by the phenotypes of inoculating populations.

Whole genome analysis
The complete genome sequences of N. meningitidis serogroup A strain Z2491 [15], serogroup B strain MC58 [16], and serogroup C strain FAM18 (Sanger Institute; http://www.sanger.ac.uk/Projects/N_meningitidis/seroC/seroC.shtml) and N. gonorrhoeae strain FA1090 (available at http://www.genome.ou.edu/gono.html) were analysed using previously described whole-genome analysis methodology [14,17,32], using an ACEDB graphical interface [33]. Briefly, perfect repeats with motifs of 1-10 bases were identified using ARRAYFINDER [34]. All repeats were displayed in their sequence contexts with respect to ORFs and termination codons using the tools within ACEDB, together with their neisserial protein and nucleotide homologies. These complete genome sequence databases were then analysed for simple DNA repeats within their sequence contexts to determine the repertoire of putative phase variable genes. Homopolymeric tracts of greater than 6 Gs or Cs, and greater than 8 As or Ts, were each investigated, as were repeats below these thresholds when associated with a frameshift. Other repeats composed of ≥ 4 copies of dinucleotides and ≥ 3 copies of tetramer and longer motifs were also investigated. All repeats were analysed to interpret the significance of the repeat on the basis of sequence context and the potential effect of length variation on the expression of the associated reading frames.
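The repeat thresholds described above can be sketched as a simple screen over a DNA sequence. This is not ARRAYFINDER and does not reproduce the ACEDB-based analysis; it is a minimal regular-expression sketch in which all function names are hypothetical. Two simplifications are assumptions made here: trinucleotide repeats are skipped because the text gives no threshold for them, and sub-threshold repeats associated with a frameshift are not detected because that requires reading-frame context.

```python
import re


def min_copies(unit):
    """Minimum copy number for a motif to be investigated, or None to skip."""
    if len(unit) == 1:
        return 7 if unit in "GC" else 9  # more than 6 G/C, more than 8 A/T
    if len(unit) == 2:
        return 4
    if len(unit) == 3:
        return None  # no trinucleotide threshold is given in the text (assumption: skip)
    return 3  # tetramers and longer motifs


def is_primitive(unit):
    """True if the motif is not itself a repeat of a shorter motif (e.g. 'GG', 'CACA')."""
    n = len(unit)
    return not any(n % k == 0 and unit == unit[:k] * (n // k) for k in range(1, n))


def find_candidate_repeats(seq, max_unit=10):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats
    with motifs of 1..max_unit bases that meet the thresholds above."""
    seq = seq.upper()
    hits = []
    for k in range(1, max_unit + 1):
        # A k-base motif followed by one or more exact copies of itself.
        for match in re.finditer(r"([ACGT]{%d})\1+" % k, seq):
            unit = match.group(1)
            copies = len(match.group(0)) // k
            threshold = min_copies(unit)
            if threshold is not None and is_primitive(unit) and copies >= threshold:
                hits.append((match.start(), unit, copies))
    return sorted(hits)


# Invented test sequence: a (G)7 tract, a (CAAG)3 tract, and an (A)8 tract
# that falls below the poly-A/T threshold.
example = "ATGACC" + "G" * 7 + "TTACA" + "CAAG" * 3 + "TTT" + "A" * 8 + "CG"
candidates = find_candidate_repeats(example)
```

Start positions are 0-based. In a real analysis each hit would still need to be interpreted in its sequence context (coding or promoter location, effect on the reading frame), which is the step the study performed within ACEDB.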
The selected genes addressed in this study are listed in Table 1 with an indication of their likelihood of phase variation based upon sequence analysis and previous publications. In this Table K indicates a gene known and reported to be phase variable from previous studies. Strong candidates (S) include those in which the tract length differs in different strains, but has not yet been shown to vary within a single strain. Strong candidates also include genes with tract lengths similar in composition to those that have been seen to vary in other genes, particularly if this is the source of a frame-shift. For example, the adhesin XNG1371 is not found in the meningococcal genome sequences, but contains a (CAAG)20 tract; similar tracts being seen in virG, vapA, and NMB1507, the lengths of which differ between strains. Further, the homopolymeric tracts in the gonococcal genome specific hypothetical genes XNG0470, XNG0473, XNG0503a, XNG1000, XNG1207a, XNG1511, and XNG1577 are each the sole source of a frame-shift mutation within the coding region and are of lengths that have been seen to mediate phase variation in other genes.
Moderate candidates (M) are those in which the tract length is typical of a phase variable gene in one or more sequences, but for which there is no evidence of tract length differences with other strains, largely due to the absence of the gene or the absence of the repeat tract in other strains. When present, the repeat is either not associated with a frame-shift, as are those strain-specific genes that have been identified as strong candidates for phase variation, or the frame-shift is present in all strains assessed and/or the frame-shift associated repeat length is < 7 bp in a homopolymeric tract.
Low candidates (L) are those in which the repeat tract has not been seen to change. Also included in this category are those genes in which a short homopolymeric repeat tract (< 7 bp) is different between strains and results in a frameshift mutation that may not be readily reversible. For example, the reduction in the homopolymeric repeat from an in-frame (C)7 to a frame-shifted (C)5 may then have led to subsequent additional mutations in the gene, rendering it degenerate (e.g. XNG0158).

Bacterial strains and growth conditions
The strains used are shown below in Table 4. Strains were grown on GC agar (Difco Laboratories) with the Kellogg supplement and ferric nitrate [35] at 37°C under 5% (v/v) CO2. Cultures were grown from frozen stocks, passaged once the following day to obtain an inoculum for semi-confluent growth, and DNA was prepared from young healthy colonies early the following morning.

PCR amplification and sequencing
Chromosomal DNA extractions were performed using the Aquapure genomic DNA kit (BioRad). PCR from chromosomal DNA was performed using Hotstar Taq DNA Polymerase (Qiagen) according to the manufacturer's instructions. Primer pairs used are shown in Table 5. If amplification was not successful with primer pair 1 it was repeated with primer pair 2. Automated sequencing was done using ABI Prism® BigDye™ Terminator Cycle Sequencing version 3.0 (Applied Biosystems), and was resolved on an ABI Prism® 3100 DNA Sequencer (Applied Biosystems).

Bioinformatics
ACEDB [33] was used to analyze the complete genome sequences of the Neisseria spp., as described previously [17]. Sequences were read, edited and aligned using Seqlab (Wisconsin Package, Version 10.2, Genetics Computer Group, Madison, Wisc., USA) through the Sir William Dunn School of Pathology/WIMM Computational Biology Research Group (CBRG). Homology searches were performed using BLAST [36] against the EMBL databases, accessed through the CBRG.