Evolution in an oncogenic bacterial species with extreme genome plasticity: Helicobacter pylori East Asian genomes
BMC Microbiology volume 11, Article number: 104 (2011)
The genome of Helicobacter pylori, an oncogenic bacterium in the human stomach, rapidly evolves and shows wide geographical divergence. The high incidence of stomach cancer in East Asia might be related to bacterial genotype. We used newly developed comparative methods to follow the evolution of East Asian H. pylori genomes using 20 complete genome sequences from Japanese, Korean, Amerind, European, and West African strains.
A phylogenetic tree of concatenated well-defined core genes supported divergence of the East Asian lineage (hspEAsia; Japanese and Korean) from the European lineage ancestor, and then from the Amerind lineage ancestor. Phylogenetic profiling revealed a large difference in the repertoire of outer membrane proteins (including oipA, hopMN, babABC, sabAB and vacA-2) through gene loss, gain, and mutation. All known functions associated with molybdenum, a rare element essential to nearly all organisms that catalyzes two-electron-transfer oxidation-reduction reactions, appeared to be inactivated. Two pathways linking acetyl~CoA and acetate appeared intact in some Japanese strains. Phylogenetic analysis revealed greater divergence between the East Asian (hspEAsia) and the European (hpEurope) genomes in proteins in host interaction, specifically virulence factors (tipα), outer membrane proteins, and lipopolysaccharide synthesis (human Lewis antigen mimicry) enzymes. Divergence was also seen in proteins in electron transfer and translation fidelity (miaA, tilS), a DNA recombinase/exonuclease that recognizes genome identity (addA), and DNA/RNA hybrid nucleases (rnhAB). Positively selected amino acid changes between hspEAsia and hpEurope were mapped to products of cagA, vacA, homC (outer membrane protein), sotB (sugar transport), and a translation fidelity factor (miaA). Large divergence was seen in genes related to antibiotics: frxA (metronidazole resistance), def (peptide deformylase, drug target), and ftsA (actin-like, drug target).
These results demonstrate dramatic genome evolution within a species, especially in likely host interaction genes. The East Asian strains appear to differ greatly from the European strains in electron transfer and redox reactions. These findings also suggest a model of adaptive evolution through proteome diversification and selection through modulation of translational fidelity. The results define H. pylori East Asian lineages and provide essential information for understanding their pathogenesis and designing drugs and therapies that target them.
Genome sequence comparison within a species can reveal genome evolution processes in detail and provide insights for basic and applied research. For bacteria, this approach has been quite powerful in revealing horizontal gene transfer, gene decay, and genome rearrangements underlying adaptation, such as evolution of virulence. Comparison of many complete genome sequences is feasible through innovations in DNA sequencing.
Helicobacter pylori was the first species for which two complete genome sequences were available. This species of ε-proteobacteria causes gastritis, gastric (stomach) ulcer, and duodenal ulcer, and is associated with gastric cancer and mucosa-associated lymphoid tissue (MALT) lymphoma [3, 4]. Animal models show a causal link between H. pylori and gastric cancer [5, 6]. Recent clinical work in Japan suggests that H. pylori eradication reduces the risk of new gastric carcinomas in patients with a history of the disease.
H. pylori shows a high mutation rate and an even higher rate of homologous recombination. Phylogenetic analysis based on several genes revealed geographical differentiation since H. pylori left Africa together with Homo sapiens. The analysis indicated that the East Asian type (hpEastAsia) is classified into at least three subtypes: East Asian (hspEAsia), Pacific (hspMaori) and native American (hspAmerind) [9, 10]. The East Asia subtype (hspEAsia) may be related to the high incidence of gastric cancer in East Asia.
H. pylori CagA is considered to be a major virulence factor associated with gastric cancer. CagA is delivered into gastric epithelial cells and undergoes phosphorylation by host kinases. Membrane-localized CagA mimics mammalian scaffold proteins, perturbs signaling pathways and promotes transformation. CagA is noted for structural diversity in its C-terminal region, which interacts with host cell proteins. It is classified into Western and East Asian types, with higher activities associated with the latter. The East Asian CagA-positive H. pylori infection is more closely associated with gastric cancer. Geographical differences have also been noted for other genes [13–17].
To fully characterize these bacteria (hspEAsia subtype of H. pylori) and to study underlying intraspecific (within-species) evolutionary processes in detail at the genome sequence level, we determined the genome sequence of four Japanese strains and compared them to available complete H. pylori genome sequences. The sequences of the Japanese strains and two Korean strains were different in gene content from the European and West African genomes and from the Amerind genome. Unexpectedly, divergence was seen in genes related to electron transfer and translation fidelity, as well as virulence and host interaction.
The complete genome sequences of four H. pylori strains (F57, F32, F30 and F16) isolated from different individuals in Fukui, Japan were determined. We compared 20 complete genomes of H. pylori (the 4 new genomes and 16 genomes in the public domain; Table 1), focusing on their gene contents.
Japanese/Korean core genomes diverged from the European and then the Amerind
A phylogenetic tree was constructed from seven concatenated genes (atpA, efp, mutY, ppa, trpC, ureI and yphC), which have been used for multi-locus sequence typing (MLST) and phylogenetic analyses [19, 20] (Additional file 1 (= Figure S1)). The tree showed that the 6 East Asian strains, namely the 4 Japanese strains (F57, F32, F30 and F16) and the 2 Korean strains (strain 51 and strain 52), are close to the known subpopulation hspEAsia of hpEastAsia, whereas 4 strains (Shi470, v225d, Sat464 and Cuz20) are close to another subpopulation of hpEastAsia, hspAmerind. Strains 26695, HPAG1, G27, P12, B38, B8 and SJM180 were assigned to hpEurope. Strains J99 and 908 were assigned to hspWAfrica of hpAfrica1. PeCan4 was tentatively assigned to hspAmerind, although it appears to be separate from the above 4 hspAmerind strains and somewhat closer to other subgroups (a subgroup of hpEurope, hspMaori and a group of "unclassified Asia" in the HpyMLST database).
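The concatenation step behind such a multi-gene tree can be sketched as follows. This is only an illustration under stated assumptions: the per-gene alignments are taken as already computed, the function name `concatenate` and the toy sequences are hypothetical, and no tree-building step is shown.

```python
from typing import Dict, List

def concatenate(alignments: Dict[str, Dict[str, str]],
                gene_order: List[str]) -> Dict[str, str]:
    """Join per-gene aligned sequences into one sequence per strain.

    alignments maps gene name -> {strain name -> aligned sequence}.
    Only strains present in every gene alignment are kept.
    """
    shared = set.intersection(*(set(alignments[g]) for g in gene_order))
    return {s: "".join(alignments[g][s] for g in gene_order)
            for s in sorted(shared)}

# Toy alignments (hypothetical sequences, two of the seven MLST genes)
toy = {
    "atpA": {"F57": "ATG-", "J99": "ATGA"},
    "efp":  {"F57": "CC",   "J99": "CT"},
}
joined = concatenate(toy, ["atpA", "efp"])
print(joined)
```

The joined sequences would then be passed to whichever tree-building program is in use.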
We deduced the common core genome structure of these 20 genomes based on the conservation of gene order using CoreAligner (Table 1). CoreAligner determines the set of core genes among the related genomes not by universal conservation of genes but by conservation of neighborhood relationships between orthologous gene pairs, allowing some exceptions. As a result, CoreAligner identified different numbers of core genes among strains (1364-1424), which reflect deletion, duplication and split of the core genes in the individual strains.
For phylogenetic analysis among the strains, we further extracted 1079 well-defined core orthologous groups (OGs) as those that were universally conserved, non-domain-separated, and with one-to-one correspondence (see Methods). The concatenated sequence of all well-defined core OGs resulted in a well-resolved phylogenetic tree (Figure 1). The tree was composed of two clusters, one containing the Japanese, Korean and Amerind strains and the other containing the European and West African strains. The tree strongly supported a model in which the Japanese/Korean strains (hspEAsia) and the Amerind strains (hspAmerind) diverged from their common ancestor, which in turn diverged from the ancestor shared by the European strains (hpEurope) long before. This conclusion is robust, as shown by the high bootstrap values of the internal nodes, primarily because the tree is built from a large quantity of sequence information (approximately 1000 genes). The Japanese and Korean strains were not separated into two clusters. PeCan4 appeared diverged from the other four hspAmerind strains, as expected from the result of the phylogenetic analysis based on the 7 genes described above. SJM180 appeared diverged from the other hpEurope strains in the well-defined core gene-based tree.
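The "universally conserved, one-to-one" part of this selection can be sketched as a simple filter. This is a minimal illustration, not the authors' pipeline: the data layout, the function name `well_defined_core` and the gene identifiers are hypothetical, and the non-domain-separated criterion is not modelled.

```python
from typing import Dict, List

def well_defined_core(ogs: Dict[str, Dict[str, List[str]]],
                      strains: List[str]) -> List[str]:
    """Return IDs of orthologous groups with exactly one gene in every strain."""
    selected = []
    for og_id, members in ogs.items():
        if all(len(members.get(s, [])) == 1 for s in strains):
            selected.append(og_id)
    return sorted(selected)

# Toy input (hypothetical gene identifiers)
strains = ["F57", "26695", "J99"]
ogs = {
    "OG1": {"F57": ["a1"], "26695": ["b1"], "J99": ["c1"]},        # one-to-one
    "OG2": {"F57": ["a2", "a3"], "26695": ["b2"], "J99": ["c2"]},  # duplicated in F57
    "OG3": {"F57": ["a4"], "J99": ["c3"]},                         # absent from 26695
}
core = well_defined_core(ogs, strains)
print(core)
```

Groups with a duplicated or missing member are excluded, which is why the number of well-defined core OGs is smaller than the per-strain core gene counts.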
Phylogenetic profiling to identify gene contents of hspEAsia
To thoroughly characterize the gene contents specific to the Japanese/Korean (hspEAsia) strains, we conducted phylogenetic profile analysis using the DomClust program. This analysis determines the presence or absence of a domain, rather than a gene, and allows detection of split genes, partially deleted genes and partially duplicated genes (detailed in Methods). Their features will be explained in the next five sections.
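Once a presence/absence profile is available, group-specific domains can be picked out with set operations. The sketch below assumes the profile has already been computed (for example by a clustering program); the function name `group_specific`, the domain labels and the reduced strain sets are hypothetical.

```python
from typing import Dict, List, Set

def group_specific(profile: Dict[str, Set[str]],
                   in_group: Set[str], out_group: Set[str]) -> List[str]:
    """Domains present in every in_group strain and in no out_group strain."""
    hits = []
    for domain, present in profile.items():
        if in_group <= present and not (out_group & present):
            hits.append(domain)
    return sorted(hits)

# Toy profile: domain -> set of strains in which it is present
east_asia = {"F57", "F32"}
europe = {"26695", "G27"}
profile = {
    "domA": {"F57", "F32"},                  # hspEAsia only
    "domB": {"F57", "F32", "26695", "G27"},  # shared by all
    "domC": {"F57", "26695"},                # scattered
    "domD": {"26695", "G27"},                # hpEurope only
}
only_east_asia = group_specific(profile, east_asia, europe)
only_europe = group_specific(profile, europe, east_asia)
print(only_east_asia, only_europe)
```

Working at the domain level rather than the gene level means that a split or partially deleted gene shows up as the loss of only some of its domains.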
Differences in outer membrane proteins and related proteins in the number of loci of gene families and in alleles at each locus
One of the emerging features of the East Asian (hspEAsia) strains is the change in the number of loci of some of the outer membrane protein (OMP) families. We detected five OMP genes (gene families; oipA, hopMN, sabAB, babABC and vacA-2) with the number of loci different between the hspEAsia and hpEurope strains (Table 2). In all but one gene family, the difference in the number of loci was the result of gene decay in the East Asian (hspEAsia) strains.
The notable exception was oipA, for which a secondary locus was found in hspEAsia (6/6 strains) and hspAmerind (5/5), but not in hpEurope (0/7) or hspWAfrica (0/2). This increase of the secondary locus can be explained by a novel DNA duplication mechanism associated with inversion. The two hopMN loci in hpEurope (7/7 strains) and hspWAfrica (1/2) were reduced to one locus in the hspEAsia (6/6) and hspAmerind (5/5). This loss was likely caused by the same duplication mechanism.
For the babABC family, the babC locus was empty in all the hpEastAsia strains (6/6 hspEAsia and 5/5 hspAmerind) as well as in all the hspWAfrica strains (2/2) and two hpEurope strains (B38 and B8). This is in contrast to the presence of three loci in the other (5/7) European strains (Table 2).
The strain J99 carried a sabA gene (jhp0662) at the sabA locus and a sabB gene (jhp0659) at the sabB locus . All the hpEurope strains but the strain B38 (6/7) and this hspWAfrica strain (J99) had these two loci, whereas all the hpEastAsia strains but the strains 52 and PeCan4 (5/6 hspEAsia and 4/5 hspAmerind) lacked sabB locus (Table 2). These hpEastAsia strains all carried a sabA gene at the sabA locus. Genes of hpEurope differed among strains. Three strains (HPAG1, G27 and SJM180) carried a sabA gene at the sabA locus and a sabB gene at the sabB locus, as J99. The strain 26695 carried a sabA gene at both the sabA and sabB loci, whereas the strain P12 carried a sabB gene at both the loci. The strain B8 carried a sabA gene at the sabA locus and a hopQ gene at the sabB locus, along with another hopQ gene at the hopQ locus.
Some of these genes (oipA, babA and babB) and homAB genes were previously reported to diverge between the East Asian and Western strains [13, 14, 17]. A difference in the number of copies of homAB genes between East Asian and Western strains has also been reported.
For hopMN, two gene types (hopM and hopN) have been recognized [26, 27]. Phylogenetic network analysis revealed two variable regions within the hopMN family (region II and IV; Figure 2). Combining the two types of two variable regions defined four main gene types, of which two corresponded to hopM and hopN. The two types in region II were designated m1 and m2 (m for mid). The types in region IV were designated c1 and c2 (c for C-terminus); c3 was another variant type in region IV, composed of parts of c1 and c2. In this designation, previous hopM and hopN genes correspond to hopMNm1-c1 and hopMNm2-c1, respectively. All hpEastAsia strains except the strains 52 and PeCan4 (9/11) carry sequence type c2 at region IV. The c3 variant is observed in J99, PeCan4 and SJM180 (Figure 2A and 2F).
Three vacA paralogs and vacA itself were found in 26695. Those paralogs share the auto-transporter domain at the C-terminus with vacA. A large deletion in vacA-2 (HP0289) (approximately 2400 amino acids) was found in all the hspEAsia strains except the strain 51 (5/6) (Table 2 and Additional file 2 (= Table S1)).
It was described earlier that the horA OMP locus in 26695 is composed of two open reading frames (ORFs) (HP0078/HP0079) whereas that in J99 is composed of one ORF (jhp0073). The horA locus in all the hspEAsia strains shows apparent gene decay by fragmentation through various mutations (Figure 3). Whether the genes in the other strains are functional is not known.
A putative periplasmic endonuclease gene (nucG, HP1382) was split in all the hspEAsia strains examined (Table 2 and Additional file 2 (= Table S1)). Detailed analysis revealed that the split was mediated by recombination between short similar sequences.
Massive decay of molybdenum-related genes for two-electron reduction-oxidation reactions
Unexpectedly, our profiling suggested that functions related to molybdenum (Mo) were lost specifically in the hspEAsia strains (Table 3 and Additional file 2 (= Table S1)). The trace element Mo is essential for nearly all organisms. After transport into the cell as molybdate, it is incorporated into metal cofactors for specific enzymes (molybdo-enzymes) that catalyze reduction-oxidation (redox) reactions mediated by two-electron transfer.
In the 20 H. pylori genomes, the only gene for molybdo-enzymes identified was bisC. At least one gene in each of the three Mo-related functions, Mo transport, Mo cofactor synthesis and a Mo-containing enzyme, decayed in all hspEAsia strains (Table 3 and Figure 4). Detailed analysis of nucleotide sequences revealed a mutation in 10 of 12 Mo-related genes in some of the hspEAsia strains (Table 3 and Additional file 3 (= Table S2)). The occurrence of apparently independent multiple mutations (Additional file 3 (= Table S2)) suggests some selection against use of Mo in the hspEAsia strains. All other strains but P12 possessed all intact genes. The strain P12 had a truncation of moaD (Additional file 3 (= Table S2)). Tungsten sometimes substitutes for Mo, but genes for known tungstate/molybdate binding proteins (TupA and WtpA) were not found in the H. pylori genomes.
The sequences in the four Japanese strains were confirmed by polymerase chain reaction (PCR) with the primers listed in the Additional file 4 (= Table S3).
The Mo-related genes were in a list of "chronic gastritis-associated" genes, primarily because they are absent from three Amerind strains from the Athabaskan people. The 5 Amerind strains analyzed in the present study are different from the three Amerind strains in this respect. This difference could reflect the later migration of the Athabaskans to the Americas.
Two pathways between acetyl~CoA and acetate in some Japanese strains
Our profiling revealed an important change at the center of energy and carbon metabolism related to acetyl~CoA. Two pathways connect acetyl~CoA and acetate (Figure 5A). In anaerobic fermentation, acetyl~CoA is converted into acetate by phosphoacetyl transferase (pta product) and acetate kinase (ackA product) with generation of ATP (anaerobic pta-ackA pathway). The intermediate acetyl~P, a high-energy form of phosphate, likely serves as a global signal. Although these reactions are reversible, assimilation of acetate may be irreversibly mediated by acetyl~CoA synthetase (acoE product) by the generation of acetyl~CoA, which enters the TCA cycle to generate energy under aerobic conditions (aerobic acoE pathway).
It has been suggested that strain 26695 (hpEurope) carries a mutation in pta for the former pathway whereas strain J99 (hspWAfrica) lacks acoE for the latter [28, 34]. All European strains in this study (7/7) had at least one of pta and ackA inactivated through a variety of mutations (Figure 5C). Two of five Amerind strains, PeCan4 and Cuz20, also had a mutated pta and ackA, whereas the other 3/5 Amerind, 2/2 African, and 3/6 hspEAsia strains had intact pta and ackA but had a deletion of acoE. Exceptions to such apparent incompatibility between the two pathways were found for 3/4 of the Japanese strains (F16, F30 and F57), which had intact genes for both pathways (Figure 5BCD). The sequences in the four Japanese strains were confirmed (see Methods and Additional file 4 (= Table S3)).
A gene for an amino acid utilization
An ortholog of jhp0585 in J99 is absent from 26695. An ortholog is present in the six other hpEurope strains and both hspWAfrica strains, but absent from all hpEastAsia strains (hspEAsia and hspAmerind) (Additional file 2 (= Table S1)). It encodes a homolog of 3-hydroxy-isobutyrate dehydrogenase and the related beta-hydroxyacid dehydrogenase (COG2084). The 3-hydroxy-isobutyrate dehydrogenase takes part in degradation of the branched-chain amino acid valine. H. pylori requires branched-chain amino acids for growth. Neither the substrates and products of the reactions catalyzed by this gene product nor the biological relevance of its distribution is known.
Gene contents unique to other groups
Phylogenetic profiling involving four groups (6 hspEAsia, 5 hspAmerind, 7 hpEurope, and 2 hspWAfrica strains) (Additional file 2 (= Table S1), second sheet) revealed the following group-specific genes:
tas (HP1193) for aldo-keto reductase was present in all hpEurope strains except one (HPAG1) and one hspWAfrica strain (J99), but was absent from all hpEastAsia strains (hspEAsia and hspAmerind). Aldo-keto reductases (AKRs) constitute a large protein superfamily of mainly NAD(P)-dependent oxidoreductases involved in carbonyl metabolism. This gene is fragmented in H. acinonychis strain Sheeba.
homB encoding an outer membrane protein was present in all but two (B8 and SJM180) hpEurope strains (5/7) but absent from the others. This result is in agreement with an earlier study.
trl was detected in all hpEastAsia (hspEAsia and hspAmerind) strains and 2/7 hpEurope strains (26695 and HPAG1). It is present between tRNA(Gly) and tRNA(Leu), and co-transcribed with tRNA(Gly) . It is found in roughly half the clinical isolates in Ireland . Its homologs are present at two loci in 26695 .
A part of xseA for Exonuclease VII large subunit was duplicated in all the hspAmerind strains but the strain PeCan4. Escherichia coli exonuclease VII degrades single-stranded DNA and contributes to DNA damage repair and methyl-directed DNA mismatch repair to avoid mutagenesis [39–41]. This part of xseA was present in the neighbor of 3 other genes in these hspAmerind strains. These 4 genes may form a genomic island.
IS606 transposase gene was present in all hspAmerind and hspWAfrica strains, and one hpEurope (26695) strain, but was absent from the others.
Most of the fecA-2 gene, a fecA paralog, was deleted in the hspAmerind strains. The fecA gene, for iron (III) dicitrate transport protein, is important under aerobic conditions. There are several links between iron metabolism and oxidative stress defense in H. pylori.
The hopZ OMP gene was split in the hspAmerind strains. The hopZ gene is involved in adhesion.
The hopQ OMP gene decayed in the hpEastAsia strains (hspEAsia and hspAmerind). This observation agrees with an earlier work.
H. pylori can ferment pyruvate to ethanol via an alcohol dehydrogenase. Duplication of the alcohol dehydrogenase gene as in J99 (jhp1429) was seen only in the two hspWAfrica strains (J99 and 908).
Prophage-related genomic islands and other mobile elements
Except for the cag pathogenicity island (cagPAI), five genomic islands (GIs) were identified in the genomes of the four Japanese strains (Table 4, Figure 6 and Figure 7). In F32, the cagPAI was flanked by a 44-bp direct repeat, which extended the 22-bp sequence found in the other strains (Table 4). This length of sequence identity would allow homologous recombination  leading to the excision of cagPAI flanked by the repeat.
A GI found in strain F16 lacked similarity to known GIs of H. pylori whereas the other four GIs were homologous to transposable elements TnPZs, as recently reported [48, 49]. The GI in F16 appears to be a remnant of a prophage inserted into a restriction-modification system (Figure 6A). It is homologous to the 5'-half of the Hac II prophage found in H. acinonychis Sheeba. The F16 GI appeared to have lost its 3'-half, presumably through deletion mediated by the inserted IS605 copy. The GI included putative phage integrase genes (HPF16_0475 and HPF16_0476) that suggest the mobility of this region, and a DNA primase gene (HPF16_0468). The gene (HPF16_0469) next to the DNA primase gene had weak sequence similarity to a putative phage helicase gene (ORF35 of bacteriophage phi3626, e-value 5e-5 by TBLASTN against phage nucleotide database), which can be assumed to be the primase-helicase system found in several bacteriophages such as T3, T4, T7 and P4 . Recently, a partial Hac II prophage region was reported for another H. pylori strain .
The other four GIs in the other three strains had sequence similarity to TnPZs . One GI in F57 was entirely homologous to the type 1 TnPZ inserted into the coding region for a DNA methyltransferase with 8-bp target duplication (5' ACATTCTT) (Figure 6B). The GI in F32 appeared to have been deleted by a type 2 TnPZ (Figure 7B). Among the Korean strains, a Type 2 TnPZ was observed only in strain 51.
The plasmid in F30 (pHPF30) was similar to a group of previously characterized H. pylori plasmids such as pHel4 in H. pylori [52, 55]. This carries genes for microcin (7-aa peptide; MKLSYRN), MccB (microcin C7 biosynthesis protein), MccC (microcin C7 secretion protein), MobBCD (for plasmid mobilization), a replication initiator protein, and two relaxases. When compared to other related plasmids, a substitution in mobB and a deletion covering several small ORFs were seen. Homologous plasmids are found in G27 (pHPG27 ), P12 (HPP12 ), and v225d . HPAG1 , B8 , PeCan4 and Sat464 carry a similar plasmid without the MccBC genes.
Insertion sequences (ISs) were searched for in the Japanese strains using GIB-IS . An apparently intact known IS was detected in two strains: IS607 in F16; IS605 in F32.
Divergence of genes between the East Asian (hspEAsia) and the European (hpEurope) strains
We systematically examined the amino acid-based phylogenetic trees of the orthologous genes (gene families) common to the six hspEAsia genomes and the seven hpEurope genomes. Trees of 687 OGs were selected in which the genes of the hspEAsia strains formed a subtree with no genes of the hpEurope strains and vice versa. Each of the orthologs was plotted according to two distance parameters: d_a for the hspEAsia-hpEurope divergence and d_b for intra-hspEAsia divergence (Figure 8A). An hspEAsia-hpEurope divergence greater than twice that of the well-defined core tree (d_a*) was seen in 47 gene families (Tables 5 and 6; genes of those orthologs in each strain are listed in Additional file 5 (= Table S4)). These genes were further divided by the intra-hspEAsia divergence (d_b) into zone 1 (lowest divergence), zone 2 (average divergence) and zone 3 (highest divergence) (Figure 8B). Six typical trees are depicted in Figure 8C. The cagA tree (e) (zone 3) has large d_a and d_b values and a low d_b/d_a value, primarily because of the divergence in a C-terminal region of the ORF. This region, including sequences known as the EPIYA (Glu-Pro-Ile-Tyr-Ala) motif, is involved in host interaction [22, 59]. The tree here is consistent with previous results.
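The two selection steps described here (complete separation of the two groups on the tree, then an inter-group distance above twice the core-tree value) can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: trees are represented as rooted nested tuples of strain names, the distances are taken as precomputed, and the function names and numeric values are hypothetical.

```python
def leaves(tree):
    """Set of leaf names in a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    out = set()
    for child in tree:
        out |= leaves(child)
    return out

def separated(tree, group_a, group_b):
    """True if some subtree holds all of group_a and none of group_b."""
    tips = leaves(tree)
    if not group_a <= tips:
        return False
    if not isinstance(tree, str):
        # Descend while a single child still contains the whole group.
        for child in tree:
            if group_a <= leaves(child):
                return separated(child, group_a, group_b)
    # This is the smallest subtree containing group_a.
    return not (tips & group_b)

def diverged(inter_group_distance, core_distance, factor=2.0):
    """Gene families whose inter-group distance exceeds factor x the core value."""
    return sorted(og for og, d in inter_group_distance.items()
                  if d > factor * core_distance)

east_asia = {"F57", "F32"}
europe = {"26695", "G27"}
clean = ((("F57", "F32"), "Shi470"), (("26695", "G27"), "J99"))
mixed = (("F57", "26695"), ("F32", "G27"))
both_ways = separated(clean, east_asia, europe) and separated(clean, europe, east_asia)
mixed_ok = separated(mixed, east_asia, europe)

# Hypothetical distances; 0.01 stands in for the core-tree value
picked = diverged({"ogX": 0.0325, "ogY": 0.015}, core_distance=0.01)
print(both_ways, mixed_ok, picked)
```

A tree such as `mixed`, where strains of the two groups interleave, fails the separation test and would not be counted among the selected OGs.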
This tree-based analysis effectively extracted known pathogenesis-related genes (Table 5 and Table 6) as discussed below. The list also included several genes related to antibiotics. Amino acid alignments (Additional file 6) located the divergent sites. The distribution pattern of these sequences suggests a possible relationship between structure and function as detailed below for each protein. The divergence could be related to differential activity and adaptation.
The variable d_a for an orthologous group is expected to be sensitive to the presence of a member with an exceptional phylogeny. The strain B8, assigned to hpEurope in this work (Additional file 1 (= Figure S1)), has been adapted to the Mongolian gerbil. The strain SJM180, also assigned to hpEurope based on the tree of seven MLST genes (Additional file 1 (= Figure S1)), clustered with hspWAfrica strains rather than with hpEurope strains in the tree of the well-defined core genes (Figure 1). To examine robustness of the above classification into diverged genes, the same analysis was conducted using the 6 hspEAsia strains and 5 hpEurope strains excluding B8 and SJM180 (Additional file 7 (= Table S5)). These two analyses used all the 20 strains, because we expected that inclusion of the hspAmerind and hspWAfrica strains might provide better classification of the subtrees. In addition to these two analyses, analysis with only the 6 hspEAsia and 7 hpEurope strains or with only the 6 hspEAsia and 5 hpEurope strains was carried out, which allowed assignment of a bootstrap value to the branch separating the hspEAsia and hpEurope strains. Comparison of these 4 analyses is summarized in Additional file 7 (= Table S5). The four sets of results agreed rather well, especially for those genes with a larger d_a value: 34 among the 47 genes in Table 6 were extracted in all the 4 analyses. The bootstrap value supported the separation of hspEAsia and hpEurope well in most cases, with the bootstrap value ≥ 900 in 41 among the 47 genes.
Positively-selected amino-acid changes between the East Asian (hspEAsia) and European (hpEurope) strains
Divergence could be adaptive or neutral. We searched for sites where the hspEAsia-hpEurope changes in amino acids were positively selected and found that 7 of 47 genes passed the likelihood test (Table 7; red dots in Figure 8B). These selected sites were mapped on the coding sequences (Figure 9A). For CagA, several sites were found outside the area of EPIYA segments.
Three-dimensional structure was available for mapping some of the selected sites for three of these genes (Figure 9B). The three-dimensional structure of part of VacA, the p55 fragment, has been determined. S793A mapped on the surface of the p55 at its C-terminal region (Figure 9B). Deletion of the p55 region reduces VacA binding to cells, so S793A might affect cell binding of the hspEAsia and hpEurope strains. Two selected residues of HpaA-2 were mapped (Figure 9B). The residue (H211) corresponding to the selected residue H174 of H. pylori MiaA mapped to the alpha helix 10 of E. coli MiaA [63, 64] (Figure 9B).
Diverged genes and possible biological significance
Known virulence genes
Four genes in Table 6, cagA, vacA, hcpD and tipα, are virulence genes.
CagA is introduced in the Background section and discussed above in the section "Divergence of genes between the East Asian (hspEAsia) and the European (hpEurope) strains". VacA is another important virulence protein. The hcpD gene (HP0160) is a member of the Hcp (H. pylori cysteine-rich protein) family, whose products contain repeat motifs characteristic of the eukaryotic Sel1 regulatory proteins, are secreted and interact with the host immune system. Geographical divergence and positive selection for amino acid changes in this family, including HcpD, have been reported. HP0596 encodes tumor necrosis factor alpha-inducing protein (Tipα), a DNA-binding protein. This enters the gastric cells and induces TNF-alpha, an essential cytokine for tumor promotion.
The cagA gene is discussed above in the section "Divergence of genes between the East Asian (hspEAsia) and the European (hpEurope) strains". The vacA gene showed a qualitatively similar pattern of intra-hspEAsia divergence and overall divergence as cagA (Figure 8C (d)). The overall tree pattern was consistent with previous studies (for review, see ). Intra-hspEAsia divergence was large for hcpD. Positively-selected residues of cagA and vacA are described above.
Outer membrane proteins
The vacA gene is discussed above. vacA-4 is a vacA paralog. The hpaA-2 is of unknown function, but is a paralog of hpaA, which is essential for adhesion. The homA/B genes are homologs of homC and known to have diverse copy number and genomic localization in Western and East Asian strains (Table 1). OipA (also known as HopH) induces IL-8 from host cells. Geographical divergence of oipA has been reported.
Intra-hspEAsia divergence was intermediate for oipA/oipA-2 (Table 6).
The d_a value (hspEAsia-hpEurope divergence) of homC (0.0325) was larger than the threshold distance (Table 6). Moreover, the homC genes of all hpEastAsia and hpAfrica1 strains but the strain 52 were greatly diverged from those of the hpEurope strains and the strain 52: distance 0.1387 for this separation was comparable to the largest d_a values for hpaA-2 and cagA. Diverged residues were clustered in a specific region. Positively selected amino-acid changes of the putative homC product were identified (Table 7 and Figure 9).
The hopJ and hopK genes (HP0477 and HP0923) were similar within each strain but different between strains [26, 27]. This earlier observation, seen for 26695, J99 and HPAG1, was confirmed with the other genomes except for 908 and B8. This similarity of hopJ and hopK genes in one strain is likely to be caused by concerted evolution by homologous interaction, possibly with selection.
The babA and alpA genes were not included in the 687 OGs that showed complete separation between genes of the six hspEAsia strains and those of the seven hpEurope strains on the phylogenetic tree. BabA binds to Lewis b antigens [71, 72]. Geographic variation of BabA has been reported. AlpAB proteins are necessary for specific adherence to human gastric tissue. In the East Asian strains but not the Western strains, AlpA activates NF-κB-related pro-inflammatory signaling pathways.
The reason that the babA is not in Table 6 was mainly because babA genes of the hpEurope strains B8 and SJM180 grouped together with the hspEAsia strains (Additional file 7 (= Table S5)). The alpA in the hpEurope strain SJM180 grouped with the hspEAsia strains (Additional file 7 (= Table S5)).
Lipopolysaccharide synthesis and Lewis antigen mimicry
Three genes in Table 6, futA, futB and HP1105 (designated here as agt), are related to lipopolysaccharide (LPS) synthesis and Lewis antigen mimicry.
The lipopolysaccharides of H. pylori are important for host interaction. H. pylori can express Lewis and related antigens in the O-chains of its surface lipopolysaccharide that mimic the hosts. O-chains are commonly composed of internal Lewis X units with terminal Lewis X or Lewis Y units or, in some strains, with additional units of Lewis a, Lewis b, Lewis c, sialyl-Lewis X and H-1 antigens, as well as blood groups A and B, producing a mosaic of antigenic units . The activity and specificity of the fucosyltransferases may vary between the two paralogs in one strain, as well as between the orthologs in different strains . Mechanism of these changes is phase variation involving simple repeats and longer repeats [77, 78]. Such diversity could be adaptive and related to differences in pathogenicity .
The two fucosyltransferase genes (futA = HP0379, futB = HP0651) showed large hpEurope-hspEAsia divergence (the 4th largest d_a value), as reported earlier. Intra-hspEAsia divergence was large for them (in zone 3). HP1105 (agt) is a β-1,3-N-acetyl-glucosaminyl transferase gene for LPS synthesis. Another transferase gene, the α-1,6-glucosyltransferase gene (HP0159 = rfaJ-1), was in the list from the 6 hspEAsia - 5 hpEurope comparison (Additional file 7 (= Table S5)).
Five genes in Table 6, sotB, secG, yajC, comH and cvpA, are related to transport.
The sotB gene was similar to genes for sugar efflux transporters and multi-drug resistance transporters (COG2814, TIGR00880). SecG forms the machinery for protein translocation across the cytoplasmic membrane. YajC is a member of the preprotein translocase machinery, SecDF-YajC. SecDF-YajC inhibits disulfide bond formation between two SecG molecules. ComH is essential for natural transformation. Its putative N-terminal secretion signal suggests that it is either anchored in the cytoplasmic membrane or exported to the periplasm. The cvpA gene of E. coli is suggested to encode a membrane protein required for colicin V production/secretion.
The secG homolog, mHP1255, showed divergence focused around residues 150-160. The nucleotide sequence AAAGAGAAG, encoding Lys-Glu-Lys, was present once in the hpEurope and hspWAfrica strains but was repeated 2 to 4 times in tandem in all hpEastAsia strains (4 in F16, 3 in Sat464, and 2 in the others).
Positively-selected amino-acid changes of the putative sotB product were identified (Table 7). Of these, W186Y lay at the end of a transmembrane helical region away from the substrate translocation pores.
Motility and chemotaxis
Four genes in Table 6, fliT, fliK, maf and cheY, are related to motility and chemotaxis.
The fliT product is a flagellar chaperone, whereas the fliK product controls the hook length of flagella. The maf gene encodes a member of the motility accessory family of flagellin-associated proteins implicated in flagellar assembly. The cheY gene (HP1067) encodes a response regulator of a two-component signal transduction system regulating chemotaxis. CheY does not act as a transcriptional activator. Instead, when activated, it interacts directly with the flagellar motor-switch complex, causing a clockwise rotation of the flagella that results in cell tumbling.
Seven genes in Table 6, fixQ, fixS, frxA, hypD, hydE, pgl and nuoF, are related to electron transfer.
Aerobic respiration in H. pylori has been analyzed experimentally and by genome sequences. A cb-type cytochrome c oxidase is the sole terminal oxidase present in H. pylori . FixQ (= CcoQ) is a component of the oxidase. The fixS gene likely encodes the cation transport subunit of the oxidase . It has been proposed that FixS plays a role in the uptake and metabolism of copper required for oxidase assembly . Aerobic respiration results in production of toxic superoxide at this terminal oxidase, which is involved in bacterial death . The frxA gene, NAD(P)H-flavin oxidoreductase, is involved in redox of flavins, which are important electron transfer mediators . Reduced flavins reduce ferric complexes or iron proteins with low redox potential. FrxA is one of the enzymes that make H. pylori sensitive to metronidazole . H. pylori is capable of hydrogen oxidation . HypD is involved in maturation of the [NiFe] H2-uptake hydrogenase, and catalyzes insertion and cyanation of the iron center . The hydE gene is also necessary for the hydrogenase activity . The pgl gene (HP1102) encodes a 6-phosphogluconolactonase, which catalyzes the second step of the phosphopentose pathway. This phase of the phosphopentose pathway generates reducing power in the form of NADPH and is important in other organisms in defense against reactive oxygen species and oxidative stress response [93, 94].
Four genes in Table 6, miaA, tilS, def, and prmA, are important for translation.
MiaA and TilS affect translation fidelity [95–97]. MiaA, an isopentenyl-tRNA transferase, modifies the tRNAs that read codons starting with U to minimize peptidyl-tRNA slippage in translation. TilS, the tRNA(Ile2) lysidine synthetase, modifies cytidine to lysidine (2-lysyl-cytidine) at the first anticodon position of tRNA(Ile2), thereby switching tRNA(Ile2) from a methionine-specific to an isoleucine-specific tRNA. Def removes a formyl group from the N-terminus of a nascent polypeptide and is a potential drug target. PrmA is a trimethyltransferase that methylates multiple residues in the N-terminal domain of ribosomal protein L11, a universally conserved component of the large ribosomal subunit.
There was evidence that divergence in miaA was adaptive (Table 7), and the relevant amino acid residue was mapped on the structure (Figure 9B ii), as described above. Intra-hspEAsia divergence was not large for def (located in zone 2) but was large for miaA (in zone 3).
Four genes in Table 6, addA, rnhA, rnhB and hsdR, are nucleases.
AddA (AdnA, PcrA) is a RecB-like helicase that promotes DNA recombination repair and survival during colonization. Upon encounter with a DNA double-strand break, E. coli RecBCD enzyme degrades non-self DNA, but repairs self DNA marked by a genomic identification sequence through RecA-mediated homologous recombination. The identification sequence varies among bacterial groups and can be altered by a mutation in RecBCD.
The rnhA and rnhB genes encode RNase HI and RNase HII, which hydrolyze RNA hybridized to DNA. Their biological role remains unclear, although they affect DNA replication, repair and transcription [103, 104].
An AT-rich region of the addA gene linking the helicase domain and the nuclease domain showed an interesting divergence: the sequence AAAGAAAG(T/C)AAA encoding Lys-Glu-Ser-Lys was repeated in tandem 2 to 8 times in the hspWAfrica and hpEurope strains but was absent or present only once in the hspEAsia strains. The hspAmerind strains have a single copy (4 strains) or two copies (1 strain).
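The repeat count described above can be checked on any sequence with a short script. The following is a minimal sketch, not the authors' procedure: it uses a regular expression for the 12-nt unit, a codon table restricted to the four codons involved, and toy sequences that are hypothetical and not taken from any genome.

```python
import re

# Standard genetic code entries for the codons used in this example only.
CODON_TABLE = {"AAA": "Lys", "GAA": "Glu", "AGT": "Ser", "AGC": "Ser"}

# Repeat unit from the text; the degenerate base (T/C) becomes the class [TC].
UNIT_PATTERN = "AAAGAAAG[TC]AAA"
UNIT_LENGTH = 12  # nucleotides per repeat unit


def translate(dna):
    """Translate an in-frame DNA string using the small codon table above."""
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3)]


def max_tandem_copies(seq):
    """Return the largest number of back-to-back copies of the unit in seq."""
    runs = re.finditer(f"(?:{UNIT_PATTERN})+", seq.upper())
    return max((len(m.group(0)) // UNIT_LENGTH for m in runs), default=0)


# Hypothetical toy sequences for illustration.
toy_three_copies = "GGC" + "AAAGAAAGTAAA" + "AAAGAAAGCAAA" + "AAAGAAAGTAAA" + "TTT"
toy_no_copy = "GGCAAAGAAGGCTTT"

print(translate("AAAGAAAGTAAA"))
print(max_tandem_copies(toy_three_copies))
print(max_tandem_copies(toy_no_copy))
```

Both variants of the unit (T or C at the degenerate position) translate to the same four residues, because AGT and AGC both encode serine.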
The ftsA gene encodes an actin-like, membrane-associated protein that interacts with the tubulin-like FtsZ protein, helps it assemble into the Z ring, anchors it to the cytoplasmic membrane, and recruits other proteins for cell division. It is a potential drug target.
The ilvE gene (HP1468) encodes a branched-chain amino acid aminotransferase that generates glutamic acid from branched-chain amino acids (valine, leucine, isoleucine) that are essential to H. pylori. We do not know whether its divergence is related to loss of jhp0585, encoding a branched-amino-acid dehydrogenase, in all hpEastAsia strains (see above), or whether it is related to a possible geographical divergence in the amino acid content of food.
We closely compared complete genome sequences through phylogenetic profiling, phylogenetic tree construction, and nucleotide sequence analysis. The results distinguished decaying from intact genes and revealed drastic evolutionary changes within the H. pylori species. Our results clearly define the H. pylori East Asian lineage as distinct at the genome level from the African, European or Amerind lineages (Table 2). The East Asian lineage consists of Japanese and Korean genomes and corresponds to hspEAsia in the phylogenetic tree of the concatenated seven genes used for multi-locus sequence typing. The hspEAsia and hspAmerind lineages form a phylogenetic group hpEastAsia. The outstanding differences are in proteins related to: (i) host-interaction; (ii) electron transfer and redox metabolism; and (iii) translation fidelity.
Many of the virulence factors show wide divergence between hspEAsia and hpEurope, most likely because of co-evolution with the host. We anticipate that the list of well-diverged genes (Table 6) is enriched for host-interaction and potential virulence genes. We detected positively-selected amino-acid changes in two virulence factors: cagA and vacA (Table 7).
Many OMP families showed loss of one of their resident loci (hopMN, babABC, sabAB), whereas one family (oipA) showed duplication of its locus. Some OMP genes showed internal deletions (vacA-2) or interallelic homologous recombination (hopMN). By other criteria, a group-specific repertoire was seen for further OMP genes (homB, hopZ and hopQ). We also found substantial hspEAsia-hpEurope divergence in many OMPs (Table 5). The OMPs play important roles in host interaction such as adhesion to the host cells and induction of immune responses. For example, OipA induces IL-8 from host cells. Systematic decay of OMP genes occurred during adaptation of H. pylori to a new host of large felines, generating the new species of H. acinonychis. Hence, the above OMP changes might reflect selection and/or fine regulation in host interaction, and more specifically, may help avoid the host immune system. At least two OMPs show evidence for positive selection (Table 7). We do not yet know whether these OMP changes are related to immune response or adhesin activity.
Lewis antigen mimicry is important for gastric colonization and adhesion. The mimicry affects innate immune recognition, inflammatory response, and T-cell polarization. Long-term infection by H. pylori might induce autoreactive anti-Lewis antigen antibodies. Divergence in transferase genes for LPS biosynthesis may have resulted from co-evolution with the host immune system and could be related to changes in Lewis antigens in human populations. For example, the Le(a+b+) phenotype is almost absent in Caucasian persons whereas it occurs with a higher frequency in the Asian population. This might be related to differences in pathogenicity and adaptation.
Changes in transporter genes, the loss of a putative amino acid utilization gene, divergence in a branched chain amino acid metabolism gene, differences in acetate metabolism genes, and divergence in motility and chemotaxis genes could also be related to host interaction, because these are related to the stomach environment. An interesting question is whether these changes are related to variation in human diets.
Several key electron transfer components were diverged between hspEAsia and hpEurope. The multiple and drastic changes in redox metabolism were unexpected. The systematic decay of all Mo-related genes through mutations in all (6/6) hspEAsia strains was the most striking. We do not know whether our findings reflect the biased environmental occurrence of Mo or the dietary habits of human populations. The richest sources of Mo include legumes, cereal grains (and baked products), leafy vegetables, milk, beans, liver, and kidney, whereas fruits, stem and root vegetables, and muscle meats are poor Mo sources .
The BisC homolog, the only molybdoenzyme found in the H. pylori genome, is similar to a number of periplasmic reductases for alternative oxidants such as dimethylsulfoxide or trimethylamine N-oxide. Western strains of H. pylori might be able to use N- and/or S-oxide as an electron acceptor in energy metabolism in addition to oxygen and fumarate. One hypothesis about decay of the Mo-related genes is that this anaerobic electron transport system became maladaptive in the East Asian lineage. One possibility is that the radical reaction mediated by MoaA in molybdopterin synthesis is dangerous in the presence of oxygen. This could explain the observed changes in oxidative phosphorylation and acetate metabolism.
A candidate for the BisC substrate is an oxidized form of methionine, free or within a protein. Methionine is sensitive to oxidation, which converts it to a racemic mixture of methionine-S-sulfoxide (Met-S-SO) and methionine-R-sulfoxide (Met-R-SO) . The reductive repair of oxidized methionine residues performed by methionine sulfoxide reductase is important in many pathogenic bacteria in general, and specifically for H. pylori to maintain persistent stomach colonization [112, 113]. H. pylori methionine sulfoxide reductase (Msr, HP0224 product) is induced under oxidative stress control and can repair methionine-R-sulfoxide but not the S isomer, even though it is a fusion of an R-specific and an S-specific enzyme . BisC from other bacteria can reduce and repair the S but not the R form .
If the sole function of BisC is to repair methionine-S-sulfoxide, another means to repair methionine-S-sulfoxide may have appeared in the East Asian H. pylori, for example by higher expression of Msr. In this case, BisC may have been inactivated because Mo-related reactions were no longer necessary. The substitution by a DNA element downstream of the msr gene in the hspEAsia strains (5/6, all but strain 52) could be involved in the hypothesized methionine-S-sulfoxide repair activity of its product.
Another possibility is decrease of oxidative stress generating methionine-S-sulfoxide in the East Asian H. pylori. Oxidative stress is induced by acid exposure, and msr is among the oxidative stress genes induced by acid . H. pylori infection has different effects on acid secretion in Europe and Asia . In Europe, antral-predominant gastritis with increased acid secretion is frequent, whereas in Asia, pan-gastritis and subsequent atrophic gastritis with decreased acid secretion are common. The decrease in acid experienced by East Asian H. pylori lineages may have decreased their methionine-S-sulfoxide and made its repair by BisC unnecessary.
Downregulation of some of the Mo-related genes in a European strain under acidic conditions may be related to their decay . Downregulation may occur to avoid the possible toxic effects of Mo metabolism under conditions of acid adaptation.
Taken together, our results led us to predict that the East Asian H. pylori strains are different from the European strains in electron transfer reactions and responses to oxygen and acid. Possibly related to this alteration in redox is the presence of the two acetate-related pathways in 3 out of 4 Japanese strains. These are expected to be able to switch from acetate fermentation to acetate utilization under aerobic conditions, as seen for E. coli . The European strains, some of the hspAmerind strains, and the other hspEAsia strains may be regarded as mutants that lack the pta-ackA pathway and the supposedly important acetyl~P signal. Global effects of these defects on chemotaxis, nitrogen and phosphate assimilation, osmo-regulation, flagellar biogenesis, biofilm development, and pathogenicity are expected, based on the various phenotypes of E. coli strains defective in these genes .
Translational proteins also diverged between hpEurope and hspEAsia strains. MiaA (tRNA delta(2)-isopentenylpyrophosphate transferase) and TilS (tRNA lysidine synthetase) affect accuracy in elongation. The amino-acid change in MiaA turned out to be adaptive (Table 7). TilS affects translation efficiency at various stages. Ambiguity in translation is proposed to be important in the evolution of novel proteins by generating phenotypic and genetic diversity in the proteome for selection . This role of ambiguity is similar to the evolutionary role of genome-wide modulation of mutation rates by genes such as mutS .
Implications for medicine
East Asian (Japanese/Korean) H. pylori appear to be quite different from European H. pylori. Our results provide a solid starting point for understanding the biology, host interaction, and pathogenesis of the East Asian H. pylori, which in most previous works were inferred from a European strain. Divergences included virulence, cell surface-related, and drug target genes. These results will affect our strategy in developing effective therapies and drugs. Questions raised by our findings include whether East Asian VacA (Figure 9B) interacts with host cells in the same way as European VacA.
The diverged gene frxA is associated with resistance to the antibiotic metronidazole, which is frequently used in H. pylori eradication. The divergence in frxA could affect resistance to this group of drugs in various ways. More generally, if redox metabolism differs between hspEAsia and hpEurope strains, the same drug might produce different effects, depending on intra-bacterial redox reactions.
The diverged genes included two potential drug targets (def and ftsA), so drugs that target these proteins may have different effects in East Asian and European strains. We do not know, for example, whether anti-H. pylori drugs designed from the structure of European Def will be as effective against East Asian H. pylori.
Clearly, many studies are needed to answer these and other questions raised by the genomics results presented here. Phylogenetic analysis in the present study used OGs where genes of hspEAsia were clustered separately from those of hpEurope. Some genes do not share this topology, as suggested above for acoE deletion and hopMN recombination. We plan to study the distortion in the tree. We focused on differences between a limited number of strains from each group. However, there are variations within East Asian strains (Table 5). Further experimental examination of the divergence within hspEAsia, and between hspEAsia and the other strains, is necessary to understand their divergence in detail. Such examination might reveal complexity in evolution and will be the subject of a separate study. The mechanisms underlying the variation, such as mutations and rearrangements, will also be the subject of a separate study.
Taking advantage of the extreme genome plasticity of H. pylori, we demonstrated how drastically a genome can change during evolution within a species. Our results revealed drastic changes in proteins for host interaction and electron transfer and suggested their importance in adaptive evolution. These results define the H. pylori East Asian and Western lineages at the genome level, enhance our understanding of their host interaction, and contribute to the design of effective drugs and therapies. The approach of fine comparative analysis of closely-related multiple genomes may reveal subtle but important evolutionary changes in other populations.
Four strains were isolated from patients with diffuse type gastric cancer, intestinal type gastric cancer, duodenal ulcer, and gastritis (F57, F32, F30 and F16). The ABO blood groups of the hosts were: F57, B; F32, A; F30, O; F16, B. Studies were performed according to the principles of the Declaration of Helsinki, and consent was obtained from each individual after a full description of the nature and protocol of the study.
Gastric biopsy specimens from each patient were inoculated onto a trypticase soy agar (TSA)-II/5% sheep blood plate and cultured under microaerobic conditions (O2, 5%; CO2, 15%; N2, 80%) at 37°C for 5 days. A single colony was picked from each primary culture plate, inoculated onto a fresh TSA-II plate, and cultured under the conditions described above. A few colonies were picked from each plate and transferred into 20 ml of Brucella broth liquid culture medium containing 10% fetal calf serum, and cultured for 3 days under the conditions as described above. A part of the liquid culture sample was stored at -80°C in 0.01 M phosphate-buffered saline (PBS) containing 20% glycerol. DNA from each H. pylori isolate was extracted from the culture pellet by the protease/phenol-chloroform method, suspended in 300 μl of TE buffer (10 mM Tris HCl, 1 mM EDTA) and stored at 4°C for PCR analysis and nucleotide sequencing.
The genome sequences of H. pylori strains F16, F30, F32 and F57 were determined by a whole-genome shotgun strategy. We constructed small-insert (2 kb) and large-insert (10 kb) plasmid libraries from genomic DNA, and sequenced both ends of the clones to obtain 26,112 (F16 and F57), 30,720 (F30) and 33,792 (F32) sequences using ABI 3730xl sequencers (Applied Biosystems), with coverage of 10.0 (F16)-, 11.5 (F30)-, 12.7 (F32)- and 10.0 (F57)-fold. Sequence reads were assembled with the Phred-Phrap-Consed program, and gaps were closed by direct sequencing of clones that spanned the gaps or with PCR products amplified using oligonucleotide primers designed against the ends of neighboring contigs. The overall accuracy of the finished sequence was estimated to have an error rate of less than 1 per 10,000 bases (Phrap score of ≥40). Sequences of the molybdenum-related genes and the genes in the acetate pathway of the four Japanese strains were verified by resequencing PCR fragments directly amplified from genomic DNA (primers are in Additional file 4 (= Table S3)). The genome sequences of other strains were obtained from National Center for Biotechnology Information (NCBI) . Accession numbers are in Table 1.
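The correspondence quoted above between a score of 40 and an error rate of 1 per 10,000 bases follows from the standard Phred scale, Q = -10 log10(P). A minimal sketch of the conversion (the function names are illustrative, not part of the Phred-Phrap-Consed package):

```python
import math


def phred_to_error_probability(q):
    """A Phred-scaled quality Q corresponds to error probability 10^(-Q/10)."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """Inverse conversion: Q = -10 * log10(P)."""
    return -10 * math.log10(p)


# Expected errors per 10,000 bases at a score of 40.
print(phred_to_error_probability(40) * 10_000)
print(error_probability_to_phred(1e-4))
```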
Gene finding and annotation
Protein-coding genes were identified by integrating predictions from the programs GeneMarkS and GLIMMER3. All ORFs longer than 10 amino acids were searched using BLASTP against two databases, one composed of genes of 6 H. pylori genomes in the RefSeq database at NCBI ("close" database), and the other composed of genes of 300 complete prokaryote genomes (one genome per genus) available at the end of 2008, except for those in the Helicobacter genus ("distant" database). When the predicted start position differed between GeneMarkS and GLIMMER3, assignments were made by consensus of hits, with consensus against the "distant" database taking priority over the "close" one. The consensus start position among bidirectional best hits with 50% or more amino acid sequence identity for each matched region for each genome pair was determined by majority rule. Overlap of genes was resolved by comparing the results from the two prediction programs. Genes encoding fewer than 100 amino acids and predicted only by GLIMMER3 were dropped except for the microcin gene.
tRNA genes were detected using tRNAscan-SE. rRNA genes were identified based on sequence conservation. Putative replication origins were predicted by GC-skew (window size 500 bp, window shift 250 bp).
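A windowed GC-skew calculation with the window size and shift given above can be sketched as follows. The skew definition (G - C)/(G + C) is the usual one; taking the minimum of the cumulative skew as the putative origin is a common heuristic and is an assumption here, since the text does not state which criterion was applied. The toy sequence is hypothetical.

```python
def gc_skew_windows(seq, window=500, step=250):
    """Return a list of (window_start, skew) with skew = (G - C) / (G + C)."""
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skew = (g - c) / (g + c) if g + c else 0.0
        result.append((start, skew))
    return result


def min_cumulative_skew_start(windows):
    """Return the start of the first window at which cumulative skew is minimal."""
    total = 0.0
    best_start, best_total = None, float("inf")
    for start, skew in windows:
        total += skew
        if total < best_total:
            best_start, best_total = start, total
    return best_start


# Hypothetical toy sequence: C-rich first half, G-rich second half.
toy = "C" * 1000 + "G" * 1000
windows = gc_skew_windows(toy)
print(windows)
print(min_cumulative_skew_start(windows))
```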
Core genome analysis
The common core structure conserved among 20 H. pylori genomes was identified based on conservation of gene order among orthologs using the CoreAligner program implemented in the RECOG system. Briefly, CoreAligner identifies the genomic core of the input genomes by taking the longest path of the neighborhood graph that consists of conserved neighborhood gene pairs, which are defined as pairs of OGs that are within a neighborhood of 20 genes in at least half of the genomes. For this analysis, we used as input a set of OGs generated by the DomClust program (see "Phylogenetic profile analysis" section below for details about identification of OGs by DomClust). Absence of a gene in some genomes (at most half of the genomes) in each OG among the core is allowed. In addition, as identified OGs are at the domain level, if a counterpart of a gene in one genome is split in another genome, a different number of genes can participate in the OGs in different genomes. Thus, the number of core genes in each genome can vary. Still, the numbers of core genes varied less (1364-1424; SD = 13.5) than the total number of genes among the strains (1465-1593; SD = 33.9) (Table 1). Among those core OGs, 1079 OGs were universally conserved (conserved in all the genomes), non-domain-separated, with one-to-one correspondence, and designated "well-defined core OGs". Those 1079 OGs were used for phylogenetic analysis (Figure 1). Nucleotide sequences of genes in well-defined core OGs were aligned by the Mafft program, from which conserved blocks were extracted by the Gblocks program.
Phylogenetic profile analysis
Phylogenetic profiling was carried out using the set of OGs generated by DomClust . We identified OGs with East Asian-specific features as those whose phylogenetic profiles were highly correlated to the template pattern (taking 1 for hspEAsia and 0 for hpEurope). The DomClust clustering program can identify OGs at the domain level, and was used to identify genes truncated in particular strains. Clustering was performed based on PAM (point accepted mutation) distance rather than score to ensure proper evaluation of evolutionary distances, even if one gene was truncated; in the latter case, scores may underestimate evolutionary relatedness. To clarify differences in gene-splitting patterns among strains, we did not use DomClust options to suppress domain splitting.
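The comparison of a phylogenetic profile against the template pattern can be illustrated with a small sketch. The text does not name the correlation measure, so Pearson correlation is used here as an assumption; the two example profiles are hypothetical, with strains ordered as six hspEAsia followed by seven hpEurope.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


# Template pattern from the text: 1 for hspEAsia, 0 for hpEurope.
TEMPLATE = [1] * 6 + [0] * 7

# Hypothetical profiles (number of domains per strain for one OG).
profile_split_in_easia = [2] * 6 + [1] * 7   # split in every hspEAsia strain
profile_uninformative = [1] * 13             # identical in all strains

print(pearson(profile_split_in_easia, TEMPLATE))
print(pearson(profile_uninformative, TEMPLATE))
```

An OG whose profile tracks the template perfectly gives a coefficient of 1, whereas a profile that does not vary among strains carries no information and is scored as 0 in this sketch.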
To identify genes with characteristic patterns of hspEAsia strains, we constructed a phylogenetic profile for each OG as a vector of examined property values (e.g., number of domains or number of duplications). For surveying patterns of gene splitting and deletion, a phylogenetic profile was constructed for each OG using the number of domains for each gene that resulted from the clustering. For surveying patterns of gene duplication, a phylogenetic profile was constructed using the number of duplicated genes (in-paralogs). To find OGs with a characteristic hspEAsia pattern, equality of the medians among different populations was tested by Kruskal-Wallis test. Tests between East Asian and European strains used the six hspEAsia strains and the seven hpEurope strains. Tests among four subpopulations used six hspEAsia, five hspAmerind, seven hpEurope, and two hspWAfrica strains.
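The Kruskal-Wallis test applied to such profiles can be sketched in plain Python. This is an illustrative re-implementation, not the code used in the study (which relied on R): it computes the tie-corrected H statistic and an asymptotic p-value from the chi-square distribution, which is only approximate for samples this small. The profile values are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, k = math.exp(-x / 2), 2
    while k < df:  # recurrence from k to k + 2 degrees of freedom
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q


def kruskal_wallis(*groups):
    """Return (H, p) with tie correction and asymptotic chi-square p-value."""
    data = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    n = len(data)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and data[j][0] == data[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            rank_sums[data[k][1]] += average_rank
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    h = 12 / (n * (n + 1)) * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
    h -= 3 * (n + 1)
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:  # all observations identical
        return 0.0, 1.0
    h /= correction
    return h, chi2_sf(h, len(groups) - 1)


# Hypothetical domain counts: six hspEAsia strains vs seven hpEurope strains.
h_stat, p_value = kruskal_wallis([2] * 6, [1] * 7)
print(h_stat, p_value)
```

The same function accepts four groups for the comparison among hspEAsia, hspAmerind, hpEurope and hspWAfrica strains.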
Analyses of molybdenum-related genes
H. pylori protein sequences were searched against the CDD conserved protein domain database, by RPS-BLAST. Protein families extracted from the search results for Mo-cofactor synthesis or binding domain were: PF03404 (Mo-co_dimer), PF03205 (MobB), PF02738 (Ald_Xan_dh_C2), PF01568 (Molydop_binding), PF02730 (AFOR_N), PF02597 (ThiS), PF03454 (MoeA_C), PF06463 (Mob_synth_C), PF03453 (MoeA_N), PF01315 (Ald_Xan_dh_C), PF01493 (GXGXG), PF02579 (Nitro_FeMo-Co), PF01967 (MoaC), PF03459 (TOBE), PF02391 (MoaE), PF00384 (Molybdopterin), PF04879 (Molybdop_Fe4S4), PF02665 (Nitrate_red_gam), PF00174 (Oxidored_molyb), PF00994 (MoCF_biosynth), PF03473 (MOSC), PF02625 (XdhC_CoxI), PF01314 (AFOR_C), PF01547 (SBP_bac_1) (pfam name in parentheses). Homologs of two molybdoproteins that were not detected in the above protein families were absent in the H. pylori genomes.
bisC was the only molybdoenzyme gene in the 20 H. pylori genomes with detected domains PF01568 (Molydop_binding) and PF00384 (Molybdopterin). A multidomain TIGR00509 (bisC_fam) was also detected in bisC.
Analyses of horizontally transferred regions
GIs were detected by searching for regions that fulfilled the following conditions: (i) length greater than 5 kb; (ii) continuous ORFs not perfectly conserved in all 20 H. pylori strains; and (iii) the whole region predicted to be extrinsic by Alien Hunter. Counterparts of detected GIs in Amerind strains were previously reported as TnPZ [48, 49].
Genes with a large distance between East Asian and European strains
OGs diverged between six hspEAsia and seven hpEurope strains were screened based on two values related to their phylogenetic tree. The d a value was the distance between the last common ancestral (LCA) node of hspEAsia and the LCA node of hpEurope. The d b value was the average distance of hspEAsia from its LCA node. OGs with hspEAsia-diverged genes were screened by introducing the following conditions (with hspAmerind omitted): (i) OGs in which all the hspEAsia genes of the OG formed a sub tree without any hpEurope genes in the phylogenetic tree; (ii) OGs universally conserved (not less than 12 of the 13 genomes; not less than 10 among 11 genomes for comparison of 6 hspEAsia and 5 hpEurope strains in Additional file 7 (= Table S5)); (iii) genes with no domain fusion/fission event among the 13 genomes (within ± 20% of the mean length of the OG, measured in amino acid residues); (iv) d a value greater than twice the d a value of the concatenated well-defined core tree (of amino-acid sequences) (denoted as d a *; with the resulting cutoff of d a > 0.02324; 1079 OGs; see "core genome analysis" section above). Among 1248 OGs that satisfied the criteria (ii) and (iii), 692 OGs satisfied the criteria (i), that is, complete separation of genes of hspEAsia from those of hpEurope. The d b * ± sd values in logarithmic scale, corresponding to 0.00550 and 0.0231 (d b * = 0.01128) in the original scale, were used as threshold values for the three zones (N = 687; five OGs with d b = 0 were excluded from 692 OGs satisfying the above criteria (i)-(iii)).
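The thresholds above can be restated as a small sketch. The cutoff and zone boundaries are the values quoted in the text; the zone numbering (1 = low, 2 = intermediate, 3 = high intra-hspEAsia divergence) is inferred from how zones 2 and 3 are described in the Results and should be treated as an assumption, as should the use of natural logarithms and the sample standard deviation. The input list in the example is hypothetical.

```python
import math

D_A_CUTOFF = 0.02324                   # twice the d_a of the core tree (from the text)
ZONE_LOW, ZONE_HIGH = 0.00550, 0.0231  # d_b boundaries quoted in the text


def log_scale_thresholds(d_b_values):
    """Return exp(mean - sd), exp(mean), exp(mean + sd) of ln(d_b), skipping zeros."""
    logs = [math.log(v) for v in d_b_values if v > 0]
    mean = sum(logs) / len(logs)
    sd = math.sqrt(sum((x - mean) ** 2 for x in logs) / (len(logs) - 1))
    return math.exp(mean - sd), math.exp(mean), math.exp(mean + sd)


def is_diverged(d_a):
    """Criterion (iv): distance between the two LCA nodes exceeds the cutoff."""
    return d_a > D_A_CUTOFF


def zone(d_b):
    """Assumed numbering: 1 below the lower boundary, 3 above the upper one."""
    if d_b < ZONE_LOW:
        return 1
    if d_b <= ZONE_HIGH:
        return 2
    return 3


# Hypothetical d_b values, used only to show the log-scale calculation.
low, centre, high = log_scale_thresholds([0.005, 0.01, 0.02])
print(low, centre, high)
print(zone(0.01128), is_diverged(0.03))
```

Because the thresholds are symmetric on the logarithmic scale, the product of the two boundaries equals the square of the central value, which holds for the published numbers (0.00550 × 0.0231 ≈ 0.01128²).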
A branch-site likelihood ratio test of positive selection was carried out using PAML, based on the multiple alignment produced by the einsi command of MAFFT. Only residues aligned at the same site by the einsi command and by PRANK (with codon option) were considered. Positively-selected residues were mapped on the p55 structure of VacA using PyMol.
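The likelihood ratio step of such a test can be sketched as follows. The statistic is twice the difference between the log-likelihoods of the alternative and null models; comparing it to a chi-square distribution with one degree of freedom is a commonly used convention and is an assumption here, as the text does not state the reference distribution. The log-likelihood values are hypothetical and would normally be read from the PAML output.

```python
import math


def branch_site_lrt(lnl_null, lnl_alt):
    """Return (2 * delta lnL, p-value from chi-square with 1 degree of freedom)."""
    stat = max(0.0, 2 * (lnl_alt - lnl_null))
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square (df = 1) upper tail
    return stat, p_value


# Hypothetical log-likelihoods for the null and alternative models.
stat, p_value = branch_site_lrt(lnl_null=-2345.67, lnl_alt=-2341.20)
print(stat, p_value)
```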
The equality of medians for phylogenetic profiling between East Asian and European strains was tested by Kruskal-Wallis one-way analysis of variance by ranks, a non-parametric method for testing equality of population medians among groups. The tests were conducted using the R statistics package.
The accession numbers of the H. pylori genome sequences reported in this paper are: F16 [DDBJ:AP011940.1 http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry2.pl?database=ver_ddbj&query=AP011940.1], F30 [DDBJ:AP011941.1 http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry2.pl?database=ver_ddbj&query=AP011941.1, DDBJ:AP011942.1 http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry2.pl?database=ver_ddbj&query=AP011942.1], F32 [DDBJ:AP011943.1 http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry2.pl?database=ver_ddbj&query=AP011943.1, DDBJ:AP011944.1 http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry2.pl?database=ver_ddbj&query=AP011944.1] and F57 [DDBJ:AP011945.1 http://getentry.ddbj.nig.ac.jp/cgi-bin/get_entry2.pl?database=ver_ddbj&query=AP011945.1].
Current position of MK: Institute of Biogeosciences, Japan Agency for Marine-Earth Science and Technology, Yokosuka, Kanagawa, 237-0061, Japan
- hpEurope and hpAfrica1: populations of H. pylori
- hspEAsia and hspAmerind: sub-populations of hpEastAsia
- hspWAfrica: a sub-population of hpAfrica1
- OMP: outer membrane protein
- ORF: open reading frame
Fitzgerald JR, Musser JM: Evolutionary genomics of pathogenic bacteria. Trends Microbiol. 2001, 9: 547-553. 10.1016/S0966-842X(01)02228-4.
Alm RA, Ling LS, Moir DT, King BL, Brown ED, Doig PC, Smith DR, Noonan B, Guild BC, deJonge BL, Carmel G, Tummino PJ, Caruso A, Uria-Nickelsen M, Mills DM, Ives C, Gibson R, Merberg D, Mills SD, Jiang Q, Taylor DE, Vovis GF, Trust TJ: Genomic-sequence comparison of two unrelated isolates of the human gastric pathogen Helicobacter pylori. Nature. 1999, 397: 176-180. 10.1038/16495.
Mobley HLT, Mendz GL, Hazell SL: Helicobacter pylori: physiology and genetics. 2001, Amer Society for Microbiology
Yamaoka Y: Helicobacter pylori: molecular genetics and cellular biology. 2008, Caister Academic Pr
Honda S, Fujioka T, Tokieda M, Satoh R, Nishizono A, Nasu M: Development of Helicobacter pylori-induced gastric carcinoma in Mongolian gerbils. Cancer Res. 1998, 58: 4255-4259.
Watanabe T, Tada M, Nagai H, Sasaki S, Nakao M: Helicobacter pylori infection induces gastric cancer in mongolian gerbils. Gastroenterology. 1998, 115: 642-648. 10.1016/S0016-5085(98)70143-X.
Fukase K, Kato M, Kikuchi S, Inoue K, Uemura N, Okamoto S, Terao S, Amagai K, Hayashi S, Asaka M: Effect of eradication of Helicobacter pylori on incidence of metachronous gastric carcinoma after endoscopic resection of early gastric cancer: an open-label, randomised controlled trial. Lancet. 2008, 372: 392-397. 10.1016/S0140-6736(08)61159-9.
Kraft C, Suerbaum S: Mutation and recombination in Helicobacter pylori: mechanisms and role in generating strain diversity. Int J Med Microbiol. 2005, 295: 299-305. 10.1016/j.ijmm.2005.06.002.
Falush D, Wirth T, Linz B, Pritchard JK, Stephens M, Kidd M, Blaser MJ, Graham DY, Vacher S, Perez-Perez GI, Yamaoka Y, Megraud F, Otto K, Reichard U, Katzowitsch E, Wang X, Achtman M, Suerbaum S: Traces of human migrations in Helicobacter pylori populations. Science. 2003, 299: 1582-1585. 10.1126/science.1080857.
Moodley Y, Linz B, Yamaoka Y, Windsor HM, Breurec S, Wu JY, Maady A, Bernhoft S, Thiberge JM, Phuanukoonnon S, Jobb G, Siba P, Graham DY, Marshall BJ, Achtman M: The peopling of the Pacific from a bacterial perspective. Science. 2009, 323: 527-530. 10.1126/science.1166083.
Higashi H, Tsutsumi R, Fujita A, Yamazaki S, Asaka M, Azuma T, Hatakeyama M: Biological activity of the Helicobacter pylori virulence factor CagA is determined by variation in the tyrosine phosphorylation sites. Proc Natl Acad Sci USA. 2002, 99: 14428-14433. 10.1073/pnas.222375399.
Satomi S, Yamakawa A, Matsunaga S, Masaki R, Inagaki T, Okuda T, Suto H, Ito Y, Yamazaki Y, Kuriyama M, Keida Y, Kutsumi H, Azuma T: Relationship between the diversity of the cagA gene of Helicobacter pylori and gastric cancer in Okinawa, Japan. J Gastroenterol. 2006, 41: 668-673. 10.1007/s00535-006-1838-6.
Pride DT, Meinersmann RJ, Blaser MJ: Allelic Variation within Helicobacter pylori babA and babB. Infect Immun. 2001, 69: 1160-1171. 10.1128/IAI.69.2.1160-1171.2001.
Ghose C, Perez-Perez GI, Dominguez-Bello MG, Pride DT, Bravi CM, Blaser MJ: East Asian genotypes of Helicobacter pylori strains in Amerindians provide evidence for its ancient human carriage. Proc Natl Acad Sci USA. 2002, 99: 15107-15111. 10.1073/pnas.242574599.
Salaün L, Saunders NJ: Population-associated differences between the phase variable LPS biosynthetic genes of Helicobacter pylori. BMC Microbiol. 2006, 6: 79. 10.1186/1471-2180-6-79.
Ogura M, Perez JC, Mittl PRE, Lee HK, Dailide G, Tan S, Ito Y, Secka O, Dailidiene D, Putty K: Helicobacter pylori evolution: lineage-specific adaptations in homologs of eukaryotic Sel1-like genes. PLoS Comput Biol. 2007, 3: e151. 10.1371/journal.pcbi.0030151.
Oleastro M, Cordeiro R, Menard A, Yamaoka Y, Queiroz D, Megraud F, Monteiro L: Allelic diversity and phylogeny of homB, a novel co-virulence marker of Helicobacter pylori. BMC Microbiol. 2009, 9: 248. 10.1186/1471-2180-9-248.
H. pylori MLST database. [http://pubmlst.org/helicobacter/]
Linz B, Balloux F, Moodley Y, Manica A, Liu H, Roumagnac P, Falush D, Stamer C, Prugnolle F, van der Merwe SW, Yamaoka Y, Graham DY, Perez-Trallero E, Wadstrom T, Suerbaum S, Achtman M: An African origin for the intimate association between humans and Helicobacter pylori. Nature. 2007, 445: 915-918. 10.1038/nature05562.
Jolley KA, Chan MS, Maiden MC: mlstdbNet - distributed multi-locus sequence typing (MLST) databases. BMC Bioinformatics. 2004, 5: 86. 10.1186/1471-2105-5-86.
Kersulyte D, Kalia A, Gilman RH, Mendez M, Herrera P, Cabrera L, Velapatiño B, Balqui J, Paredes Puente de la Vega F, Rodriguez Ulloa CA, Cok J, Hooper CC, Dailide G, Tamma S, Berg DE: Helicobacter pylori from Peruvian amerindians: traces of human migrations in strains from remote Amazon, and genome sequence of an Amerind strain. PLoS One. 2010, 5: e15076. 10.1371/journal.pone.0015076.
Mane SP, Dominguez-Bello MG, Blaser MJ, Sobral BW, Hontecillas R, Skoneczka J, Mohapatra SK, Crasta OR, Evans C, Modise T, Shallom S, Shukla M, Varon C, Megraud F, Maldonado-Contreras AL, Williams KP, Bassaganya-Riera J: Host-interactive genes in Amerindian Helicobacter pylori diverge from their Old World homologs and mediate inflammatory responses. J Bacteriol. 2010, 192: 3078-3092. 10.1128/JB.00063-10.
Uchiyama I: Multiple genome alignment for identifying the core structure among moderately related microbial genomes. BMC Genomics. 2008, 9: 515. 10.1186/1471-2164-9-515.
Uchiyama I: Hierarchical clustering algorithm for comprehensive orthologous-domain classification in multiple genomes. Nucleic Acids Res. 2006, 34: 647-658. 10.1093/nar/gkj448.
Furuta Y, Kawai M, Yahara K, Takahashi N, Handa N, Tsuru T, Oshima K, Yoshida M, Azuma T, Hattori M, Uchiyama I, Kobayashi I: Birth and death of genes linked to chromosomal inversion. Proc Natl Acad Sci USA. 2011, 108: 1501-1506. 10.1073/pnas.1012579108.
Yamaoka Y, Alm RA: Helicobacter pylori Outer Membrane Proteins. Helicobacter pylori Molecular Genetics and Cellular Biology. Edited by: Yamaoka Y. 2008, Norfolk, UK: Caister Academic Press, 37-60.
Alm RA, Bina J, Andrews BM, Doig P, Hancock RE, Trust TJ: Comparative genomics of Helicobacter pylori: analysis of the outer membrane protein families. Infect Immun. 2000, 68: 4155-4168. 10.1128/IAI.68.7.4155-4168.2000.
Tomb JF, White O, Kerlavage AR, Clayton RA, Sutton GG, Fleischmann RD, Ketchum KA, Klenk HP, Gill S, Dougherty BA, Nelson K, Quackenbush J, Zhou L, Kirkness EF, Peterson S, Loftus B, Richardson D, Dodson R, Khalak HG, Glodek A, McKenney K, Fitzegerald LM, Lee N, Adams MD, Hickey EK, Berg DE, Gocayne JD, Utterback TR, Peterson JD, Kelley JM, et al: The complete genome sequence of the gastric pathogen Helicobacter pylori. Nature. 1997, 388: 539-547. 10.1038/41483.
Schwarz G, Mendel RR, Ribbe MW: Molybdenum cofactors, enzymes and pathways. Nature. 2009, 460: 839-847. 10.1038/nature08302.
Oh JD, Kling-Backhed H, Giannakis M, Xu J, Fulton RS, Fulton LA, Cordum HS, Wang C, Elliott G, Edwards J, Mardis ER, Engstrand LG, Gordon JI: The complete genome sequence of a chronic atrophic gastritis Helicobacter pylori strain: evolution during disease progression. Proc Natl Acad Sci USA. 2006, 103: 9999-10004. 10.1073/pnas.0603784103.
Gressmann H, Linz B, Ghai R, Pleissner KP, Schlapbach R, Yamaoka Y, Kraft C, Suerbaum S, Meyer TF, Achtman M: Gain and loss of multiple genes during the evolution of Helicobacter pylori. PLoS Genet. 2005, 1: e43. 10.1371/journal.pgen.0010043.
Erickson RP: Autosomal recessive diseases among the Athabaskans of the southwestern United States: recent advances and implications for the future. Am J Med Genet A. 2009, 149A: 2602-2611. 10.1002/ajmg.a.33052.
Wolfe AJ: The acetate switch. Microbiol Mol Biol Rev. 2005, 69: 12-50. 10.1128/MMBR.69.1.12-50.2005.
Doig P, de Jonge BL, Alm RA, Brown ED, Uria-Nickelsen M, Noonan B, Mills SD, Tummino P, Carmel G, Guild BC, Moir DT, Vovis GF, Trust TJ: Helicobacter pylori physiology predicted from genomic comparison of two strains. Microbiol Mol Biol Rev. 1999, 63: 675-707.
Kratzer R, Wilson DK, Nidetzky B: Catalytic mechanism and substrate selectivity of aldo-keto reductases: insights from structure-function studies of Candida tenuis xylose reductase. IUBMB Life. 2006, 58: 499-507. 10.1080/15216540600818143.
Eppinger M, Baar C, Linz B, Raddatz G, Lanz C, Keller H, Morelli G, Gressmann H, Achtman M, Schuster SC: Who ate whom? Adaptive Helicobacter genomic changes that accompanied a host jump from early humans to large felines. PLoS Genet. 2006, 2: e120. 10.1371/journal.pgen.0020120.
Dundon WG, Marshall DG, Moráin CA, Smyth CJ: A novel tRNA-associated locus (trl) from Helicobacter pylori is co-transcribed with tRNA(Gly) and reveals genetic diversity. Microbiology. 1999, 145 (Pt 6): 1289-1298.
Bocs S, Danchin A, Medigue C: Re-annotation of genome microbial coding-sequences: finding new genes and inaccurately annotated genes. BMC Bioinformatics. 2002, 3: 5. 10.1186/1471-2105-3-5.
Chase JW, Rabin BA, Murphy JB, Stone KL, Williams KR: Escherichia coli exonuclease VII. Cloning and sequencing of the gene encoding the large subunit (xseA). J Biol Chem. 1986, 261: 14929-14935.
Chase JW, Richardson CC: Escherichia coli mutants deficient in exonuclease VII. J Bacteriol. 1977, 129: 934-947.
Burdett V, Baitinger C, Viswanathan M, Lovett ST, Modrich P: In vivo requirement for RecJ, ExoVII, ExoI, and ExoX in methyl-directed mismatch repair. Proc Natl Acad Sci USA. 2001, 98: 6765-6770. 10.1073/pnas.121183298.
Fassbinder F, van Vliet AH, Gimmel V, Kusters JG, Kist M, Bereswill S: Identification of iron-regulated genes of Helicobacter pylori by a modified fur titration assay (FURTA-Hp). FEMS Microbiol Lett. 2000, 184: 225-229. 10.1111/j.1574-6968.2000.tb09018.x.
Stoof J, Belzer C, van Vliet A: Metal Metabolism and Transport in Helicobacter pylori. Helicobacter pylori: molecular genetics and cellular biology. 2008, 165-177.
Peck B, Ortkamp M, Diehl KD, Hundt E, Knapp B: Conservation, localization and expression of HopZ, a protein involved in adhesion of Helicobacter pylori. Nucleic Acids Res. 1999, 27: 3325-3333. 10.1093/nar/27.16.3325.
Cao P, Lee KJ, Blaser MJ, Cover TL: Analysis of hopQ alleles in East Asian and Western strains of Helicobacter pylori. FEMS Microbiol Lett. 2005, 251: 37-43. 10.1016/j.femsle.2005.07.023.
Chalk PA, Roberts AD, Blows WM: Metabolism of pyruvate and glucose by intact cells of Helicobacter pylori studied by 13C NMR spectroscopy. Microbiology. 1994, 140 (Pt 8): 2085-2092.
Fujitani Y, Yamamoto K, Kobayashi I: Dependence of frequency of homologous recombination on the homology length. Genetics. 1995, 140: 797-809.
Kersulyte D, Lee W, Subramaniam D, Anant S, Herrera P, Cabrera L, Balqui J, Barabas O, Kalia A, Gilman RH, Berg DE: Helicobacter pylori's plasticity zones are novel transposable elements. PLoS One. 2009, 4: e6859. 10.1371/journal.pone.0006859.
Fischer W, Windhager L, Rohrer S, Zeiller M, Karnholz A, Hoffmann R, Zimmer R, Haas R: Strain-specific genes of Helicobacter pylori: genome evolution driven by a novel type IV secretion system and genomic island transfer. Nucleic Acids Res. 2010, 38: 6089-6101. 10.1093/nar/gkq378.
Ilyina TV, Gorbalenya AE, Koonin EV: Organization and evolution of bacterial and bacteriophage primase-helicase systems. J Mol Evol. 1992, 34: 351-357. 10.1007/BF00160243.
Thiberge JM, Boursaux-Eude C, Lehours P, Dillies MA, Creno S, Coppee JY, Rouy Z, Lajus A, Ma L, Burucoa C, Ruskone-Foumestraux A, Courillon-Mallet A, De Reuse H, Boneca IG, Lamarque D, Megraud F, Delchier JC, Medigue C, Bouchier C, Labigne A, Raymond J: From array-based hybridization of Helicobacter pylori isolates to the complete genome sequence of an isolate associated with MALT lymphoma. BMC Genomics. 2010, 11: 368. 10.1186/1471-2164-11-368.
Hofler C, Fischer W, Hofreuter D, Haas R: Cryptic plasmids in Helicobacter pylori: putative functions in conjugative transfer and microcin production. Int J Med Microbiol. 2004, 294: 141-148. 10.1016/j.ijmm.2004.06.021.
Hosaka Y, Okamoto R, Irinoda K, Kaieda S, Koizumi W, Saigenji K, Inoue M: Characterization of pKU701, a 2.5-kb plasmid, in a Japanese Helicobacter pylori isolate. Plasmid. 2002, 47: 193-200. 10.1016/S0147-619X(02)00003-3.
Song JY, Choi SH, Byun EY, Lee SG, Park YH, Park SG, Lee SK, Kim KM, Park JU, Kang HL, Baik SC, Lee WK, Cho MJ, Youn HS, Ko GH, Bae DW, Rhee KH: Characterization of a small cryptic plasmid, pHP51, from a Korean isolate of strain 51 of Helicobacter pylori. Plasmid. 2003, 50: 145-151. 10.1016/S0147-619X(03)00059-3.
Hofreuter D, Haas R: Characterization of two cryptic Helicobacter pylori plasmids: a putative source for horizontal gene transfer and gene shuffling. J Bacteriol. 2002, 184: 2755-2766. 10.1128/JB.184.10.2755-2766.2002.
Baltrus DA, Amieva MR, Covacci A, Lowe TM, Merrell DS, Ottemann KM, Stein M, Salama NR, Guillemin K: The complete genome sequence of Helicobacter pylori strain G27. J Bacteriol. 2009, 191: 447-448. 10.1128/JB.01416-08.
Farnbacher M, Jahns T, Willrodt D, Daniel R, Haas R, Goesmann A, Kurtz S, Rieder G: Sequencing, annotation and comparative genome analysis of the gerbil-adapted Helicobacter pylori strain B8. BMC Genomics. 2010, 11: 335. 10.1186/1471-2164-11-335.
Nesic D, Miller MC, Quinkert ZT, Stein M, Chait BT, Stebbins CE: Helicobacter pylori CagA inhibits PAR1-MARK family kinases by mimicking host substrates. Nat Struct Mol Biol. 2010, 17: 130-132. 10.1038/nsmb.1705.
Zhang J, Nielsen R, Yang Z: Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol. 2005, 22: 2472-2479. 10.1093/molbev/msi237.
Gangwer KA, Mushrush DJ, Stauff DL, Spiller B, McClain MS, Cover TL, Lacy DB: Crystal structure of the Helicobacter pylori vacuolating toxin p55 domain. Proc Natl Acad Sci USA. 2007, 104: 16293-16298. 10.1073/pnas.0707447104.
Wang HJ, Wang WC: Expression and binding analysis of GST-VacA fusions reveals that the C-terminal approximately 100-residue segment of exotoxin is crucial for binding in HeLa cells. Biochem Biophys Res Commun. 2000, 278: 449-454. 10.1006/bbrc.2000.3820.
Seif E, Hallberg BM: RNA-protein mutually induced fit: structure of Escherichia coli isopentenyl-tRNA transferase in complex with tRNA(Phe). J Biol Chem. 2009, 284: 6600-6604.
Kaminska KH, Baraniak U, Boniecki M, Nowaczyk K, Czerwoniec A, Bujnicki JM: Structural bioinformatics analysis of enzymes involved in the biosynthesis pathway of the hypermodified nucleoside ms(2)io(6)A37 in tRNA. Proteins. 2008, 70: 1-18.
Cover TL, Blaser MJ: Purification and characterization of the vacuolating toxin from Helicobacter pylori. J Biol Chem. 1992, 267: 10570-10575.
Jang JY, Yoon HJ, Yoon JY, Kim HS, Lee SJ, Kim KH, Kim dJ, Jang S, Han BG, Lee BI, Suh SW: Crystal structure of the TNF-alpha-Inducing protein (Tipalpha) from Helicobacter pylori: Insights into Its DNA-binding activity. J Mol Biol. 2009, 392: 191-197. 10.1016/j.jmb.2009.07.010.
Chung C, Olivares A, Torres E, Yilmaz O, Cohen H, Perez-Perez G: Diversity of VacA intermediate region among Helicobacter pylori strains from several regions of the world. J Clin Microbiol. 2010, 48: 690-696. 10.1128/JCM.01815-09.
Testerman T, McGee D, Mobley H: Adherence and colonization. Helicobacter pylori: physiology and genetics. 2001, 381-417.
Carlsohn E, Nystrom J, Bolin I, Nilsson CL, Svennerholm AM: HpaA is essential for Helicobacter pylori colonization in mice. Infect Immun. 2006, 74: 920-926. 10.1128/IAI.74.2.920-926.2006.
Yamaoka Y, Kwon DH, Graham DY: A M(r) 34,000 proinflammatory outer membrane protein (oipA) of Helicobacter pylori. Proc Natl Acad Sci USA. 2000, 97: 7533-7538. 10.1073/pnas.130079797.
Aspholm-Hurtig M, Dailide G, Lahmann M, Kalia A, Ilver D, Roche N, Vikstrom S, Sjostrom R, Linden S, Backstrom A, Lundberg C, Arnqvist A, Mahdavi J, Nilsson UJ, Velapatino B, Gilman RH, Gerhard M, Alarcon T, Lopez-Brea M, Nakazawa T, Fox JG, Correa P, Dominguez-Bello MG, Perez-Perez GI, Blaser MJ, Normark S, Carlstedt I, Oscarson S, Teneberg S, Berg DE, et al: Functional adaptation of BabA, the H. pylori ABO blood group antigen binding adhesin. Science. 2004, 305: 519-522. 10.1126/science.1098801.
Ilver D, Arnqvist A, Ogren J, Frick IM, Kersulyte D, Incecik ET, Berg DE, Covacci A, Engstrand L, Boren T: Helicobacter pylori adhesin binding fucosylated histo-blood group antigens revealed by retagging. Science. 1998, 279: 373-377. 10.1126/science.279.5349.373.
Odenbreit S, Till M, Hofreuter D, Faller G, Haas R: Genetic and functional characterization of the alpAB gene locus essential for the adhesion of Helicobacter pylori to human gastric tissue. Mol Microbiol. 1999, 31: 1537-1548. 10.1046/j.1365-2958.1999.01300.x.
Lu H, Wu JY, Beswick EJ, Ohno T, Odenbreit S, Haas R, Reyes VE, Kita M, Graham DY, Yamaoka Y: Functional and intracellular signaling differences associated with the Helicobacter pylori AlpAB adhesin from Western and East Asian strains. J Biol Chem. 2007, 282: 6242-6254.
Moran AP, Trent MS: Helicobacter pylori Lipopolysaccharides and Lewis Antigens. Helicobacter pylori: molecular genetics and cellular biology. 2008, Caister Academic Pr, 7-
Rasko DA, Wang G, Palcic MM, Taylor DE: Cloning and characterization of the alpha(1,3/4) fucosyltransferase of Helicobacter pylori. J Biol Chem. 2000, 275: 4988-4994. 10.1074/jbc.275.7.4988.
Bergman M, Del Prete G, van Kooyk Y, Appelmelk B: Helicobacter pylori phase variation, immune modulation and gastric autoimmunity. Nat Rev Microbiol. 2006, 4: 151-159. 10.1038/nrmicro1344.
Nilsson C, Skoglund A, Moran AP, Annuk H, Engstrand L, Normark S: Lipopolysaccharide diversity evolving in Helicobacter pylori communities through genetic modifications in fucosyltransferases. PLoS One. 2008, 3: e3811. 10.1371/journal.pone.0003811.
Skoglund A, Bäckhed HK, Nilsson C, Björkholm B, Normark S, Engstrand L: A changing gastric environment leads to adaptation of lipopolysaccharide variants in Helicobacter pylori populations during colonization. PLoS One. 2009, 4: e5885. 10.1371/journal.pone.0005885.
Driessen AJ, Nouwen N: Protein translocation across the bacterial cytoplasmic membrane. Annu Rev Biochem. 2008, 77: 643-667. 10.1146/annurev.biochem.77.061606.160747.
Kato Y, Nishiyama K, Tokuda H: Depletion of SecDF-YajC causes a decrease in the level of SecG: implication for their functional interaction. FEBS Lett. 2003, 550: 114-118. 10.1016/S0014-5793(03)00847-0.
Smeets LC, Bijlsma JJ, Boomkens SY, Vandenbroucke-Grauls CM, Kusters JG: comH, a novel gene essential for natural transformation of Helicobacter pylori. J Bacteriol. 2000, 182: 3948-3954. 10.1128/JB.182.14.3948-3954.2000.
Fath MJ, Mahanty HK, Kolter R: Characterization of a purF operon mutation which affects colicin V production. J Bacteriol. 1989, 171: 3158-3161.
Rust M, Schweinitzer T, Josenhans C: Helicobacter Flagella, Motility and Chemotaxis. Helicobacter pylori: molecular genetics and cellular biology. 2008, 61-
Ryan KA, Karim N, Worku M, Penn CW, O'Toole PW: Helicobacter pylori flagellar hook-filament transition is controlled by a FliK functional homolog encoded by the gene HP0906. J Bacteriol. 2005, 187: 5742-5750. 10.1128/JB.187.16.5742-5750.2005.
Logan SM: Flagellar glycosylation - a new component of the motility repertoire?. Microbiology. 2006, 152: 1249-1262. 10.1099/mic.0.28735-0.
Kelly DJ, Hughes NJ, Poole RK: Microaerobic physiology: aerobic respiration, anaerobic respiration, and carbon dioxide metabolism. Helicobacter pylori: physiology and genetics. 2001, 113-124.
Kohanski MA, Dwyer DJ, Collins JJ: How antibiotics kill bacteria: from targets to networks. Nat Rev Microbiol. 2010, 8: 423-435. 10.1038/nrmicro2333.
Ingelman M, Ramaswamy S, Nivière V, Fontecave M, Eklund H: Crystal structure of NAD(P)H:flavin oxidoreductase from Escherichia coli. Biochemistry. 1999, 38: 7040-7049. 10.1021/bi982849m.
Kwon DH, El-Zaatari FA, Kato M, Osato MS, Reddy R, Yamaoka Y, Graham DY: Analysis of rdxA and involvement of additional genes encoding NAD(P)H flavin oxidoreductase (FrxA) and ferredoxin-like protein (FdxB) in metronidazole resistance of Helicobacter pylori. Antimicrob Agents Chemother. 2000, 44: 2133-2142. 10.1128/AAC.44.8.2133-2142.2000.
Watanabe S, Matsumi R, Arai T, Atomi H, Imanaka T, Miki K: Crystal structures of [NiFe] hydrogenase maturation proteins HypC, HypD, and HypE: insights into cyanation reaction by thiol redox signaling. Mol Cell. 2007, 27: 29-40. 10.1016/j.molcel.2007.05.039.
Benoit S, Mehta N, Wang G, Gatlin M, Maier RJ: Requirement of hydD, hydE, hypC and hypE genes for hydrogenase activity in Helicobacter pylori. Microb Pathog. 2004, 36: 153-157. 10.1016/j.micpath.2003.11.001.
Hazell S, Harris A, Trend M: Evasion of the toxic effects of oxygen. Helicobacter pylori: physiology and genetics. Edited by: Mobley H, Mendz G, Hazell S. 2001, ASM Press, 167-175.
Giró M, Carrillo N, Krapp AR: Glucose-6-phosphate dehydrogenase and ferredoxin-NADP(H) reductase contribute to damage repair during the soxRS response of Escherichia coli. Microbiology. 2006, 152: 1119-1128. 10.1099/mic.0.28612-0.
Urbonavicius J, Qian Q, Durand JM, Hagervall TG, Bjork GR: Improvement of reading frame maintenance is a common function for several tRNA modifications. Embo J. 2001, 20: 4863-4873. 10.1093/emboj/20.17.4863.
Nakanishi K, Bonnefond L, Kimura S, Suzuki T, Ishitani R, Nureki O: Structural basis for translational fidelity ensured by transfer RNA lysidine synthetase. Nature. 2009, 461: 1144-1148. 10.1038/nature08474.
Suzuki T, Miyauchi K: Discovery and characterization of tRNAIle lysidine synthetase (TilS). FEBS Lett. 2010, 584: 272-277. 10.1016/j.febslet.2009.11.085.
Cai J, Han C, Hu T, Zhang J, Wu D, Wang F, Liu Y, Ding J, Chen K, Yue J, Shen X, Jiang H: Peptide deformylase is a potential target for anti-Helicobacter pylori drugs: reverse docking, enzymatic assay, and X-ray crystallography validation. Protein Sci. 2006, 15: 2071-2081. 10.1110/ps.062238406.
Demirci H, Gregory ST, Dahlberg AE, Jogl G: Recognition of ribosomal protein L11 by the protein trimethyltransferase PrmA. Embo J. 2007, 26: 567-577. 10.1038/sj.emboj.7601508.
Amundsen SK, Fero J, Hansen LM, Cromie GA, Solnick JV, Smith GR, Salama NR: Helicobacter pylori AddAB helicase-nuclease and RecA promote recombination-related DNA repair and survival during stomach colonization. Mol Microbiol. 2008, 69: 994-1007. 10.1111/j.1365-2958.2008.06336.x.
Sourice S, Biaudet V, El Karoui M, Ehrlich SD, Gruss A: Identification of the Chi site of Haemophilus influenzae as several sequences related to the Escherichia coli Chi site. Mol Microbiol. 1998, 27: 1021-1029. 10.1046/j.1365-2958.1998.00749.x.
Handa N, Ohashi S, Kusano K, Kobayashi I: Chi-star, a chi-related 11-mer sequence partially active in an E. coli recC1004 strain. Genes Cells. 1997, 2: 525-536. 10.1046/j.1365-2443.1997.1410339.x.
Tadokoro T, Kanaya S: Ribonuclease H: molecular diversities, substrate binding domains, and catalytic mechanism of the prokaryotic enzymes. FEBS J. 2009, 276: 1482-1493. 10.1111/j.1742-4658.2009.06907.x.
Kogoma T: Stable DNA replication: interplay between DNA replication, homologous recombination, and transcription. Microbiol Mol Biol Rev. 1997, 61: 212-238.
Adams DW, Errington J: Bacterial cell division: assembly, maintenance and disassembly of the Z ring. Nat Rev Microbiol. 2009, 7: 642-653. 10.1038/nrmicro2198.
Lock RL, Harry EJ: Cell-division inhibitors: new insights for future antibiotics. Nat Rev Drug Discov. 2008, 7: 324-338. 10.1038/nrd2510.
Moran AP: Relevance of fucosylation and Lewis antigen expression in the bacterial gastroduodenal pathogen Helicobacter pylori. Carbohydr Res. 2008, 343: 1952-1965. 10.1016/j.carres.2007.12.012.
Broadberry RE, Lin-Chu M: The Lewis blood group system among Chinese in Taiwan. Hum Hered. 1991, 41: 290-294. 10.1159/000154015.
Anstee DJ: The relationship between blood groups and disease. Blood. 2010, 115: 4635-4643. 10.1182/blood-2010-01-261859.
Rajagopalan KV: Molybdenum: an essential trace element in human nutrition. Annu Rev Nutr. 1988, 8: 401-427. 10.1146/annurev.nu.08.070188.002153.
Ezraty B, Bos J, Barras F, Aussel L: Methionine sulfoxide reduction and assimilation in Escherichia coli: new role for the biotin sulfoxide reductase BisC. J Bacteriol. 2005, 187: 231-237. 10.1128/JB.187.1.231-237.2005.
Alamuri P, Maier RJ: Methionine sulphoxide reductase is an important antioxidant enzyme in the gastric pathogen Helicobacter pylori. Mol Microbiol. 2004, 53: 1397-1406. 10.1111/j.1365-2958.2004.04190.x.
Wang G, Alamuri P, Maier RJ: The diverse antioxidant systems of Helicobacter pylori. Mol Microbiol. 2006, 61: 847-860. 10.1111/j.1365-2958.2006.05302.x.
Alamuri P, Maier RJ: Methionine sulfoxide reductase in Helicobacter pylori: interaction with methionine-rich proteins and stress-induced expression. J Bacteriol. 2006, 188: 5839-5850. 10.1128/JB.00430-06.
Sachs G, Weeks D, Melchers K, Scott D: The gastric biology of Helicobacter pylori. Helicobacter pylori: molecular genetics and cellular biology. 2008, 137-
McColl KE: Helicobacter pylori and acid secretion: where are we now?. Eur J Gastroenterol Hepatol. 1997, 9: 333-335.
El-Mansi M, Cozzone AJ, Shiloach J, Eikmanns BJ: Control of carbon flux through enzymes of central and intermediary metabolism during growth of Escherichia coli on acetate. Curr Opin Microbiol. 2006, 9: 173-179. 10.1016/j.mib.2006.02.002.
Moura GR, Carreto LC, Santos MA: Genetic code ambiguity: an unexpected source of proteome innovation and phenotypic diversity. Curr Opin Microbiol. 2009, 12: 631-637. 10.1016/j.mib.2009.09.004.
Denamur E, Lecointre G, Darlu P, Tenaillon O, Acquaviva C, Sayada C, Sunjevaric I, Rothstein R, Elion J, Taddei F, Radman M, Matic I: Evolutionary implications of the frequent horizontal transfer of mismatch repair genes. Cell. 2000, 103: 711-721. 10.1016/S0092-8674(00)00175-6.
Jenks PJ, Edwards DI: Metronidazole resistance in Helicobacter pylori. Int J Antimicrob Agents. 2002, 19: 1-7. 10.1016/S0924-8579(01)00468-X.
Ito Y, Azuma T, Ito S, Suto H, Miyaji H, Yamazaki Y, Kohli Y, Kuriyama M: Full-length sequence analysis of the vacA gene from cytotoxic and noncytotoxic Helicobacter pylori. J Infect Dis. 1998, 178: 1391-1398. 10.1086/314435.
Azuma T, Yamakawa A, Yamazaki S, Ohtani M, Ito Y, Muramatsu A, Suto H, Yamazaki Y, Keida Y, Higashi H, Hatakeyama M: Distinct diversity of the cag pathogenicity island among Helicobacter pylori strains in Japan. J Clin Microbiol. 2004, 42: 2508-2517. 10.1128/JCM.42.6.2508-2517.2004.
National Center for Biotechnology Information. [http://www.ncbi.nlm.nih.gov]
Besemer J, Lomsadze A, Borodovsky M: GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res. 2001, 29: 2607-2618. 10.1093/nar/29.12.2607.
Delcher AL, Bratke KA, Powers EC, Salzberg SL: Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics. 2007, 23: 673-679. 10.1093/bioinformatics/btm009.
Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997, 25: 3389-3402. 10.1093/nar/25.17.3389.
Lowe TM, Eddy SR: tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997, 25: 955-964. 10.1093/nar/25.5.955.
Katoh K, Asimenos G, Toh H: Multiple alignment of DNA sequences with MAFFT. Methods Mol Biol. 2009, 537: 39-64. 10.1007/978-1-59745-251-9_3.
Castresana J: Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol. 2000, 17: 540-552.
Bryant D, Moulton V: Neighbor-net: an agglomerative method for the construction of phylogenetic networks. Mol Biol Evol. 2004, 21: 255-265.
Huson DH, Bryant D: Application of phylogenetic networks in evolutionary studies. Mol Biol Evol. 2006, 23: 254-267.
Marchler-Bauer A, Panchenko AR, Shoemaker BA, Thiessen PA, Geer LY, Bryant SH: CDD: a database of conserved domain alignments with links to domain three-dimensional structure. Nucleic Acids Res. 2002, 30: 281-283. 10.1093/nar/30.1.281.
Cvetkovic A, Menon AL, Thorgersen MP, Scott JW, Poole FL, Jenney FE, Lancaster WA, Praissman JL, Shanmukh S, Vaccaro BJ, Trauger SA, Kalisiak E, Apon JV, Siuzdak G, Yannone SM, Tainer JA, Adams MW: Microbial metalloproteomes are largely uncharacterized. Nature. 2010, 466: 779-782. 10.1038/nature09265.
Vernikos GS, Parkhill J: Interpolated variable order motifs for identification of horizontally acquired DNA: revisiting the Salmonella pathogenicity islands. Bioinformatics. 2006, 22: 2196-2203. 10.1093/bioinformatics/btl369.
Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22: 4673-4680. 10.1093/nar/22.22.4673.
Loytynoja A, Goldman N: An algorithm for progressive multiple alignment of sequences with insertions. Proc Natl Acad Sci USA. 2005, 102: 10557-10562. 10.1073/pnas.0409137102.
R Development Core Team: R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2010, [http://www.R-project.org/]
Ren S, Higashi H, Lu H, Azuma T, Hatakeyama M: Structural basis and functional consequence of Helicobacter pylori CagA multimerization in cells. J Biol Chem. 2006, 281: 32344-32352. 10.1074/jbc.M606172200.
Devi SH, Taylor TD, Avasthi TS, Kondo S, Suzuki Y, Megraud F, Ahmed N: Genome of Helicobacter pylori strain 908. J Bacteriol. 2010, 192: 6488-6489. 10.1128/JB.01110-10.
Xia Y, Yamaoka Y, Zhu Q, Matha I, Gao X: A comprehensive sequence and disease correlation analyses for the C-terminal region of CagA protein of Helicobacter pylori. PLoS One. 2009, 4: e7736. 10.1371/journal.pone.0007736.
van Doorn LJ, Figueiredo C, Rossau R, Jannes G, van Asbroek M, Sousa JC, Carneiro F, Quint WG: Typing of Helicobacter pylori vacA gene and detection of cagA gene by PCR and reverse hybridization. J Clin Microbiol. 1998, 36: 1271-1276.
Rhead JL, Letley DP, Mohammadi M, Hussein N, Mohagheghi MA, Eshagh Hosseini M, Atherton JC: A new Helicobacter pylori vacuolating cytotoxin determinant, the intermediate region, is associated with gastric cancer. Gastroenterology. 2007, 133: 926-936. 10.1053/j.gastro.2007.06.056.
McClain MS, Shaffer CL, Israel DA, Peek RM, Cover TL: Genome sequence analysis of Helicobacter pylori strains associated with gastric ulceration and gastric cancer. BMC Genomics. 2009, 10: 3. 10.1186/1471-2164-10-3.
Xie W, Zhou C, Huang RH: Structure of tRNA dimethylallyltransferase: RNA modification through a channel. J Mol Biol. 2007, 367: 872-881. 10.1016/j.jmb.2007.01.048.
Blokesch M, Albracht SP, Matzanke BF, Drapal NM, Jacobi A, Bock A: The complex between hydrogenase-maturation proteins HypC and HypD is an intermediate in the supply of cyanide to the active site iron of [NiFe]-hydrogenases. J Mol Biol. 2004, 344: 155-167. 10.1016/j.jmb.2004.09.040.
Blokesch M, Bock A: Properties of the [NiFe]-hydrogenase maturation protein HypD. FEBS Lett. 2006, 580: 4065-4068. 10.1016/j.febslet.2006.06.045.
YF, TT, NH, NT and IK are grateful to Hitomi Mimuro and Chihiro Sasakawa for introduction to H. pylori experiments. This work was supported by the Institute for Bioinformatics Research and Development, the Japan Science and Technology Agency. I.U. was supported by a Grant-in-Aid for Scientific Research (20310125) from the Japan Society for the Promotion of Science. N.H. was supported by grants from the Ministry of Education, Culture, Sports, Science and Technology-Japan (MEXT), the Takeda Foundation, the Sumitomo Foundation, the Kato Memorial Bioscience Foundation and the Naito Foundation. I.K. was supported by the global COE (Center of Excellence) project of "Genome Information Big Bang" from MEXT, by the Suzuken Memorial Foundation, and by the Urakami Food and Food Culture Foundation. M.H. was supported by Grants-in-Aid for Scientific Research on Priority Areas "Comprehensive Genomics" from MEXT.
The authors declare that they have no competing interests.
MK and YF contributed to informatics analysis and wrote the manuscript. YF carried out experimental verification of sequences of molybdenum-related genes and acetate pathway related genes. KY, TT, and IU contributed to informatics analysis. NH and NT contributed to genome DNA preparation. KO and MH contributed to sequencing and assembly. MY and TA provided the strains. I.K. contributed to design, analysis and writing. All the authors discussed the results and commented on the manuscript. All the authors read and approved the final manuscript.