Variability and conservation in hepatitis B virus core protein
© Chain and Myers; licensee BioMed Central Ltd. 2005
Received: 02 December 2004
Accepted: 27 May 2005
Published: 27 May 2005
Hepatitis B core protein (HBVc) has been extensively studied from both a structural and immunological point of view, but the evolutionary forces driving sequence variation within core are incompletely understood.
In this study, variation in HBVc protein sequence was examined in a large collection of HBVc protein sequences from public sequence repositories. An alignment of several hundred sequences was carried out and used to analyse the distribution of polymorphisms along HBVc. Polymorphisms were found at 44 out of 185 amino acid positions analysed and were clustered predominantly in those parts of HBVc forming the outer surface and spike on the intact capsid. The relationship between HBVc diversity and HBV genotype was examined. The position of variable amino acids along the sequence was examined in terms of the structural constraints of capsid and envelope assembly, and also in terms of immunological recognition by T and B cells.
Over three quarters of amino acids within the HBVc sequence are non-polymorphic, and variation is concentrated in a few amino acids. Phylogenetic analysis suggests that core protein-specific forces constrain its diversity within the context of overall HBV genome evolution. As a consequence, core protein sequence is not a reliable predictor of virus genotype. The structural requirements of capsid assembly are likely to play a major role in limiting diversity. The phylogenetic analysis further suggests that immunological selection does not play a major role in driving HBVc diversity.
The evolutionary pressures that have driven Hepatitis B virus (HBV) variation remain incompletely understood. Using whole HBV genome sequencing, this variability can usefully be classified into at least eight families (genotypes) with a characteristic geographic distribution (reviewed in ). Alternatively, HBV strains can be classified serologically on the basis of antibody to surface antigen (subtypes). These two classifications broadly correlate, although some subtypes appear in more than one genotype. The extent of genetic diversity reflects the evolutionary history of the virus and the rate of genomic mutation, as well as gene-specific selection forces. Several models of HBV evolution have been proposed (reviewed in [2, 3]) but fundamental parameters, such as the rate of interspecies transmission or the rate of nucleotide mutation (the molecular clock), remain unresolved. Nevertheless, it is generally assumed that the emergence of HBV families may reflect adaptation to the genotype of the prevalent human host population.
The clinical course of HBV infection is very variable. Acute infections in adults are usually effectively controlled, but occasionally lead to fulminant hepatitis and death. In a proportion of individuals, however, infection leads to chronic viral replication, which can cause severe liver damage or hepatocellular carcinoma. Host factors including immune status clearly play a major role in determining clinical outcome. For example, perinatal transmission leads to up to 90% chronic carriership, while the figure is less than 10% for adults. However, pathogenicity has also been linked to virus genotype, and several different mechanisms have been proposed for this observation [5, 6]. Sequence changes occurring during the course of infection (longitudinal diversity) have also been extensively documented. One common example is the introduction of a stop codon in the precore region, which results in downregulation of secretion of a soluble form of HBV core protein (HBVe) whose function remains unclear [7, 8]. Interestingly, the downregulation of HBVe secretion is often associated with the appearance of anti-HBVe antibodies in serum, suggesting the protein itself may induce some form of immunological tolerance [7, 9].
The role of adaptive immunity both in determining the course of HBV infection and in driving HBV evolution is of special interest. Although pre-existing antibody to HBV surface protein (HBVs) (for example in vaccinated individuals) clearly provides strong protection, antibody to this antigen in natural infection is a late event, usually subsequent to effective control of viremia. In contrast, antibody to HBV core (HBVc), although this protein is internal to the virion, occurs early in infection in almost all infected individuals, irrespective of their ability to control viral replication. T helper and cytotoxic responses to several proteins of HBV have also been detected, and the presence of a higher frequency of HBV-specific CTL in liver is associated with the ability to control viremia. As might be predicted, CTL responses are not limited to structural proteins, but recognise several non-structural viral proteins. Virus-specific T cell immune responses are most readily detected in individuals who effectively control viral replication. In chronically infected individuals, these responses are often much more difficult to detect, suggesting that the chronic state is associated with the establishment of some form of immunological tolerance.
HBVc antigen is a small protein, whose three-dimensional structure has been determined by X-ray crystallography, and whose immunogenicity in terms of both antibody and T cell responses has been studied extensively in both mouse and man. It thus represents an ideal starting point for studies aimed at relating antigen structure to immune response. Indeed, longitudinal studies on small groups of HBV-infected individuals have suggested that variation is more common in B cell and T helper cell epitopes, suggesting a possible immune-driven escape mechanism. As a basis for further functional exploration we have documented HBVc variation in detail. In this study we have collected several hundred protein sequences of HBVc from public databases, and have re-examined variation in relation to structure, immunogenicity and genotype.
Results and discussion
Accession numbers and genotype for HBV sequences analysed in fig 4
Taken together, this data suggested the forces driving the evolution of core were partially independent of the evolutionary forces driving diversification of overall genotype.
Finally, considerable information has been accumulated on the interaction of antibodies with HBV capsids. As shown in fig 9b, the major defined antibody specificities lie on the outside of the capsid structure, particularly at the tip of the spikes and at the junctions between adjacent spikes. These regions do indeed contain the majority of the HBVc sequence variation, although the most variable amino acids themselves have not been identified as known antibody contacts. However, the contribution of anti-HBVc antibody to protection remains unclear, particularly since the capsid in intact virions is presumably largely shielded from antibody by the HBV envelope.
This study makes use of the large number of HBVc sequences now available in public databases to characterise sequence variation in HBVc. One limitation of such an approach is that detailed clinical information associated with infection is not available, and in particular, it is not possible to examine variability in the context of the longitudinal course of an HBV infection. This is likely to be an important factor since mutations are often found to arise late in infection, associated with a variety of clinical outcomes (e.g. [34, 35]). In addition, few of the available sequences have been checked for their ability to make competent infectious virus, and some sequences may therefore represent non-functional proteins. However, despite these limitations, the data available does allow some general conclusions.
Overall, HBVc contains a large proportion of invariant amino acids, and a strong over-representation of synonymous versus non-synonymous mutations at almost every codon. Both features suggest the presence of strong constraining forces on sequence diversity. Virion assembly is likely to provide one major constraining force. As reported previously (e.g. ), sequence diversity appears to be clustered, and mapped predominantly to the spike and external surface of the capsid. These positions may allow greater flexibility in terms of virion assembly.
One significant consequence of the strong purifying selection is that protein sequence is a poor predictor of genotype for this gene. DNA sequence, which reflects predominantly synonymous mutations, is a better discriminator, particularly in resolving genotypes B and C. Longer sequence analysis, however, is clearly necessary to obtain reliable genotyping information.
The putative role of immune selection in driving HBV core diversity is much less clear. The analysis presented here identifies only a single amino acid (position 74 at the tip of the viral spike) as showing direct evidence for positive selection of diversity. Nevertheless, it is clear that several polymorphic positions lie within T or B cell epitopes. Hence, while the overall effect of immune selection on HBV sequence diversity may be small, sequence diversity may have a significant effect on the immune response at an individual level. The data analysis given here will help inform further analysis of the HBV-specific immune response. The combination of T cell and antibody recognition studies with directed mutagenesis of HBVc should determine more precisely the relationship between structure, immunity and pathogenicity.
Human HBVc sequences were retrieved from the NCBI protein sequence database, limiting the search to organism = Hepatitis B virus and searching for core protein. Additional classification into genotypes A-D was done using the text search tool to look for "genotype X". A proportion of hits were verified by manual inspection.
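The text-based genotype labelling described above can be illustrated with a short sketch. This is not the search tool used in the study; the regular expression, the function names and the (accession, description) record layout are assumptions made purely for illustration. Records naming no genotype, or more than one, are left unclassified so that they can be inspected manually.

```python
import re

# Assumed pattern: the word "genotype" followed by a single letter A-D.
GENOTYPE_RE = re.compile(r"\bgenotype[\s:=]*([A-D])\b", re.IGNORECASE)

def genotype_from_text(description):
    """Return 'A'..'D' if the text names exactly one genotype, else None."""
    found = {m.group(1).upper() for m in GENOTYPE_RE.finditer(description)}
    return found.pop() if len(found) == 1 else None

def group_by_genotype(records):
    """Group accessions by genotype; records are (accession, description) pairs."""
    groups = {}
    for accession, description in records:
        groups.setdefault(genotype_from_text(description), []).append(accession)
    return groups

# Hypothetical records; the accessions are placeholders, not real entries.
example = [
    ("SEQ1", "core protein [Hepatitis B virus], genotype C"),
    ("SEQ2", "core protein [Hepatitis B virus]"),
    ("SEQ3", "recombinant of genotype B and genotype C"),
]
print(group_by_genotype(example))
```

Ambiguous records such as the recombinant above fall into the `None` group, which mirrors the manual verification step mentioned in the text.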
The initial 780 hits were aligned using the EMMA program on EMBOSS (a version of Clustal) (see ) using the BLOSUM62 similarity matrix. The alignment was further refined by manual inspection using the sequence editor Bioedit, and very short or badly aligned sequences were removed. The frequency of amino acids at each position was determined using the EMBOSS program PROPHECY. The resulting frequency matrix was converted into polymorphism frequency by setting a cut-off of 1% frequency at each position.
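The frequency cut-off step can be sketched as follows. This is a minimal illustration rather than the PROPHECY workflow itself, and it rests on one assumption about how the cut-off was applied: a position is counted as polymorphic when at least one residue other than the most common one reaches a frequency of 1% among the non-gap residues in that column. The function names are hypothetical.

```python
from collections import Counter

def column_frequencies(alignment, gap="-"):
    """Per-column residue frequencies, ignoring gap characters."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    freqs = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment if seq[i] != gap)
        total = sum(counts.values())
        freqs.append({aa: n / total for aa, n in counts.items()} if total else {})
    return freqs

def polymorphic_positions(alignment, cutoff=0.01):
    """1-based positions where a non-consensus residue reaches the cut-off."""
    positions = []
    for i, col in enumerate(column_frequencies(alignment), start=1):
        # Drop the most frequent residue; what remains are the minor variants.
        minor = sorted(col.values(), reverse=True)[1:]
        if any(f >= cutoff for f in minor):
            positions.append(i)
    return positions
```

With this definition, a variant seen in 2 of 100 sequences marks its position as polymorphic, while a variant seen in 1 of 200 does not.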
A phylogenetic tree of the HBVc sequences was created using the Phylip program Kitsch, which uses a Fitch-Margoliash criterion based on a distance matrix obtained using the Phylip program Protdist. The tree was displayed using Treeview, copied into Adobe Illustrator and colour coded according to genotype. Because this method is extremely processor intensive (the best tree is analysed at each iteration) it was not possible to bootstrap. Analyses of the same data were also done using neighbor-joining analysis (using the Phylip program Neighbor) and parsimony using the EMBOSS (loc cit) program EPROTPARS. Although the fine details of the trees varied between methods, the overall qualitative conclusions were the same. In order to further validate the conclusions of the phylogeny, 40 sequences (shown in Table 1), chosen manually to cover the major branches of the tree shown in fig 3, were reanalysed using the neighbour-joining method with the bootstrap option (1000 bootstraps) of ClustalW. The consensus tree set was plotted in Treeview and coloured in Illustrator. Similar analysis was carried out on an alignment of the DNA sequences corresponding to each protein sequence.
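The bootstrap step can be made concrete with a small sketch of how alignment columns are resampled before each tree is rebuilt. This is an illustration only: it uses a simple uncorrected p-distance (the fraction of differing residues) as a stand-in for the model-based distances computed by Protdist, it does not build trees, and all function names are hypothetical.

```python
import random

def p_distance(a, b, gap="-"):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        # Can happen for very gappy alignments; callers should filter these out.
        raise ValueError("no comparable columns")
    return sum(x != y for x, y in pairs) / len(pairs)

def distance_matrix(alignment):
    """Symmetric matrix of pairwise p-distances."""
    n = len(alignment)
    return [[0.0 if i == j else p_distance(alignment[i], alignment[j])
             for j in range(n)] for i in range(n)]

def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement, keeping the original length."""
    length = len(alignment[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in cols) for seq in alignment]

def bootstrap_distance_matrices(alignment, replicates=1000, seed=0):
    """One distance matrix per bootstrap replicate of the alignment."""
    rng = random.Random(seed)
    return [distance_matrix(bootstrap_replicate(alignment, rng))
            for _ in range(replicates)]
```

Each replicate matrix would then be passed to a tree-building method, and the support for a branch is the fraction of replicate trees in which it appears.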
The analysis of synonymous/nonsynonymous mutation rates was carried out initially using the program DnaSP 3.0 (using the Nei-Gojobori algorithm) with a 36 base pair sliding window shifted by 9 nucleotides. A more detailed analysis at individual codons was carried out using the program PAML version 3.14 (using maximum likelihood Bayes Empirical Bayes inference, as described in ). Analysis was carried out using a variety of selection models, but the output represented in fig 5 used model 8. The models assume that the selection pressure (measured as ω, the ratio of non-synonymous to synonymous substitution rates) operating at each codon can fall within a range of different classes. Model 8 assumes a distribution of negative selection values (for all of which ω < 1), plus a positive selection class, here with ω = 1.38. This model was found to give the best likelihood estimate. The program then calculates the posterior probability (p) that each codon within a sequence falls within a particular class.
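The sliding-window layout (36 base pairs, shifted by 9 nucleotides) can be sketched as below. This is a deliberately simplified illustration and not the Nei-Gojobori method as implemented in DnaSP: it only counts codons that differ synonymously or non-synonymously between two in-frame sequences, without normalising by the number of synonymous and non-synonymous sites or correcting for multiple substitutions. The function names are hypothetical; the codon table is the standard genetic code.

```python
# Standard genetic code, bases ordered T, C, A, G at each codon position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def windows(length, size=36, step=9):
    """Yield (start, end) half-open windows; size and step are multiples of 3."""
    for start in range(0, length - size + 1, step):
        yield start, start + size

def count_differences(seq1, seq2):
    """Count differing codons as (synonymous, non-synonymous)."""
    syn = nonsyn = 0
    for i in range(0, min(len(seq1), len(seq2)) - 2, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        # Skip identical codons and codons with gaps or ambiguity codes.
        if c1 == c2 or c1 not in CODON_TABLE or c2 not in CODON_TABLE:
            continue
        if CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn

def sliding_counts(seq1, seq2, size=36, step=9):
    """List of (window start, synonymous count, non-synonymous count)."""
    n = min(len(seq1), len(seq2))
    return [(start, *count_differences(seq1[start:end], seq2[start:end]))
            for start, end in windows(n, size, step)]
```

Because both the window size and the step are multiples of three, every window stays in the reading frame provided the input sequences start at a codon boundary.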
The crystal structure of HBVc was retrieved as a pdb file from the Brookhaven database, and displayed and coloured using RasMol software (version 184.108.40.206 for Windows). Figures 6 and 7 show the structure of four identical monomers, to illustrate the spikes and their interaction, while figs 8 and 9 show a single monomer for clarity.
HBV: Hepatitis B virus
HBVc: Hepatitis B core
HBVs: Hepatitis B surface protein
I am grateful for much useful discussion with many colleagues at UCL, in particular to Paul Kellam for help with the bioinformatics, to Ziheng Yang for initial help setting up PAML and to Richard Tedder, Antonio Bertoletti, Mala Maini and Nikolai Naoumov for advice on HBV. I am grateful to Dr. Volker Bruss (University of Goettingen) for his help and advice. I am also very grateful to the staff at the UK HGMP Resource Centre for a lot of help and patience with running the various Bioinformatic programs.
1. Kidd-Ljunggren K, Miyakawa Y, Kidd AH: Genetic variability in hepatitis B viruses. J Gen Virol. 2002, 83: 1267-1280.
2. Robertson BH, Margolis HS: Primate hepatitis B viruses - genetic diversity, geography and evolution. Rev Med Virol. 2002, 12: 133-141. 10.1002/rmv.348.
3. Bollyky PL, Holmes EC: Reconstructing the complex evolutionary history of hepatitis B virus. J Mol Evol. 1999, 49: 130-141.
4. Jazayeri M, Basuni AA, Sran N, Gish R, Cooksley G, Locarnini S, Carman WF: HBV core sequence: definition of genotype-specific variability and correlation with geographical origin. J Viral Hepat. 2004, 11: 488-501. 10.1111/j.1365-2893.2004.00534.x.
5. Jazayeri MS, Dornan ES, Boner W, Fattovich G, Hadziyannis S, Carman WF: Intracellular distribution of hepatitis B virus core protein expressed in vitro depends on the sequence of the isolate and the serologic pattern. J Infect Dis. 2004, 189: 1634-1645. 10.1086/382190.
6. Torre F, Naoumov NV: Clinical implications of mutations in the hepatitis B virus genome. Eur J Clin Invest. 1998, 28: 604-614. 10.1046/j.1365-2362.1998.00346.x.
7. Carman WF, Jacyna MR, Hadziyannis S, Karayiannis P, McGarvey MJ, Makris A, Thomas HC: Mutation preventing formation of hepatitis B e antigen in patients with chronic hepatitis B infection. Lancet. 1989, 2: 588-591. 10.1016/S0140-6736(89)90713-7.
8. Tong SP, Diot C, Gripon P, Li J, Vitvitski L, Trepo C, Guguen-Guillouzo C: In vitro replication competence of a cloned hepatitis B virus variant with a nonsense mutation in the distal pre-C region. Virology. 1991, 181: 733-737. 10.1016/0042-6822(91)90908-T.
9. Chan HL, Hussain M, Lok AS: Different hepatitis B virus genotypes are associated with different mutations in the core promoter and precore regions during hepatitis B e antigen seroconversion. Hepatology. 1999, 29: 976-984. 10.1002/hep.510290352.
10. Vanlandschoot P, Cao T, Leroux-Roels G: The nucleocapsid of the hepatitis B virus: a remarkable immunogenic structure. Antiviral Res. 2003, 60: 67-74. 10.1016/j.antiviral.2003.08.011.
11. Maini MK, Boni C, Ogg GS, King AS, Reignat S, Lee CK, Larrubia JR, Webster GJ, McMichael AJ, Ferrari C, Williams R, Vergani D, Bertoletti A: Direct ex vivo analysis of hepatitis B virus-specific CD8(+) T cells associated with the control of infection. Gastroenterology. 1999, 117: 1386-1396.
12. Kakimi K, Isogawa M, Chung J, Sette A, Chisari FV: Immunogenicity and tolerogenicity of hepatitis B virus structural and nonstructural proteins: implications for immunotherapy of persistent viral infections. J Virol. 2002, 76: 8609-8620. 10.1128/JVI.76.17.8609-8620.2002.
13. Wynne SA, Crowther RA, Leslie AG: The crystal structure of the human hepatitis B virus capsid. Mol Cell. 1999, 3: 771-780. 10.1016/S1097-2765(01)80009-5.
14. Carman WF, Boner W, Fattovich G, Colman K, Dornan ES, Thursz M, Hadziyannis S: Hepatitis B virus core protein mutations are concentrated in B cell epitopes in progressive disease and in T helper cell epitopes during clinical remission. J Infect Dis. 1997, 175: 1093-1100.
15. Jones DT, Taylor WR, Thornton JM: The rapid generation of mutation data matrices from protein sequences. Comput Appl Biosci. 1992, 8: 275-282.
16. Sugauchi F, Orito E, Ichida T, Kato H, Sakugawa H, Kakumu S, Ishida T, Chutaputti A, Lai CL, Ueda R, Miyakawa Y, Mizokami M: Hepatitis B virus of genotype B with or without recombination with genotype C over the precore region plus the core gene. J Virol. 2002, 76: 5985-5992. 10.1128/JVI.76.12.5985-5992.2002.
17. Bollyky PL, Rambaut A, Harvey PH, Holmes EC: Recombination between sequences of hepatitis B virus from different genotypes. J Mol Evol. 1996, 42: 97-102.
18. Yang Z, Nielsen R: Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol Biol Evol. 2002, 19: 908-917.
19. Wong WS, Yang Z, Goldman N, Nielsen R: Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. Genetics. 2004, 168: 1041-1051. 10.1534/genetics.104.031153.
20. Yang Z, Wong WS, Nielsen R: Bayes Empirical Bayes Inference of Amino Acid Sites under Positive Selection. Mol Biol Evol. 2005.
21. Suzuki Y, Gojobori T: A method for detecting positive selection at single amino acid sites. Mol Biol Evol. 1999, 16: 1315-1328.
22. Yang Z: PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 1997, 13: 555-556.
23. Ponsel D, Bruss V: Mapping of amino acid side chains on the surface of hepatitis B virus capsids required for envelopment and virion formation. J Virol. 2003, 77: 416-422. 10.1128/JVI.77.1.416-422.2003.
24. Maini MK, Bertoletti A: How can the cellular immune response control hepatitis B virus replication?. J Viral Hepat. 2000, 7: 321-326. 10.1046/j.1365-2893.2000.00234.x.
25. Thimme R, Wieland S, Steiger C, Ghrayeb J, Reimann KA, Purcell RH, Chisari FV: CD8(+) T cells mediate viral clearance and disease pathogenesis during acute hepatitis B virus infection. J Virol. 2003, 77: 68-76. 10.1128/JVI.77.1.68-76.2003.
26. Webster GJ, Reignat S, Brown D, Ogg GS, Jones L, Seneviratne SL, Williams R, Dusheiko G, Bertoletti A: Longitudinal analysis of CD8+ T cells specific for structural and nonstructural hepatitis B virus proteins in patients with chronic hepatitis B: implications for immunotherapy. J Virol. 2004, 78: 5707-5719. 10.1128/JVI.78.11.5707-5719.2004.
27. Sobao Y, Sugi K, Tomiyama H, Saito S, Fujiyama S, Morimoto M, Hasuike S, Tsubouchi H, Tanaka K, Takiguch M: Identification of hepatitis B virus-specific CTL epitopes presented by HLA-A*2402, the most common HLA class I allele in East Asia. J Hepatol. 2001, 34: 922-929. 10.1016/S0168-8278(01)00048-4.
28. Missale G, Redeker A, Person J, Fowler P, Guilhot S, Schlicht HJ, Ferrari C, Chisari FV: HLA-A31- and HLA-Aw68-restricted cytotoxic T cell responses to a single hepatitis B virus nucleocapsid epitope during acute viral hepatitis. J Exp Med. 1993, 177: 751-762. 10.1084/jem.177.3.751.
29. Tsai SL, Chen MH, Yeh CT, Chu CM, Lin AN, Chiou FH, Chang TH, Liaw YF: Purification and characterization of a naturally processed hepatitis B virus peptide recognized by CD8+ cytotoxic T lymphocytes. J Clin Invest. 1996, 97: 577-584.
30. Rehermann B, Pasquinelli C, Mosier SM, Chisari FV: Hepatitis B virus (HBV) sequence variation of cytotoxic T lymphocyte epitopes is not common in patients with chronic HBV infection. J Clin Invest. 1995, 96: 1527-1534.
31. Torre F, Cramp M, Owsianka A, Dornan E, Marsden H, Carman W, Williams R, Naoumov NV: Direct evidence that naturally occurring mutations within hepatitis B core epitope alter CD4+ T-cell reactivity. J Med Virol. 2004, 72: 370-376. 10.1002/jmv.20016.
32. Cao T, Desombere I, Vanlandschoot P, Sallberg M, Leroux-Roels G: Characterization of HLA DR13-restricted CD4(+) T cell epitopes of hepatitis B core antigen associated with self-limited, acute hepatitis B. J Gen Virol. 2002, 83: 3023-3033.
33. Belnap DM, Watts NR, Conway JF, Cheng N, Stahl SJ, Wingfield PT, Steven AC: Diversity of core antigen epitopes of hepatitis B virus. Proc Natl Acad Sci U S A. 2003, 100: 10884-10889. 10.1073/pnas.1834404100.
34. Naoumov NV, Thomas MG, Mason AL, Chokshi S, Bodicky CJ, Farzaneh F, Williams R, Perrillo RP: Genomic variations in the hepatitis B core gene: a possible factor influencing response to interferon alfa treatment. Gastroenterology. 1995, 108: 505-514.
35. Chuang WL, Omata M, Ehata T, Yokosuka O, Ito Y, Imazeki F, Lu SN, Chang WY, Ohto M: Precore mutations and core clustering mutations in chronic hepatitis B virus infection. Gastroenterology. 1993, 104: 263-271.
36. NCBI Protein Sequence Database. http://www.ncbi.nlm.nih.gov/entrez/
37. Clustal. 2005, http://www.rfcgr.mrc.ac.uk/Registered/Webapp/emboss-w2h/
38. Kitsch. 2005, http://evolution.genetics.washington.edu/phylip.html
39. Treeview. 2005, http://taxonomy.zoology.gla.ac.uk/rod/treeview.html
40. Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22: 4673-4680.
41. Rozas J, Rozas R: DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis. Bioinformatics. 1999, 15: 174-175. 10.1093/bioinformatics/15.2.174.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.