- Research article
- Open Access
Human cell-dependent, directional, time-dependent changes in the mono- and oligonucleotide compositions of SARS-CoV-2 genomes
BMC Microbiology volume 21, Article number: 89 (2021)
When a virus that has grown in a nonhuman host starts an epidemic in the human population, human cells may not provide growth conditions ideal for the virus. Therefore, the invasion of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), which is usually prevalent in the bat population, into the human population is thought to have necessitated changes in the viral genome for efficient growth in the new environment. In the present study, to understand host-dependent changes in coronavirus genomes, we focused on the mono- and oligonucleotide compositions of SARS-CoV-2 genomes and investigated how these compositions changed time-dependently in the human cellular environment. We also compared the oligonucleotide compositions of SARS-CoV-2 and other coronaviruses prevalent in humans or bats to investigate the causes of changes in the host environment.
Time-series analyses of changes in the nucleotide compositions of SARS-CoV-2 genomes revealed a group of mono- and oligonucleotides whose compositions changed in a common direction for all clades, even though viruses belonging to different clades should evolve independently. Interestingly, the compositions of these oligonucleotides changed towards those of coronaviruses that have been prevalent in humans for a long period and away from those of bat coronaviruses.
Clade-independent, time-dependent changes are thought to have biological significance and should relate to viral adaptation to a new host environment, providing important clues for understanding viral host adaptation mechanisms.
Severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), an RNA virus belonging to the betacoronavirus genus, began to spread in the human population in 2019. This viral strain is believed to have been originally prevalent in bats and transferred to the human population through intermediate hosts . Viral growth requires a wide variety of host factors (nucleotide pools, proteins, RNA, etc.) and should evade the diverse antiviral mechanisms of host cells (antibodies, killer T cells, interferon, RNA interference, etc.) [2,3,4]. Since ancestral SARS-CoV-2 strains are thought to be endemic in bats, they should be well adapted to their host environment; when the virus invades the human population, human cells may not provide growth conditions ideal for the virus. For efficient growth and rapid spread of the infection, changes in the viral genome should be required. Analyses of time-dependent changes in SARS-CoV-2 in the human population can be used to characterize how and why viral genomes change to adapt to a new host environment.
Due to the great threat of COVID-19 and remarkable development of sequencing technology, a massive number of SARS-CoV-2 genome sequences are available in databases, even though the epidemic has lasted for approximately 10 months. These sequence data have provided a wide range of insights into SARS-CoV-2 [5, 6]. Phylogenetic methods based on sequence alignment have been widely used in molecular evolution studies [7, 8], and these methods are well refined and essential for studying phylogenetic relationships between different viral species and variations in the same viral species at the single-nucleotide level. However, when dealing with a massive number of genome sequences, methods based on sequence alignment become problematic because they require a large amount of computational resources.
We have continued to develop sequence alignment-free methods focused on the oligonucleotide compositions of genome sequences [9,10,11,12]. Notably, oligonucleotide composition varies widely among species, including viruses, and is designated as genome signatures . These compositions can be treated as numerical data, and a massive amount of sequence data can easily be subjected to various statistical analyses. Furthermore, even genomic fragments without orthologous and/or paralogous pairs can be compared [11, 12, 14,15,16,17]. Specifically, our previous work on influenza A-type virus genomes found that the oligonucleotide compositions of the viral genomes differed between hosts (e.g., humans and birds), even for viruses within the same subtype (e.g., H1N1 and H3N2 of type A) [11, 12, 14]; we also examined changes in the oligonucleotide compositions of influenza H1N1/09, which have been epidemic in humans beginning in 2009, and found that their compositions changed to approach those of the seasonal flu strains H1N1 and H3N2 . Furthermore, although epidemics of the H1N1 and H3N2 strains began several decades apart, these strains showed highly similar chronological changes from the start of these epidemics. These evolutionary yet reproducible changes suggest that mutations to adapt to a new host environment inevitably accumulate when the host species of a virus changes, and these changes can be efficiently detected by analyzing oligonucleotide compositions.
Several groups, including ours, have examined changes in SARS-CoV-2 genomes during the early stages of the SARS-CoV-2 epidemic and found clear directional changes in a group of mono- and oligonucleotides detectable on even a monthly basis [15, 18, 19]. These directional changes will allow us to predict changes in the near future. Notably, near-future prediction and verification should be the most direct ways to test the reliability of the obtained results, models and ideas (e.g., those discovered for influenza viruses), providing a new paradigm for molecular evolutionary studies. In this context, the present study analyzed the genome sequences of over seventy thousand SARS-CoV-2 strains isolated from December 2019 to September 2020.
Directional changes in the mononucleotide compositions (%) of SARS-CoV-2
For fast-evolving RNA viruses, diversity within the viral population arises rapidly as the epidemic progresses and subpopulation structure forms; the GISAID consortium has defined at least seven main clades (G, GH, GR, L, V, S and Others). Notably, the elementary processes of molecular evolution are based on random mutations, and strains belonging to different clades are thought to have evolved independently. Therefore, the observation of highly similar time-dependent changes independent of clade has certain biological meanings and may be inevitable for efficient growth in human cells. From this perspective, we first examined time-dependent changes in the mononucleotide compositions (%) of SARS-CoV-2 strains isolated from December 2019 to September 2020.
Among the seven clades (G, GH, GR, L, V, S and Others) reported by the GISAID consortium, we used six clades (G, GH, GR, L, V and S), excluding Others, in the analysis. For the time-series analysis, we calculated the average mononucleotide compositions (%) of the genomes in each clade collected monthly; in Fig. 1a, the mononucleotide composition of each clade is shown as a colored line, while that for the monthly collected genomes belonging to all clades is shown as a dashed line.
Regardless of clade, the composition of C decreased, while that of U increased in a time-dependent manner, but the changes in A and G composition were less clear (Fig. 1a). Correlation coefficients between the mononucleotide composition and month from the start of the epidemic showed a high negative correlation for C and a high positive correlation for U for all clades, but there was no clear directionality for A and G (Fig. 1a and Tables 1, 2). These results indicate that the mononucleotide composition of this virus may be prone to biased mutations that reduce C and increase U or the mutated strains tend to be more favorable for growth in human cells.
Directional changes in short oligonucleotide compositions
Oligonucleotides are known to act as functional motifs, such as binding sites for a wide variety of proteins and target sites for RNA modifications. Therefore, directional changes in some oligonucleotides independent of clade may relate to certain processes for adaptation to the new host environment. Our previous work on influenza A viruses found that their oligonucleotide compositions varied among prevalent hosts [11, 12]; notably, although influenza virus isolated from humans tended to prefer A and U (but not G and C) more than viruses isolated from birds, the human viruses showed a preference for GGCG and GGGG, which are G- or C-rich. Importantly, there are various examples of oligonucleotides whose changes in composition cannot be explained by changes in mononucleotide composition alone, and these changes may relate to the molecular mechanisms of viral adaptation to a new host.
From this perspective, we next analyzed time-dependent changes in di- and trinucleotide compositions and found that a group of di- and trinucleotides showed a highly positive or negative correlation (Figs. 1b, S1 and Tables 1, 2). Interestingly, a group of A- or G-rich oligonucleotides, such as GAG and GGA, showed a high positive correlation independent of clade, which was not expected from the changes in mononucleotide compositions alone. To confirm the extent of these changes, we also calculated the fold change in composition for the first isolated month and the last examined month (Fig. 2) and found clear increases and decreases in mono- and oligonucleotide compositions common among the six clades, which supports the result presented in Fig. 1 and Tables 1 and 2.
Changes towards the sequences of other coronaviruses prevalent in humans
In a previous study of SARS-CoV-2 , we analyzed mono- and dinucleotide compositions for the first four epidemic months without separating the sequences by clade. Notably, the directional changes shown in Figs. 1 and 2 and Tables 1 and 2 were absolutely consistent with the previous results, even when the six clades were separately analyzed. In the previous study, time-series analysis of ebolavirus at the beginning of the epidemic in West Africa in 2014 also showed directional changes in a group of mono- and dinucleotide compositions, but these directional increases/decreases tended to slow approximately 10 months after the start of the epidemic. The increase/decrease trend for SARS-CoV-2 is far from slowing after 10 months, and the next important questions are how long these directional changes in this virus will last and whether there are possible goals to these changes.
To conduct this near-future prediction, the following information concerning influenza viruses should be useful. As mentioned before, mono- and oligonucleotide compositions in influenza H1N1/09 changed towards those of seasonal influenza strains such as the H1N1 and H3N2 subtypes . Furthermore, all the human subtypes showed directional changes away from the compositions of all avian influenza A subtypes and closer to those of the human influenza B type, which has been prevalent only in humans . If we assume that changes similar to those in the influenza virus will occur, the mono- and oligonucleotide compositions of interest for SARS-CoV-2 are expected to change towards those of other coronaviruses that have been prevalent in humans and away from those of coronaviruses prevalent in bats. To test this hypothesis, we analyzed the following coronaviruses: 238 human-CoV strains (alphacoronaviruses 229E and NL63: betacoronaviruses HKU1 and OC43) and 166 bat-CoV strains (alphacoronaviruses and betacoronaviruses, including the SARS virus).
As shown in Fig. 3a, we compared the mononucleotide compositions of SARS-CoV-2 with those of the human- and bat-CoV strains; the data for bat SARS among bat-CoV strains, which is thought to be the original strain that caused the current COVID-19 pandemic, are marked in pink. Interestingly, concerning the human- and bat-CoV strains, differences in mononucleotide composition were more pronounced between hosts than between the alpha and beta linages, and the levels for all six clades of SARS-CoV-2 were between those for the two hosts. Figure 3b shows the results of di- and trinucleotides, for which the directional, time-dependent changes were primarily common among the six clades. The increases and decreases in nucleotide composition observed for SARS-CoV-2 in Figs. 1 and 2 are indicated by hollow up and down arrows, respectively. Interestingly, all changes of interest tended to move away from the compositions of bat SARS and approach those of human-CoV, supporting the view that the directional changes of interest have biological significance and are possibly inevitable, as observed for influenza viruses. Assuming that approaching the levels in human-CoV strains is the hypothetical goal of the directional change of SARS-CoV-2, the current compositions are far from this hypothetical goal (Fig. 3); therefore, we predict that directional changes of interest will continue in the near future.
Then, assuming that the average value for all human-CoV strains is a hypothetical goal, we investigated how SARS-CoV-2 has approached this possible goal. Specifically, we calculated the square of the difference between the composition of each nucleotide in SARS-CoV-2 and the average value for human-CoV strains and plotted the values of the difference according to the elapsed month for each nucleotide. Changes in the compositions of both C and U clearly reduced this difference, as the compositions of these nucleotides approached the hypothetical goal (Fig. 4a); their linear reduction supports the prediction that directional changes in the composition of C and U will continue for the foreseeable future. In contrast, A and G did not show directional changes in composition, which is most likely due to the absence of clear differences in the A and G compositions of human- and bat-CoV, i.e., there is no possible target (Fig. 3a). Figure 4b shows examples of di- and trinucleotides whose compositions have moved towards the hypothetical goal, but Fig. 4c shows a few exceptional nucleotides whose compositions have not changed towards the hypothetical goal but have changed with a common directionality among the six clades. In Fig. 4d, correlation coefficients between the above difference and the elapsed month are presented. Most nucleotides of interest showed a negative coefficient (i.e., a directional change towards human-CoV), but three oligonucleotides, GG, AGC and CAU, showed positive coefficients indicating an increase in the difference (i.e., moving away from the human-CoV level). For these opposing directional changes, certain causes specific to SARS-CoV-2 may be assumed.
Motifs for RNA-binding proteins
Next, we considered the mechanisms that move oligonucleotide compositions away from those of bat coronaviruses and closer to those of human coronaviruses. Certain human cellular factors involved in viral growth may be candidates in such mechanisms. When considering possible protein factors, oligonucleotides longer than trinucleotides should be a focus. As an attempt, we here focused on host RNA-binding proteins because their binding to hepatitis C virus is known to be involved in the growth of this RNA virus . We thus searched for motifs for human RNA-binding proteins in coronavirus genomes (see Methods section) and found multiple loci with binding motifs for each protein. Table 3 (and Table S10) lists the motifs for which a directional time-dependent change was primarily common among six clades. Table 3 and Fig. 5a show that only ELAVL1 showed a positive correlation, but the other nine proteins in Table 3 showed a negative correlation for almost all clades; the results for other motifs are presented in Table S12.
We next compared the numbers of these motifs in SARS-CoV-2 with the numbers of human- and bat-CoV motifs (Fig. 5b). Of the ten proteins shown in Table 3, the only elevated motif, that for ELAVL1 binding, was found in a significantly higher number of loci in human-CoV than in bat-CoV, but motifs for PCBP2 and SRSF1 binding, which tended to decrease (Table 3), were found in significantly fewer loci in human-CoV. These observations appear to be consistent with the features found in the mono-, di- and trinucleotide compositions of interest. However, unlike these changes, there was significant diversity within even a single clade, which appears to be greater than the differences between hosts, with the possible exception of ELAVL1. In regard to long oligonucleotides, they should carry out a variety of functions, and mutations that accumulate in their functional motifs may have complex effects on the presence of functional motif sequences, so an analysis from a new perspective appears to become important.
We first discuss possible molecular mechanisms related to time-dependent directional changes in mononucleotide composition. Figure 1a shows that the frequency of C tended to decrease in SARS-CoV-2, while that of U tended to increase. Since a similar change was previously found for MERS and all A-type influenza subtypes [12, 14], these changes may have biological significance for a wide range of RNA viruses that invade from nonhuman hosts. One possible mechanism is the host RNA-editing function; Simmonds (2020) proposed that the C → U hypermutation in SARS-CoV-2 may be due to the influence of APOBEC family proteins in humans . APOBEC is an antiviral protein in various animal species, including humans, that can convert C to U by the deacetylation of C [21,22,23]. Such RNA editing is also known to act as a defense mechanism against various viruses, including retroviruses . The APOBEC gene family has generated various paralogs during mammalian evolution, with seven known APOBEC genes in humans and ten in bat families [25,26,27]. The prevalence C → U change in SARS-CoV-2 upon transfer of its host environment from bats to humans suggests that these changes may be due to human-specific APOBEC genes.
We next discuss changes in short oligonucleotides. Directional changes in some oligonucleotides, such as GAG and GGA, cannot be explained by APOBEC-induced C → U mutations alone. Although the evidence is weak, these oligonucleotides are part of the binding motifs of several RNA-binding proteins, such as SRSF1 and PCBP2 (Table S9); the number of loci for these motifs has decreased independently of clade. In contrast, the number of motif loci for only ELAVL1 among the ten proteins listed in Table 3 has increased independently of clade. As an RNA-binding protein that binds A- or U-rich elements, ELAVL1 binding to mRNA is known to contribute to RNA stability [28, 29]; SARS-CoV-2 and human-CoV, which are prevalent in humans, may contain increased binding motifs for ELAVL1 for efficient growth in the human cellular environment. However, for further analysis, information on RNA-binding proteins in bat cells is needed.
In the present study, we found that the compositions of a group of mono- and oligonucleotide in SARS-CoV-2 genomes have changed in a host cell-dependent manner. This is totally consistent to our previous finding for influenza A and B viruses [11, 12, 14], supporting the previous prediction that the host-dependent directional changes of various mono- and oligonucleotides should inevitably occur in zoonotic RNA viruses that have invaded from nonhuman hosts. Phylogenetic methods based on sequence alignment [7, 8] are well refined and undoubtedly essential for studying the phylogenetic relationships between viruses. The present alignment-free method to analyze mono- and oligonucleotide compositions can also serve as a powerful tool for molecular evolutionary studies of viruses, revealing directional changes in viruses and predicting the possible goals of these changes.
SARS-CoV-2 genome sequences
Human SARS-CoV-2 genome sequences were downloaded from the GISAID database (https://www.gisaid.org/); sequences that were complete, showed high coverage and had been isolated from humans were downloaded on Sep 17, 2020. Among the acquired sequences, strains with an unknown isolation month were excluded from the analysis, and the polyA tail was removed. A list of all 72,314 strains used is provided in Table S1.
Genome sequences of coronaviruses prevalent in humans or bats
The complete sequences of two types of human coronavirus (human-CoV) strains, alphacoronaviruses (27 229E and 55 NL63 strains) and betacoronaviruses (18 HKU1 and 138 OC43 strains), were obtained from the NCBI virus database (https://www.ncbi.nlm.nih.gov/labs/virus/). The complete genome sequences of two types of bat coronavirus (bat-CoV) strains, alphacoronaviruses (87 strains) and betacoronaviruses (79 strains, including 34 SARS-CoV), isolated from three types of bats (Chiroptera, Vespertilionidae and Rhinolophidae) were obtained from the NCBI virus database (https://www.ncbi.nlm.nih.gov/labs/virus/), and the polyA tail of each sequence was removed. The strains are listed in Table S2.
Time-series analysis of changes in oligonucleotide compositions
In the time-series analysis, the average mono- and oligonucleotide compositions (%) of viruses collected in each month were calculated for each clade. To avoid statistical fluctuations due to the small sample size, months in which fewer than 10 strains had been collected were excluded from the monthly analysis.
RNA-binding motif analysis
RNA-binding motifs were obtained from the ATtRACT database . In this database, multiple binding motifs are registered as corresponding to one RNA-binding protein; we calculated the total number of loci containing the binding motifs for each protein in the viral genomes.
Availability of data and materials
The accession numbers of all sequences analysed during this study are listed in additional file 2, and these sequence data are available in the GISAID database (https://www.gisaid.org/) or NCBI virus database (https://www.ncbi.nlm.nih.gov/labs/virus/). Other numerical data generated or analysed during this study are included in this published article and its additional files.
Severe acute respiratory syndrome coronavirus-2
Singhal T. A review of coronavirus disease-2019 (COVID-19). Indian J Pediatr. 2020;87(4):281–6. https://doi.org/10.1007/s12098-020-03263-6.
García-Sastre A. Inhibition of interferon-mediated antiviral responses by influenza a viruses and other negative-strand RNA viruses. Virology. 2001;279(2):375–84. https://doi.org/10.1006/viro.2000.0756.
Voinnet O. Induction and suppression of RNA silencing: insights from viral infections. Nat Rev Genet. 2005;6(3):206–20. https://doi.org/10.1038/nrg1555.
Randall RE, Goodbourn S. Interferons and viruses: an interplay between induction, signalling, antiviral responses and virus countermeasures. J Gen Virol. 2008;89(1):1–47. https://doi.org/10.1099/vir.0.83391-0.
Konno Y, Kimura I, Uriu K, Fukushi M, Irie T, Koyanagi Y, et al. SARS-CoV-2 ORF3b is a potent interferon antagonist whose activity is increased by a naturally occurring elongation variant. Cell Rep. 2020;32(12):108185. https://doi.org/10.1016/j.celrep.2020.108185.
Zhou, et al. A novel bat coronavirus closely related to SARS-CoV-2 contains natural insertions at the S1/S2 cleavage site of the spike protein. Curr Biol. 2020;30(11):2196–203. https://doi.org/10.1016/j.cub.2020.05.023.
Nei M. Molecular evolutionary genetics. New York: Columbia University Press; 1987. https://doi.org/10.7312/nei-92038.
Kumar S, Nei M, Dudley J, Tamura K. MEGA: a biologist-centric software for evolutionary analysis of DNA and protein sequences. Brief Bioinform. 2008;9(4):299–306. https://doi.org/10.1093/bib/bbn017.
Abe T, Kanaya S, Kinouchi M, et al. Informatics for unveiling hidden genome signatures. Genome Res. 2003;13(4):693–702. https://doi.org/10.1101/gr.634603.
Abe T, Sugawara H, Kinouchi M, Kanaya S, Ikemura T. Novel phylogenetic studies of genomic sequence fragments derived from uncultured microbe mixtures in environmental and clinical samples. DNA Res. 2005;12(5):281–90. https://doi.org/10.1093/dnares/dsi015.
Iwasaki Y, Abe T, Wada K, Itoh M, Ikemura T. Prediction of directional changes of influenza a virus genome sequences with emphasis on pandemic H1N1/09 as a model case. DNA Res. 2011;18(2):125–36. https://doi.org/10.1093/dnares/dsr005.
Iwasaki Y, Abe T, Wada Y, Wada K, Ikemura T. Novel bioinformatics strategies for prediction of directional sequence changes in influenza virus genomes and for surveillance of potentially hazardous strains. BMC Infect Dis. 2013;13:386.
Karlin S, Campbell AM, Mrazek J. Comparative DNA analysis across diverse genomes. Annu Rev Genet. 1998;32(1):185–225. https://doi.org/10.1146/annurev.genet.32.1.185.
Wada Y, Wada K, Iwasaki Y, Kanaya S, Ikemura T. Directional and reoccurring sequence change in zoonotic RNA virus genomes visualized by time-series word count. Sci Rep. 2016;6(1):36197. https://doi.org/10.1038/srep36197.
Wada K, Wada Y, Iwasaki Y, Ikemura T. Time-series oligonucleotide count to assign antiviral siRNAs with long utility fit in the big data era. Gene Ther. 2017;24(10):668–73. https://doi.org/10.1038/gt.2017.76.
Wada K, Wada Y, Ikemura T. Time-series analyses of directional sequence changes in SARS-CoV-2 genomes and an efficient search method for candidates for advantageous mutations for growth in human cells. Gene. 2020;5:100038.
Qiu Y, Abe T, Nakao R, Satoh K, Sugimoto C. Viral population analysis of the taiga tick, Ixodes persulcatus, by using batch learning self-organizing maps and BLAST search. J Vet Med Sci. 2019;81(3):401–10. https://doi.org/10.1292/jvms.18-0483.
Mercatelli D, Giorgi FM. Geographic and genomic distribution of SARS-CoV-2 mutations. Front Microbiol. 2020;22(11):1800.
Simmonds P. Rampant C→U hypermutation in the genomes of SARS-CoV-2 and other coronaviruses: causes and consequences for their short- and long-term evolutionary trajectories. mSphere. 2020;24:e00408–20.
Paek KY, Kim CS, Park SM, Kim JH, Jang SK. RNA-binding protein hnRNP D modulates internal ribosome entry site-dependent translation of hepatitis C virus RNA. J Virol. 2008;82(24):12082–93. https://doi.org/10.1128/JVI.01405-08.
Harris RS, Bishop KN, Sheehy AM, Craig HM, Petersen-Mahrt SK, Watt IN, et al. DNA deamination mediates innate immunity to retroviral infection. Cell. 2003;113(6):803–9. https://doi.org/10.1016/S0092-8674(03)00423-9.
Mangeat B, Turelli P, Caron G, Friedli M, Perrin L, Trono D. Broad antiretroviral defence by human APOBEC3G through lethal editing of nascent reverse transcripts. Nature. 2003;424(6944):99–103. https://doi.org/10.1038/nature01709.
Zhang H, Yang B, Pomerantz RJ, Zhang C, Arunachalam SC, Gao L. The cytidine deaminase CEM15 induces hypermutation in newly synthesized HIV-1 DNA. Nature. 2003;424(6944):94–8. https://doi.org/10.1038/nature01707.
Harris RS, Dudley JP. APOBECs and virus restriction. Virology. 2015;479–480:131–45.
Sawyer SL, Emerman M, Malik HS. Ancient adaptive evolution of the primate antiviral DNA-editing enzyme APOBEC3G. PLoS Biol. 2004;2(9):E275. https://doi.org/10.1371/journal.pbio.0020275.
Münk C, Willemsen A, Bravo IG. An ancient history of gene duplications, fusions and losses in the evolution of APOBEC3 mutators in mammals. BMC Evol Biol. 2012;12(1):71. https://doi.org/10.1186/1471-2148-12-71.
Henry M, Terzian C, Peeters M, Wain-Hobson S, Vartanian JP. Evolution of the primate APOBEC3A cytidine deaminase gene and identification of related coding regions. PLoS One. 2012;7(1):e30036. https://doi.org/10.1371/journal.pone.0030036.
Wang W, Caldwell MC, Lin S, Furneaux H, Gorospe M. HuR regulates cyclin a and cyclin B1 mRNA stability during cell proliferation. EMBO J. 2000;19(10):2340–50. https://doi.org/10.1093/emboj/19.10.2340.
Lal A, Mazan-Mamczarz K, Kawai T, Yang X, Martindale JL, Gorospe M. Concurrent versus individual binding of HuR and AUF1 to common labile target mRNAs. EMBO J. 2004;23(15):3092–102. https://doi.org/10.1038/sj.emboj.7600305.
Giudice G, Sánchez-Cabo F, Torroja C, Lara-Pezzi E. ATtRACT-a database of RNA-binding proteins and associated motifs. Database (Oxford). 2016;7:baw035.
We gratefully acknowledge the authors submitting their sequences from GISAID’s Database and also the valuable comments of Dr. Yashushi Hiromi of National Institute of Genetics (Mishima). We thank Springer Nature Author Services for editing this manuscript for English language.
This work was supported by JSPS KAKENHI Grant Number 18K07151, by AMED under Grant Number JP20he0622033, by JST CREST Grant Number JPMJCR20H1 and by COVID-19 Counterplan Research Project (supervised by Prof. Tatsumi Hirata, NIG) from the Research Organization of Information and Systems (ROIS).
Ethics approval and consent to participate
Consent for publication
The authors declared that there are no conflicts of interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Average di- and trinucleotide compositions (A and B) of for SARS-CoV-2 strains collected in each elapsed month. Fig. S2. Oligonucleotide compositions of human and bat coronavirus sequences. Fig. S3. Differences in oligonucleotide composition between SARS-CoV-2 and human-CoV.
List of SARS-CoV-2 strains used in the analysis. Table S2. List of human-and bat-CoV strains used in the analysis. Table S3. Number of SARS-CoV-2 strains in each clade isolated in each elapsed month. Table S4. Average oligonucleotide compositions for SARS-CoV-2 strains in each clade isolated in each elapsed month. Table S5. Correlation coefficients for time-dependent changes in oligonucleotide compositions of SARS-CoV-2. Table S6. Fold change in compositions between strains of the first and last month of the analysis. Table S7. Distance between the oligonucleotide composition of SARS-CoV-2 isolated in each elapsed month and that of human-CoV. Table S8. Correlation coefficients for time-series changes in the distance between oligonucleotide compositions of SARS-CoV-2 and human-CoV. Table S9. List of RNA-binding motifs. Table S10. Numbers of motif-containing loci for RNA-binding proteins whose abundance increases or decreases between strains of the first and last month of the analysis. Table S11. P-value from t-test to analyze the number of RNA-binding motif loci whose abundance increases or decreases between strains of the first and last month of the analysis. Table S12. Correlation coefficients for time-dependent changes in the number of loci containing RNA-binding motifs.
About this article
Cite this article
Iwasaki, Y., Abe, T. & Ikemura, T. Human cell-dependent, directional, time-dependent changes in the mono- and oligonucleotide compositions of SARS-CoV-2 genomes. BMC Microbiol 21, 89 (2021). https://doi.org/10.1186/s12866-021-02158-6
- Oligonucleotide composition
- Time-series analysis
- Big data
- Zoonotic virus
- RNA virus
- Viral adaptation