BMC Infectious Diseases volume 25, Article number: 778 (2025) Cite this article
A number of studies have shown that methylomes of blood cells of COVID-19 patients display infection-related methylation changes. Those methylation abnormalities were associated with clinical outcomes of the infection and, thus, can potentially be utilized as biomarkers in clinical COVID-19 management. However, a number of parameters need to be assessed, before a clinical use of a biomarker candidate is proposed, with the most important one being reproducible detection in independent target populations.
We benchmarked data reported in publications investigating genome-wide methylation changes occurring in blood cells during SARS-CoV-2 virus infection to assess potential applicability of these changes as biomarkers in clinical management of COVID-19 patients.
We identified nine studies, in which infection-related genome-wide methylation changes in blood cells of COVID-19 patients were reported and associations of those changes with outcomes of the infection was assessed. Our benchmarking analysis showed that each of those studies included patients with different clinical characteristics and each study utilized different symptoms assessment criteria. Most importantly, no uniform data analysis methodologies to identify virus-related methylation changes were used in the analyzed studies. Consequently, analyzed studies, except for one, which was based on uniform data analysis workflow applied across independent patient cohorts, reported sufficiently uniform results that would allow to propose the use of the identified infection-related methylation changes as biomarkers for clinical management of COVID-19 patients.
Due to lack of coherence between studies reporting SARS-CoV-2 associated methylation changes in blood, the potential use of already reported methylation changes as biomarkers for clinical management of COVID-19 patients needs to be further assessed in rigorously controlled studies. At the same time unified data analysis framework appears to be the most critical in development of biomarkers associated with the severity of SARS-CoV-2 infection.
Despite that WHO has officially announced the end of COVID-19 pandemic, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) still remains a concern worldwide for the medical community. Not only because infection can progress to the acute phase in patients without preexisting conditions, but also because increasing number of data shows that long term adverse effects of the infection, referred to as Long COVID or Post-COVID, can persist for longer periods after initial COVID-19 infection [1, 2]. The most common symptoms of the infection are fever, weakness, muscle aches, coughing, and shortness of breath [1, 2]. Involvement of the nervous, gastrointestinal, and cardiac systems might also occur [1]. Infection for the majority of vaccinated patients does not progress into severe pneumonia [3], but patients that progress into pneumonia cannot be effectively identified with current diagnostic procedures. There are several known clinical predictors of severe infection outcomes, such as pre-existing comorbidities (relative risk [RR] = 2.11, p = 0.046), male gender (RR = 1.11, p = 0.039), older age (standardized mean difference [SMD] = 0.68, p < 0.001) or obesity (odds ratio [OR] = 1.21, 95% CI 1.002–1.46), but the specificity of those risk factors is weak and in principle they do not allow to predict the outcomes of the infection for diagnostic purposes [4,5,6]. Moreover, increasing number of data shows that diverse symptoms, such as cognitive deficits, fatigue or other neurological conditions (e.g. sensory disorders), can persist for over 12 weeks after infection, highlighting the need of further studies into the molecular mechanisms of the SARS-CoV-2 infection [7,8,9].
The genome-wide association studies (GWAS) have identified several genetic polymorphisms that associate with the severity of the infection, such as rs5443 in GNB3 gene [10], rs3853839 in TLR7 gene [11], rs6598045 in IFITM3 [12], and rs2070788 in TMPRSS [13]. However, reported associations of the genetic background with COVID-19 severity are in general weak [10,11,12,13].
Several studies have identified methylation changes in blood cells of SARS-CoV-2-infected patients. These changes appear to not only occur relatively early after the infection, but, more importantly, to associate with the severity of COVID-19 outcomes [14,15,16,17,18,19,20,21] and persist even a year after the recovery from an acute phase of the disease [22]. This suggests that SARS-CoV-2-induced methylation changes can potentially be developed into biomarkers allowing for stratification of the patients that are likely to develop severe infection symptoms. The disease-related methylation changes have been shown to be very robust biomarkers useful in clinical management of other diseases, especially cancer [23]. However, to postulate the clinical utility of the biomarker, a number of clinical and analytical parameters of the biomarker need to be assessed, with reproducibility of the biomarker detection in independent target patient cohorts being most critical for a new promising biomarker.
Here, we benchmarked the results of the studies that reported the association of SARS-CoV-2-related methylation changes in blood cells with the clinical outcomes of the COVID-19 disease to assess the potential of those methylation changes to be developed as biomarkers useful in clinical management of patients with SARS-CoV-2 infection.
We searched PubMed MEDLINE for studies reporting association of DNA methylation changes in blood with clinical outcomes of SARS-CoV-2 infection. The flow diagram summarizing our search strategy and the details on study design are provided at Additional File 1. The search terms we used included ‘COVID-19 AND DNA methylation AND blood’ combined interchangeably with ‘severity’, ‘symptomatic’ or ‘mortality’. This search returned 56 publications from which we excluded reviews (n = 1), gene expression studies (n = 1), studies that referred to other than methylation epigenetic mechanism of gene expression regulation (n = 1), studies that were focused on different than COVID-19 condition (overall: n = 6 - pneumonia: n = 2, colon adenocarcinoma: n = 1, lung cancer: n = 1, asthma/rhinitis: n = 1, serum hyperferritinemia: n = 1) or to previously published data (n = 4). We also dismissed studies that were performed on children (n = 1), cell cultures (n = 1), other DNA source than blood (overall: n = 4– cord blood: n = 1, cfDNA: n = 3), or were performed in other than human organisms (overall: n = 2 - Echinacea purpurea: n = 1, mice: n = 1). Additionally, we rejected articles that did not directly addressed severity of the symptoms, e.g. analyses of DNA methylation changes in comparison to healthy controls (n = 4). Moreover, 13 of the identified articles were out of the scope of this paper and focused on: prediction of time-since-COVID-19 infection (n = 1), pathogenesis of immune/systemic dysregulation in COVID-19 (n = 5), COVID-19 mRNA vaccination (n = 2), COVID-19 as a confounder in epigenetic age estimation (n = 2), description of commercial prognostic epigenetic blood test for COVID-19 severity without provided details on study groups and CpG probes used (n = 1), immune reference matrix for diverse phenotypes (n = 1), and a study focused on COVID-19 survivors (n = 1). We did not include in our analyses studies describing influence of infection on “epigenetic clocks” (n = 2) and those reporting methylation changes at single genes associated with the severity of COVID-19 (n = 4). We also excluded studies from which we could not access the exact data on the number of COVID-19 subjects with severe disease course (n = 1). We were not able to access article text despite attempt to contact the authors of two studies.
Finally, nine DNA methylation studies were included in our analysis, with coherent molecular and clinical data allowing us to assess clinical utility of reported SARS-CoV-2-related methylation changes according to the categories selected for benchmarking [14, 16,17,18,19,20,21, 24, 25]. The benchmarking categories included: (1) study design, (2) patient characteristics in analyzed studies, (3) definition of severity of the infection, (4) technology used for genome-wide methylation screening (5) clinical material source for the analysis, (6) clinical material processing for the methylation analysis, (7) pipelines for methylation microarray data pre-processing, (8) the number and the magnitude of the identified methylation changes the reproducibility of the methylation changes across studies, (9) reproducibility of COVID-19 severity related methylation changes between independent studies. We then used PICOS (participants, intervention, context, outcome, study design) acronym [26] to select the eligible studies (see Table 1). We included in data analysis studies that: (1) were peer-reviewed original research studies, (2) fulfilled the individual PICOS elements [26].
The studies that we analyzed included: five retrospective case control studies [14, 16, 21, 24, 25], two cross-sectional studies [18, 19], and two prospective cohort studies [17, 20]. Each of these designs allowed to stratify patients according to the severity of symptoms and to perform differential methylation analyses on diagnostic samples. The number of patients in the studies largely differed from n = 731 and n = 473 patients in [19, 25], respectively to n = 14 [18] (Table 2).
Age of the patient is one of the strongest risk factors associated with the severity of the infection outcome [4]. Also, methylation patterns of blood cells change with age [27]. Thus, it is plausible that age of the patients will influence methylome and needs to be considered in analysis aiming to identify methylation changes associated with the severity of the infection. Age of the patients in the studies that we considered in our analysis significantly differed. Specifically, in the “youngest” cohort included in the study, the median age was 42 and ranged between 19 and 61 [14]. In the “oldest” cohort, the mean age of participants in discovery cohort was 67 and 76 in mild and severe cases, respectively. The participants in the replication cohort included in this study had an average age of 61 in mild group and of 61 in severe group [19] (Table 2).
Also, different ethnic groups have been shown to display ethnicity-related methylation signatures in peripheral blood [28]. Moreover, specific ethnic groups have been reported to display different risks of acquiring SARS-CoV-2, as well as severity of clinical outcomes [29]. Four studies included in our analysis specified race or ethnic origin of the participants [14, 17, 20, 24]. Specifically, a study [17] included mild cases of the following ethnicity: White (38.5%), Black (8.8%), Asian (1.8%), Hispanic (28.1%) and other ethnicity groups (22.8%). Severe group in this study consisted of White (53.3%), Black (13.3%), Asian (2.2%) and Hispanic (11.1%) subjects or cases declaring other ethnicities (20.1%). In the second study [14], entire cohort consisted of subjects from West-Eurasia (72%) and contained also cases from Central-South America (discovery cohort: 18%, validation cohort: 33%). The third study [24] involved COVID-19 cases that were mostly Hispanic or Latino (53.7%) and with the following ancestry: American Indian/Alaskan Native (0.6%), Asian (7.3%), Black/African American (15.2%), Native Hawaiian/Other Pacific Islander (1.2%), White (28%) or other declared (42.7%), missing (4.9%). In fourth study [20], most of the participants originated from South Asia. Mild group consisted of cases from Middle East (2.6%), North Africa (1.3%), Northeast Africa (5.2%), South Asia (82.8%), Southeast Asia (6.5%) and Western Asia (1.3%), whereas in severe group participants declared ethnicity of Middle East (4.1%), Northeast Africa (8.2%), South Asia (75%), Southeast Asia (8.2%) and Western Asia (4.1%). The study that was based on reported and new patient cohort [25] and included patients from Poland, Spain and USA, also did not take into account the ethnicity of the participants in the analysis.
Overall, the majority of the cohorts in analyzed studies which reported ethnicity consisted of Caucasians or populations of mixed descent. Two studies [14, 20] involved patients predominantly of Asian and West-Eurasian descent (Table 2).
It has also been shown that morbidity and mortality from COVID-19 are higher in men than in women and, therefore, sex of the patients should be considered in the analysis of methylation changes induced by the virus [30, 31]. In general, studies that we found eligible for our analysis were not balanced in terms of gender, with males being over-represented among the patients (Table 2). For example, in the study [20] all subjects with severe COVID-19 were men and male participants comprised the majority of severe cases in the study [17].
A number of indexes, such as WHO Clinical Progression Scale [32], COVID-19 Global Risk Assessment Model (COVID-GRAM) score [17] or Acute Respiratory Distress Syndrome (ARDS) Berlin severity index [33] have been developed with the aim to unify the assessment of severity of SARS-CoV-2 infection. However, none of those scales entered general clinical use.
The studies that we analyzed did not utilize unified assessment of the severity of the outcomes (Table 3). Specifically, in the study [19] subjects were stratified using WHO Clinical Progression Scale [32], where mild and severe cases had WHO score of 1 to 4 (ambulatory or hospitalized, mild symptoms) and 5 to 8 (hospitalized with severe symptoms or dead), respectively. The study [17] classified subjects according to the risk of severe disease outcome using dichotomized GRAM-score risk scale [17], with GRAM score under 50% and over 50% denoting lower and higher risk of severe COVID-19 course, respectively. The study [20] largely differed from other studies, because it only included cases with COVID-19 and acute respiratory distress syndrome under mechanical ventilation. In this study survivals of 60-day follow-up period were considered mild cases. The study [24] did not stratify subjects into mild and severe groups, but focused on the development of a methylation score predicting whether a patient was: discharged from the emergency department, admitted to the hospital, accepted to ICU or died.
The study [16] defined mild cases as those not requiring hospitalization and without neither fever nor dyspnea, and severe cases as those hospitalized because of SARS-CoV-2 manifesting: fever higher than 38 °C, dyspnea, cough and fatigue. In the study [14] mild group consisted of asymptomatic or paucisymptomatic subjects which were not hospitalized, whereas severe group included patients that were either hospitalized with oxygen therapy, such as nasal mask or prongs, or admitted to intensive care unit (ICU) with the need for mechanical ventilation, additional organ support or extra corporeal membrane oxygenation (ECMO). In the study [18] mild cases were those experiencing mild fatigue, low fever, discrete olfactory and taste disturbances (a according to the guideline on the management of COVID-19 published by the National Health Commission of the People’s Republic of China, version 8.0 [18]) and no visible signs of pneumonia in imaging, whereas severe cases presented with dyspnea, oxygen saturation lower than 93% or arterial blood oxygen partial pressure/oxygen concentration lower than or equal to 40 kPa. In the study [21] patients admitted to ICU or dead from the infection were classified as severe and those discharged after diagnosis were classified as cases with mild symptoms. In the study [25], mild cases were those not requiring hospitalization and severe group consisted of hospitalized subjects. The overview of the severity criteria in analyzed studies is shown in Table 3.
The MethylationEPIC BeadChip (Infinium) microarray is a technology that has become sufficiently cost-efficient to warrant of genome-wide methylation studies in clinical settings that in most of the cases are based on large patient cohorts. Also, it has been reported that to achieve a level of precision broadly comparable to the methylation array, a minimum coverage of 100x in NGS studies is recommended [34]. Adding to utility of this technology in studies based on large cohorts of subjects. The studies of SARS CoV-2-related methylation changes that we included in our analysis also utilized this technology. Most of the studies used standard chip design. One study utilized microarray with additional probes in loci within or spanning regions encoding important components of immunity, such as major histocompatibility complex (MHC) or known human leukocyte antigen (HLA) alleles [24].
Data showing that in some instances blood cells may be infected by the SARS-CoV-2 have been reported [35], but in general terms blood cells are not targeted by the virus. However, the antiviral defense is triggered by immune system response and identified in blood of COVID-19 patients methylation changes are a part of host methylome response to the infection. Thus, major target in the analysis aiming to develop biomarkers of the severity of the infection are cells of the immune system, methylomes of which respond to the infection. Eight of the studies that we included in our analysis were based on DNA extracted from diagnostic peripheral blood sample and one utilized peripheral blood mononuclear cells (PBMCs).
Despite that DNA extraction methods have been well standardized; the use of different DNA extraction methods needs to be considered when comparing the results of studies. Two studies included in our analysis did not specify the DNA extraction kit used [14, 21], two of the studies used QIAamp DNA Blood Mini Kit (QIAGEN Diagnostics GmbH, Hilden, Germany) [18, 19], one QIAsymphony DSP DNA Mini Kit (QIAGEN Diagnostics GmbH, Hilden, Germany) [16], one Allprep DNA/RNA mini kit (QIAGEN Diagnostics GmbH, Hilden, Germany) [20], one GeneJET whole blood kit (ThermoFisher Scientific) [17], one PureLink™ Genomic DNA Mini Kit for Polish original cohort in study [25], and one bead-based, automated extraction Maxwell(R) RSC System (Promega) [24]. According to manufacturer recommendations, minimum DNA input for EPIC microarray methylation profiling should be between 250 ng and 1000 ng, which in the beginning of the experiment is subjected to bisulfite modification. Although it is safe to assume that all of the analyzed studies followed the official recommendations, two of them specified input of DNA used for bisulfite modification, a necessary step in MethylationEPIC BeadChip (Infinium) microarray-based methylation screening. The amounts of the DNA used in those studies were in the recommended range: 500 ng in study [19] and 900 ng in study [21].
The quantity and quality of DNA used in methylation analysis is a major limitation of the analysis. Although manufacturer makes specific recommendations for the DNA quality and quantity assessment method to be used before hybridization to the microarray, five studies that we analyzed [14, 16, 18, 20, 25] did not report the DNA quality assessment method used. In one study DNA quality assessment was performed with ultraviolet (UV) visible (VIS) spectrophotometry (N60 Implen Nanophotometer) [21], one utilized Qubit [17] and two used both Qubit and spectrophotometry [19, 24].
Also, the microarray manufacturer specifically recommends one of the specific Zymo Research kits (EZ-96 DNA Methylation Kit, EZ DNA Methylation Kit, EZ-96 DNA Methylation-Lightning MagPrep Kit and EZ DNA Methylation-Lightning Automation Kit) for bisulfite conversion in EPIC microarray protocol. Four of the studies that we analyzed [16, 19, 21, 24] reported the use of the recommended kits, others did not specify the bisulfite modification method used.
Before statistical methods can be used to identify methylation changes in the microarray-based studies, the microarray data need to be pre-processed to numeric values representing methylation levels at screened CpG sites. A number of the pre-processing bioinformatics pipelines has been developed for MethylationEPIC BeadChip (Infinium) microarray data and studies selected for our analysis utilized eight of them, specifically: minfi in [14, 16, 17, 19, 20], DMRcate in [16], meffil in [19], ChAMP in [14, 25], Illumina Genome Studio in [18], RnBeads in [21], maxprobes in [20] and SeSAME in [24]. Each of these pipelines may differ in type of the normalization method, implementation of cell fraction and batch effect correction, and the use of those elements of the data pre-processing can significantly influence the data analysis outcomes.
Type of normalization used
Due to the design, Illumina microarrays contain two types of probes and the methylation level measurements derived from those two types of probes need to be normalized before downstream analyses to avoid probe design bias [36]. Several normalization methods have been developed and, in the studies selected for our analysis, six different methods were used, including: functional normalization [16, 17, 19], ssNoob [14], background correction [18], BMIQ [21, 25], stratified quantile normalization (SQN) [20] and noob [24]. The lack of coherence of normalization techniques between compared studies might have influenced the results of the analyses and number of the methylation changes identified and thus we attempted to benchmark methods used at each step of data analysis.
Cell fraction correction
When analyzing a blood sample consisting of mixture of different cell types, proportions of which may change during viral infection, it is essential to consider sample composition in the data analysis [37]. Methods that allow to adjust methylation level readouts for cell fraction proportions in specific sample have become standard in the analysis of the methylation patterns in blood samples.
Correction of beta values, which are methylation level measurements scale in BeadChip technology, for cell fraction proportions during data preprocessing is considered the most accurate approach when accounting for blood cellular heterogeneity [38]. However, beta values were not directly adjusted during data preprocessing for calculated cell type proportions in any of the analyzed here studies except for study [25] where robust partial correlation method implemented in EpiDISH R package was used. However, four studies included inferred cell type proportions as covariates in statistical models used for the identification of the methylation changes. Specifically, in a study [19] cell type proportions were calculated using robust partial correlation method from EpiDISH R package. Then, the estimated cell type proportions were used to adjust linear regression model in differential methylation analysis between mild and severe patient groups. The study [16] used white blood cell counts estimated with Horvath’s DNAmAge calculator [39] as covariates in a linear regression model fitted to identify methylation differences between mild and severe cases. In the study [21] cellular composition was also calculated using DNAmAge calculator and estimated cell type proportions were used as covariates in a differential methylation analysis between patients displaying mild and severe symptoms which were based on linear model (limma). In the study [17] leukocyte proportions were inferred from methylation levels and included as covariates in the statistical linear regression model used in differential methylation analysis. However, in this study cell fraction estimation method was not specified. In the study [20] cell type proportions were inferred with FlowSorted.Blood.EPIC R package, however, these proportions were not included as covariates in differential methylation analysis. Three studies that we included in our analyses did not address sample heterogeneity in the analyses [14, 18, 24].
Batch effect correction
One other source of the interexperimental variation in the analysis of Methylation EPIC BeadChip microarray data is inert technical variation observed between experiments ran on different microarray batches called “batch effect” [37]. This limitation can also influence results of microarray data pre-processing and should be addressed in the data analysis pipeline. The ComBat algorithm was developed to address this limitation and can be used in experimental designs where batch effect is expected [37]. Only one of the studies that we analyzed implemented ComBat correction to minimize technical variation [17]. The sample randomization, which also effectively minimizes technical variation, was used in the study [24]. In the study [21] batch effect was assessed with PCA, however, authors did not specify if this covariate was used in the analysis. Authors of one study performed evaluation of the batch bias and did not recognize the need for batch effect correction [19]. Remaining studies did not consider batch effect in the data analysis [14, 16, 18, 20]. In the study [25] batch effect was assessed but there was no indication of its influence on data.
Statistics used for the identification of the methylation changes
The methylation changes can be reported as changes at specific CpG sites (Differentially Methylated Positions– DMPs) or changes over a genomic region with a number of adjacent CpG sites displaying similar methylation levels (Differentially Methylated Regions– DMR). Overall, each of the studies we analyzed here utilized different statistical approaches for the identification of infection-related methylation changes.
Specifically, in the study [16] where no methylation changes between mild (n = 48) and severe (n = 61) were identified, linear regression adjusted for sex and age from limma R package (Bonferroni-corrected p < = 0.05) was used.
The study [17] utilized two-stage approach to identify methylation changes. First, 19 DMRs were identified using DMRcate R package, with DMR defined as region with more than 3 CpG sites displaying FDR-corrected p < = 0.0001 methylation change. Then, recursive feature elimination algorithm was used to select 77 CpGs displaying virus-related methylation changes within regions defined in the first step of the analysis.
In the study [14] linear regression (limma R package) adjusted for sex and age was used to identify 44 CpG sites with statistically significant (Benjamini-Hochberg-corrected p < = 0.05) methylation changes of more than 10% points and machine learning (including random forest, adaptive boosting, multivariate adaptive regression splines, k-nearest neighbor, radial kernel support vector and logistic regression) was used to assess the predictive power of severity of infection based on identified subset of methylation changes.
The study [18], reported 245 methylation changes between severe (n = 7) and mild (n = 7) cases that were identified with Illumina Genome Studio software. Only methylation changes larger than absolute mean beta value of 0.17 and absolute diffscore larger than 13 (corresponding to p < = 0.05) were considered significant. The diffscore was defined as 10*sgn(βref– βcond)*log10(p), where βref and βcond refer to mean beta values for severe and mild groups, respectively [18].
In the study that included largest cohort of patients n = 473 COVID-19 [19], two separate analyses were performed to identify severity-associated methylation changes. In the first analysis 20 CpGs were identified using linear regression adjusted for age and gender, between mild and severe patients, defined according to WHO clinical ordinal scale with mild cases being: PCR positive individuals (ambulatory or hospitalized with mild symptoms, 1–4 scales), and severe PCR positive individuals (hospitalized with severe symptoms or died, 5–8 scales). In the second analysis the WHO-based severity score was considered as continuous variable in linear regression model adjusted with age and sex and this approach identified 257 DMPs. In both analyses methylation changes were considered significant when displayed nominal p-value lower than 0.01 and when meta-analysis p-value (computed with restricted maximum likelihood and fixed effects methods implemented in metafor R package) was lower than 5e-08.
The study [21] reported 21 methylation changes between 123 mild and 64 severe COVID-19 patients using multi-step procedure. First, that procedure identified 880 DMPs with linear regression (R, limma, age- and sex-adjusted, p < = 0.05). Next, those 880 CpGs were used for an analysis based on unsupervised hierarchical clustering. The dendrogram obtained from clustering was progressively partitioned into five clusters, one of which was strongly enriched with severe patients (p = 3.4 × 10− 5, hypergeometric test). Then, five clusters were compared to each other to select DMPs specific to severe subjects (exact specification how the analysis was performed was not reported) and 21 DMPs was selected to constitute final signature.
The study [20] also utilized two procedures to identify severity-related methylation changes. Here, authors reported 49 methylation changes identified with spline regression (splineDiffExprs function from splineTimeR package) which analyzed combined methylomics data from four timepoints (T1 - admission to the ICU, T2 - day 7– T2, T3 - day 14, T4 - day 21) between patents who either recovered (n = 24) or died (n = 12) from infection at day 60. As well as 13 DMPs predicting outcome-related signature identified using univariate Cox proportional hazard model with Z-score transformed methylation data and hospital stay as a time variable in RegParallel R package. The study [24] which included methylomics data from 460 subjects (SARS-CoV-2 positive: n = 164, SARS-CoV-2 negative: n = 296) utilized an algorithm that did not aim to find DNA methylation signatures, but perform subject classification according to the symptoms and outcomes. Specifically, in this study elastic net penalized regressions (glmnet, R) were used with multiple hyperparameter sets to all autosomal CpG probes that passed QC with the aim to obtain a methylation score predictive of patient outcome measured by self-constructed ordinal severity scale that included: dischargement from emergency department, admission to hospital, progression to ICU and death. Lastly, in the study [25] methylation changes associated with hospitalization were identified using logit regression model adjusted for age and sex.
The largest number of CpG sites associated with infection-induced methylation changes identified in the studies that we analyzed was 257 [19]. Overall, the studies reported from eight [20] to 77 differentially methylated CpG sites [17]. One study did not report any methylation changes associated with COVID-19 symptoms severity [16] (Table 4).
Two of the analyzed studies did not report the magnitude of methylation changes between cases and controls. First of those studies [24] aimed to perform classification of the subjects according to the symptoms and outcome and, thus, did not aim to find specific severity-related signature. The second [20] classified CpGs as differentially methylated only with the use of p-value cutoff without considering the magnitude of the methylation changes. In the study, that reported the largest number of methylation changes, the level of methylation difference at the identified CpG sites was more than 17% points [18]. Two of the studies [14] and [25] focused only on methylation changes larger than 5% points. The methylation level differences reported in remaining studies were less than 5% points (Table 4 and in electronic Additional File 2).
The major requirement for a robust and clinically applicable biomarker is to be reproducibly detectable in independent cohorts of patients. Unfortunately, only one study included in our benchmarking analysis [25] that at the same time reanalyzed data from previous publications, reported 2 methylation changes that were also found in previous study [14] and 11 changes that were reported in study [19]. This study detected a subset of methylation changes across independent cohorts of patients from Poland [25], Spain [14] and two locations in USA [17, 24], but utilized uniform data analysis methodology that was different from the methods used in original publication of data reanalyzed in this study. This clearly indicates that coherent data analysis workflow is essential in analysis of SARS-CoV-2-related and most likely also other disease-related methylation changes. The details of comparison COVID-19 severity-related methylation changes between analyzed studies are provided in Additional File 3 (Table 1).
The major requirement for a robust and clinically applicable biomarker is to be reproducibly detectable in independent cohorts of patients with specific condition for management of which the biomarker is developed. Studies aiming to investigate the influence of SARS-CoV-2 virus on the methylome of blood cells reported a number of methylation changes that can potentially be developed for biomarkers useful in the prediction of the infection outcomes. Subsequently, Gene Set Enrichment Analyses linked those changes to molecular pathways that are involved in the response of the immune system to the infection. That clearly indicates that infection with SARS-CoV-2 results in methylation changes in blood cells that are involved in the physiological response to the infection. However, our benchmarking analyses showed that there is general lack of the coherence between results reported by the genome-wide studies aiming to identify methylation changes associated with severity of SARS-CoV-2 infection outcomes. The lack of coherence between studies is without doubt to a large extent attributed to different patient inclusion criteria, as well as to the use of different and study-specific severity assessment scales that we describe here. Also, lack of uniform data analysis workflow and correction of the statistical analysis for known confounders, such as cell composition, sex, age or ethnicity, that we identified in the analyzed studies, most likely contributed to the discrepant results. Nevertheless, the study that combined methylomics data from new patient cohort with data from previously published cohorts and re-analyzed combined data set with the uniform bioinformatics workflow identified a subset of methylation changes across independent and geographically distant patient cohorts. This shows that methylation changes that occur in blood of SARS-CoV-2-infected patients can be utilized as biomarkers of infection outcomes. And at the same time indicates that, out of all limitations that we identified in our benchmarking analysis, the uniform data analysis workflow appears to be most essential for the analysis of SARS-CoV-2-related methylation changes in blood cells and most likely other diseases.
Despite that, studies we analyzed, are lacking unified study design, SARS-CoV-2-related methylation changes reported show promise as potential biomarkers able to predict the severity and course of the infection, but further validation is required. The discrepancies we observed between studies are to a large extent consequence of the lack of the benchmark for this type of studies and such a benchmark in the face of rapidly progressing pandemic was not developed. Our study shows that, systematic and unified approach both at the stage of patient enrollment and assessment of the symptoms, as well as data analysis, is essential in future studies aiming to assess the potential of already identified and new methylation changes as biomarkers suitable for clinical management of COVID-19 patients. Consequently, major points of the biomarker development workflow we describe here, could be used to guide future studies.
The datasets used and analyzed during the current study are available from the public repositories referenced in original papers included in this study.
- WHO:
-
World Health Organization
- COVID-19:
-
Coronavirus disease 2019
- SARS-CoV-2:
-
Severe acute respiratory syndrome coronavirus 2
- RR:
-
Relative risk
- SMD:
-
Standardized mean difference
- OR:
-
Odds ratio
- CI:
-
Confidence interval
- GWAS:
-
Genome-wide association studies
- PICOS:
-
Participants, intervention, context, outcome, study design
- real time RT-PCR:
-
Real-time reverse transcription polymerase chain reaction
- ICU:
-
Intensive care unit
- ECMO:
-
Extra corporeal membrane oxygenation
- MHC:
-
Major histocompatibility complex
- HLA:
-
Human leukocyte antigen
- UV:
-
Ultraviolet
- VIS:
-
Visible
- PBMC:
-
Peripheral blood mononuclear cells
- BMIQ:
-
Bet a-mixture quantile normalization
- SQN:
-
Stratified quantile normalization
- DMP:
-
Differentially methylated position
- DMR:
-
Differentially methylated region
- MSP:
-
Methyl-specific polymerase chain reaction, COVID-GRAM: COVID-19 Global Risk Assessment Model
- ARDS:
-
Acute respiratory distress syndrome
Not applicable.
This study was funded by Polish Returns grant program from Polish National Agency for Academic Exchange, grant ID: PPN/PPO/2018/1/00088/U and OPUS22 grant from National Science Centre, grant ID: 2021/43/B/NZ2/02979.
Not applicable.
Not applicable.
The authors declare no competing interests.
The manuscript does not describe clinical trial.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Below is the link to the electronic supplementary material.
Supplementary Material 1: Additional file 1: Contains a flow diagram summarizing used search strategy and a detailed description of study design
Supplementary Material 2: Additional File 2: Contains Tables with Illumina EPIC manifest annotations of differentially methylated positions associated with severity in analyzed studies
Supplementary Material 3: Additional File 3: Contains Table 1 with the lists of differentially methylated positions identified in analyzed studies
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Przybylowicz, P.K., Klepinowski, T., Taryma-Leśniak, O. et al. Evaluation of methylation changes in blood cells of COVID-19 patients as a biomarker of severity of the infection. BMC Infect Dis 25, 778 (2025). https://doi.org/10.1186/s12879-025-11181-1
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s12879-025-11181-1