  • Technical Report

A computational solution for bolstering reliability of epigenetic clocks: implications for clinical trials and longitudinal tracking

Abstract

Epigenetic clocks are widely used aging biomarkers calculated from DNA methylation data, but this data can be surprisingly unreliable. Here we show that technical noise produces deviations up to 9 years between replicates for six prominent epigenetic clocks, limiting their utility. We present a computational solution to bolster reliability, calculating principal components (PCs) from CpG-level data as input for biological age prediction. Our retrained PC versions of six clocks show agreement between most replicates within 1.5 years, improved detection of clock associations and intervention effects, and reliable longitudinal trajectories in vivo and in vitro. This method entails only one additional step compared to traditional clocks, requires no replicates or previous knowledge of CpG reliabilities for training, and can be applied to any existing or future epigenetic biomarker. The high reliability of PC-based clocks is critical for applications to personalized medicine, longitudinal tracking, in vitro studies and clinical trials of aging interventions.
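The "one additional step" can be pictured as a projection of CpG-level values onto principal components before a linear predictor is applied. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the toy loadings, the centering values and the weights are all hypothetical, and the published models in the repository listed under Code availability should be used for any real analysis.

```python
from typing import List, Sequence


def project_to_pcs(betas: Sequence[float],
                   train_means: Sequence[float],
                   loadings: Sequence[Sequence[float]]) -> List[float]:
    """Center CpG beta values by training means, then project onto PCs.

    loadings[k][j] is the loading of CpG j on principal component k.
    """
    centered = [b - m for b, m in zip(betas, train_means)]
    return [sum(l * c for l, c in zip(pc, centered)) for pc in loadings]


def pc_clock_age(betas: Sequence[float],
                 train_means: Sequence[float],
                 loadings: Sequence[Sequence[float]],
                 pc_weights: Sequence[float],
                 intercept: float) -> float:
    """Apply a linear clock to PC scores instead of individual CpGs."""
    scores = project_to_pcs(betas, train_means, loadings)
    return intercept + sum(w * s for w, s in zip(pc_weights, scores))


# Toy, made-up model with 4 CpGs and 2 PCs (illustration only).
TRAIN_MEANS = [0.50, 0.20, 0.80, 0.40]
LOADINGS = [[0.5, 0.5, 0.5, 0.5],
            [0.5, -0.5, 0.5, -0.5]]
PC_WEIGHTS = [40.0, 10.0]
INTERCEPT = 50.0

sample = [0.55, 0.25, 0.85, 0.45]  # every CpG sits 0.05 above its training mean
pc_scores = project_to_pcs(sample, TRAIN_MEANS, LOADINGS)
predicted_age = pc_clock_age(sample, TRAIN_MEANS, LOADINGS, PC_WEIGHTS, INTERCEPT)
```

Because each PC score is a weighted sum over many CpGs, noise in any single CpG has a limited effect on the prediction, which is consistent with the reliability gain the abstract describes. In the actual method the loadings come from PCA on training data and the weights from a regression on the resulting PCs.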

Fig. 1: Low reliability of CpGs reduces reliability of epigenetic age prediction.
Fig. 2: Epigenetic clocks trained from principal components.
Fig. 3: Epigenetic clocks trained from principal components are highly reliable.
Fig. 4: Information requirements for age and mortality prediction.
Fig. 5: Principal-component clocks are reliable in saliva and brain.
Fig. 6: Principal-component clocks preserve relevant aging and mortality signals.
Fig. 7: Principal-component clocks show trajectories with improved stability in longitudinal data.
Fig. 8: Principal-component clocks reduce sample size requirements for clinical trials and in vitro assays.

Data availability

Most datasets used in this study are publicly available on NCBI’s GEO, ArrayExpress or TCGA and are listed in Supplementary Table 6 along with accession codes. HRS data contain sensitive health information, and are available by application to researchers at https://hrsdata.isr.umich.edu/. FHS data contain sensitive health information, and researchers can apply at https://dbgap.ncbi.nlm.nih.gov/aa/ (dbGaP, accession no. phs000724.v7.p11). InCHIANTI data contain sensitive health information and are available upon review and subsequent approval of proposals submitted through the study website (http://inchiantistudy.net/). The Elysium datasets are proprietary and owned by Elysium Health, and inquiries about the data can be made to research@elysiumhealth.com. Owing to military cohort data sharing restrictions, data from the PRISMO study cannot be publicly posted. However, such data may be made available to researchers following an approved analysis proposal and in a de-identified form through a data use agreement following applicable guidelines on data sharing and privacy protection. For additional information on access to these data, please contact s.g.geuze@umcutrecht.nl. Longitudinal clozapine data contain sensitive health information, and researchers can inquire about access to the data by contacting j.luykx@umcutrecht.nl. SATSA methylation data are available on ArrayExpress (accession code E-MTAB-7309). For information on access to additional subject-level SATSA data, please contact sara.hagg@ki.se.

Code availability

Code to calculate or train PC clocks is available at https://github.com/MorganLevineLab/PC-Clocks/.

References

  1. Jylhävä, J., Pedersen, N. L. & Hägg, S. Biological age predictors. EBioMedicine 21, 29–36 (2017).

  2. Bell, C. G. et al. DNA methylation aging clocks: challenges and recommendations. Genome Biol. 20, 249 (2019).

  3. Horvath, S. & Raj, K. DNA methylation-based biomarkers and the epigenetic clock theory of ageing. Nat. Rev. Genet. 19, 371–384 (2018).

  4. Sugden, K. et al. Patterns of reliability: assessing the reproducibility and integrity of DNA methylation measurement. Patterns 1, 100014 (2020).

  5. Logue, M. W. et al. The correlation of methylation levels measured using Illumina 450K and EPIC BeadChips in blood samples. Epigenomics 9, 1363–1371 (2017).

  6. Bose, M. et al. Evaluation of microarray-based DNA methylation measurement using technical replicates: The atherosclerosis risk in communities (ARIC) study. BMC Bioinformatics 15, 312 (2014).

  7. Naeem, H. et al. Reducing the risk of false discovery enabling identification of biologically significant genome-wide methylation status using the HumanMethylation450 array. BMC Genomics 15, 51 (2014).

  8. Pidsley, R. et al. Critical evaluation of the Illumina MethylationEPIC BeadChip microarray for whole-genome DNA methylation profiling. Genome Biol. 17, 1–17 (2016).

  9. Lehne, B. et al. A coherent approach for analysis of the Illumina HumanMethylation450 BeadChip improves data quality and performance in epigenome-wide association studies. Genome Biol. 16, 1–12 (2015).

  10. Morris, T. J. & Beck, S. Analysis pipelines and packages for Infinium HumanMethylation450 BeadChip (450K) data. Methods 72, 3–8 (2015).

  11. McEwen, L. M. et al. Systematic evaluation of DNA methylation age estimation with common preprocessing methods and the Infinium MethylationEPIC BeadChip array. Clin. Epigenetics 10, 123 (2018).

  12. Liu, Z. et al. Underlying features of epigenetic aging clocks in vivo and in vitro. Aging Cell https://doi.org/10.1111/acel.13229 (2020).

  13. Koo, T. K. & Li, M. Y. A guideline of selecting and reporting intraclass correlation coefficients for reliability research. J. Chiropr. Med. 15, 155–163 (2016).

  14. Horvath, S. DNA methylation age of human tissues and cell types. Genome Biol. 14, R115 (2013).

  15. Horvath, S. et al. Epigenetic clock for skin and blood cells applied to Hutchinson Gilford Progeria Syndrome and ex vivo studies. Aging 10, 1758–1775 (2018).

  16. Hannum, G. et al. Genome-wide methylation profiles reveal quantitative views of human aging rates. Mol. Cell 49, 359–367 (2013).

  17. Levine, M. et al. An epigenetic biomarker of aging for lifespan and healthspan. Aging 10, 573–591 (2018).

  18. Lu, A. T. et al. DNA methylation-based estimator of telomere length. Aging 11, 5895–5923 (2019).

  19. Bocklandt, S. et al. Epigenetic predictor of age. PLoS ONE 6, e14821 (2011).

  20. Teschendorff, A. E. A comparison of epigenetic mitotic-like clocks for cancer risk prediction. Genome Med. 12, 56 (2020).

  21. Youn, A. & Wang, S. The MiAge Calculator: a DNA methylation-based mitotic age calculator of human tissue types. Epigenetics 13, 192–206 (2018).

  22. Belsky, D. et al. Quantification of the pace of biological aging in humans through a blood test: a DNA methylation algorithm. Elife https://doi.org/10.1101/2020.02.05.927434 (2020).

  23. McCartney, D. et al. Epigenetic prediction of complex traits and death. Genome Biol. 19, 136 (2018).

  24. Houseman, E. A. et al. DNA methylation arrays as surrogate measures of cell mixture distribution. BMC Bioinformatics 13, 86 (2012).

  25. Zhang, Y. et al. DNA methylation signatures in peripheral blood strongly predict all-cause mortality. Nat. Commun. 8, 14617 (2017).

  26. Lu, A. T. et al. DNA methylation GrimAge strongly predicts lifespan and healthspan. Aging 11, 303–327 (2019).

  27. Lin, Q. & Wagner, W. Epigenetic aging signatures are coherently modified in cancer. PLoS Genet. 11, 1–17 (2015).

  28. Weidner, C. I. et al. Aging of blood can be tracked by DNA methylation changes at just three CpG sites. Genome Biol. 15, R24 (2014).

  29. Vidal-Bralo, L., Lopez-Golan, Y. & Gonzalez, A. Simplified assay for epigenetic age estimation in whole blood of adults. Front. Genet. 7, 126 (2016).

  30. Garagnani, P. et al. Methylation of ELOVL2 gene as a new epigenetic marker of age. Aging Cell 11, 1132–1134 (2012).

  31. Higgins-Chen, A. T., Thrush, K. L. & Levine, M. E. Aging biomarkers and the brain. Semin. Cell Dev. Biol. 116, 180–193 (2021).

  32. Jolliffe, I. T. A note on the use of principal components in regression. J. R. Stat. Soc. Ser. C Appl. Stat. 31, 300–303 (1982).

  33. Yan, Y., Goodman, J. M., Moore, D. D., Solla, S. A. & Bensmaia, S. J. Unexpected complexity of everyday manual behaviors. Nat. Commun. 11, 3564 (2020).

  34. Aschard, H. et al. Maximizing the power of principal-component analysis of correlated phenotypes in genome-wide association studies. Am. J. Hum. Genet. 94, 662–676 (2014).

  35. Tarashansky, A. J., Xue, Y., Li, P., Quake, S. R. & Wang, B. Self-assembling manifolds in single-cell RNA-sequencing data. Elife 8, e48994 (2019).

  36. Pidsley, R. et al. A data-driven approach to preprocessing Illumina 450K methylation array data. BMC Genomics 14, 293 (2013).

  37. Sturm, G. et al. A multi-omics and bioenergetics longitudinal aging dataset in primary human fibroblasts with mitochondrial perturbations. Preprint at bioRxiv https://doi.org/10.1101/2021.11.12.468448 (2021).

  38. Li, X. et al. Longitudinal trajectories, correlations and mortality associations of nine biological ages across 20-years follow-up. Elife 9, e51507 (2020).

  39. Wang, Y. et al. Epigenetic influences on aging: a longitudinal genome-wide methylation study in old Swedish twins. Epigenetics 13, 975–987 (2018).

  40. Bakdash, J. Z. & Marusich, L. R. Repeated-measures correlation. Front. Psychol. 8, 456 (2017).

  41. Liu, G. & Liang, K.-Y. Sample size calculations for studies with correlated observations. Biometrics 53, 937–947 (1997).

  42. Wagner, W. The link between epigenetic clocks for aging and senescence. Front. Genet. 10, 303 (2019).

  43. Itahana, K., Campisi, J. & Dimri, G. P. Mechanisms of cellular senescence in human and mouse cells. Biogerontology 5, 1–10 (2004).

  44. Chen, H., Li, Y. & Tollefsbol, T. O. Cell senescence culturing methods. Methods Mol. Biol. https://doi.org/10.1007/978-1-62703-556-9_1 (2013).

  45. Oblak, L., van der Zaag, J., Higgins-Chen, A. T., Levine, M. E. & Boks, M. P. A systematic review of biological, social and environmental factors associated with epigenetic clock acceleration. Ageing Res. Rev. 69, 101348 (2021).

  46. Chen, L. et al. Effects of Vitamin D3 supplementation on epigenetic aging in overweight and obese african americans with suboptimal vitamin D status: a randomized clinical trial. J. Gerontol. A Biol. Sci. Med. Sci. 74, 91–98 (2019).

  47. Fahy, G. M. et al. Reversal of epigenetic aging and immunosenescent trends in humans. Aging Cell 18, e13028 (2019).

  48. Fitzgerald, K. N. et al. Potential reversal of epigenetic age using a diet and lifestyle intervention: a pilot randomized clinical trial. Aging 13, 9419–9432 (2021).

  49. Field, A. E. et al. DNA methylation clocks in aging: categories, causes and consequences. Mol. Cell 71, 882–895 (2018).

  50. Raj, K. & Horvath, S. Current perspectives on the cellular and molecular features of epigenetic ageing. Exp. Biol. Med. 245, 1532–1542 (2020).

  51. Robinson, O. et al. Determinants of accelerated metabolomic and epigenetic ageing in a UK cohort. Aging Cell https://doi.org/10.1111/acel.13149 (2020).

  52. Aryee, M. J. et al. Minfi: a flexible and comprehensive Bioconductor package for the analysis of Infinium DNA methylation microarrays. Bioinformatics 30, 1363–1369 (2014).

  53. Heiss, J. A. & Just, A. C. Improved filtering of DNA methylation microarray data by detection P values and its impact on downstream analyses. Clin. Epigenetics 11, 15 (2019).

  54. Ferrucci, L. et al. Subsystems contributing to the decline in ability to walk: bridging the gap between epidemiology and geriatric practice in the InCHIANTI study. J. Am. Geriatr. Soc. 48, 1618–1625 (2000).

  55. Moore, A. Z. et al. Change in epigenome-wide DNA methylation over 9 years and subsequent mortality: results From the InCHIANTI Study. J. Gerontol. A Biol. Sci. Med. Sci. 71, 1029–1035 (2016).

  56. Crimmins, E. M., Thyagarajan, B., Levine, M. E., Weir, D. R. & Faul, J. Associations of age, sex, race/ethnicity and education with 13 epigenetic clocks in a nationally representative US sample: The Health and Retirement Study. J. Gerontol. A Biol. Sci. Med. Sci. 76, 1117–1123 (2021).

  57. Kannel, W. B., Feinleib, M., McNamara, P. M., Garrison, R. J. & Castelli, W. P. An investigation of coronary heart disease in families: The Framingham Offspring Study. Am. J. Epidemiol. 110, 281–290 (1979).

  58. Splansky, G. L. et al. The Third Generation Cohort of the National Heart, Lung, and Blood Institute’s Framingham Heart Study: design, recruitment, and initial examination. Am. J. Epidemiol. 165, 1328–1335 (2007).

  59. Finkel, D. & Pedersen, N. L. Processing speed and longitudinal trajectories of change for cognitive abilities: The Swedish Adoption/Twin Study of Aging. Aging Neuropsychol. Cogn. 11, 325–345 (2004).

  60. van der Wal, S. J. et al. Associations between the development of PTSD symptoms and longitudinal changes in the DNA methylome of deployed military servicemen: a comparison with polygenic risk scores. Compr. Psychoneuroendocrinology 4, 100018 (2020).

  61. Van Der Wal, S. J., Gorter, R., Reijnen, A., Geuze, E. & Vermetten, E. Cohort profile: the Prospective Research in Stress-Related Military Operations (PRISMO) study in the Dutch Armed Forces. BMJ Open 9, e026670 (2019).

  62. Higgins-Chen, A. T., Boks, M. P., Vinkers, C. H., Kahn, R. S. & Levine, M. E. Schizophrenia and epigenetic aging biomarkers: increased mortality, reduced cancer risk and unique clozapine effects. Biol. Psychiatry https://doi.org/10.1016/j.biopsych.2020.01.025 (2020).

  63. Levine, M. E., Higgins-Chen, A., Thrush, K., Minteer, C. & Niimi, P. Clock work: deconstructing the epigenetic clock signals in aging, disease and reprogramming. Preprint at bioRxiv https://doi.org/10.1101/2022.02.13.480245 (2022).

  64. Triche, T. J., Weisenberger, D. J., Van Den Berg, D., Laird, P. W. & Siegmund, K. D. Low-level processing of Illumina Infinium DNA methylation BeadArrays. Nucleic Acids Res. 41, e90 (2013).

  65. Daniali, L. et al. Telomeres shorten at equivalent rates in somatic tissues of adults. Nat. Commun. 4, 1597 (2013).

  66. Zhuang, J., Widschwendter, M. & Teschendorff, A. E. A comparison of feature selection and classification methods in DNA methylation studies using the Illumina Infinium platform. BMC Bioinformatics 13, 59 (2012).

  67. Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010).

  68. Bair, E. & Tibshirani, R. Semi-supervised methods to predict patient survival from gene expression data. PLoS Biol. 2, e108 (2004).

  69. Bair, E., Hastie, T., Paul, D. & Tibshirani, R. Prediction by supervised principal components. J. Am. Stat. Assoc. 101, 119–137 (2006).

  70. Kuznetsova, A., Brockhoff, P. B. & Christensen, R. H. B. lmerTest Package: tests in linear mixed effects models. J. Stat. Softw. 82, 1–26 (2017).

Acknowledgements

This work was supported by the National Institutes of Health (NIH, National Institute on Aging (NIA): 1R01AG068285-01, 1R01AG065403-01A1 and 1R01AG057912-01 to M.E.L.) and National Institute of Mental Health (2T32MH019961-21A1 to A.H.C.). It was also supported by the Thomas P. Detre Fellowship Award in Translational Neuroscience Research from Yale University (to A.H.C.) and the Medical Informatics Fellowship Program at the West Haven, CT Veterans Healthcare Administration (to A.H.C.). The InCHIANTI study baseline (1998–2000) was supported as a ‘targeted project’ (ICS110.1/RF97.71) by the Italian Ministry of Health and in part by the US NIA (contract nos. 263 MD 9164 and 263 MD 821336). The InCHIANTI follow-up 2 and 3 studies (2004–2010) were financed by the US NIA (contract nos. N01-AG-5-0002). InCHIANTI was supported in part by the Intramural Research Program of the NIA, NIH, Baltimore, Maryland, and this work utilized the computational resources of the NIH HPC Biowulf cluster (https://hpc.nih.gov/). The HRS study was supported by NIA grants R01 AG060110 and U01 AG009740. The SATSA study was supported by NIH grants R01 (AG04563, AG10175 and AG028555), the MacArthur Foundation Research Network on Successful Aging, the European Union’s Horizon 2020 research and innovation programme (no. 634821), the Swedish Council for Working Life and Social Research (FAS/FORTE) (97:0147:1B, 2009-0795 and 2013-2292) and the Swedish Research Council (825-2007-7460, 825-2009-6141, 521-2013-8689 and 2015-03255). The recruitment and assessments in the PRISMO study were funded by the Dutch Ministry of Defence. The longitudinal clozapine study was funded by a personal Rudolf Magnus Talent Fellowship (H150) grant (to J.J.L.). The Cellular Lifespan Study was supported by NIA grant R01AG066828 (to M.P.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. We also acknowledge S. Horvath, A. Lu, G. Hannum and the many other colleagues who developed the original epigenetic clocks analyzed in this study.

Author information

Contributions

A.T.H.-C. and M.E.L. conceived the project and study design. A.T.H.-C., K.L.T., Y.W., M.W., T.T.H.-S. and M.E.L. performed reliability and PC clock analyses. A.T.H.-C. and P.-L.K. performed power analyses. C.M. and P.N. performed cultured astrocyte experiments. G.S., J.L. and M.P. performed DNAm and telomere length assessments for the Cellular Lifespan Study. Other authors contributed data and analyses related to InCHIANTI (P.K., A.Z.M., S.B. and L.F.), HRS (E.M.C. and M.E.L.), SATSA (Y.W. and S.H.), PRISMO (C.H.V., E.V., B.P.R., E.G. and M.P.B.) or longitudinal clozapine (C.O.-P., M.Z.H., S.S., S.G. and J.J.L.) studies. All authors reviewed and contributed to the manuscript.

Corresponding authors

Correspondence to Albert T. Higgins-Chen or Morgan E. Levine.

Ethics declarations

Competing interests

M.E.L. and A.T.H.-C. have built epigenetic aging metrics involving the technology described in the present paper, and these metrics are licensed by Elysium Health through Yale University. Elysium provided paired blood and saliva replicate datasets reported in this study, but otherwise did not fund the study and did not play a role in conceptualization, design, decision to publish or preparation of the manuscript. M.E.L. previously acted as a Scientific Advisor for, and received consulting fees from, Elysium Health. T.H.S. was previously an employee of Elysium Health. A.T.H.-C. received consulting fees from FOXO Technologies for work unrelated to the present manuscript. All other authors declare no competing interests.

Peer review

Peer review information

Nature Aging thanks Andrew Teschendorff, Daniel Belsky and Joris Deelen for their contribution to the peer review for this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Additional reliability information about clock CpGs.

a-f, Reliability, age correlation, and mortality information for M-values from all clocks and β-values from individual clocks, similar to Fig. 1b-f. ICCs are quantified across 36 samples with 2 technical replicates each. Blood age correlations were calculated in GSE40279. Mortality associations (hazard ratios for 1 SD change in β or M value) were calculated in FHS (n = 3935 with 319 deaths). Shown are histograms of ICC of clock CpGs (a), agreement of technical replicates for CpG values where each point represents one pair of replicates for one CpG (b), and comparisons of ICC values to mean values, standard deviations, age correlations, and mortality associations where each point is one CpG (c-f). g-h, Comparison of M-value and β-value ICCs. Correlation test p-value is based on Student’s t distribution (two-tailed). i, Correlation plot for epigenetic age differences between replicates. Epigenetic age replicate differences were calculated for each clock separately, then the differences were correlated with each other and with age and sex. Data is reported as correlation (p-value). Correlation test p-value is based on Student’s t distribution (two-tailed).
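The ICCs in this legend summarize how well two technical replicates of the same sample agree. As an illustration only, the sketch below computes a two-way random-effects, absolute-agreement, single-measurement ICC, one of the forms described by Koo and Li (ref. 13). Which ICC form the authors used is not stated in this excerpt, so that choice, the function name `icc_2_1` and the toy values are assumptions.

```python
from typing import List, Sequence


def icc_2_1(data: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    data[i][j] is the value for sample i measured in replicate j.
    """
    n = len(data)          # number of samples
    k = len(data[0])       # number of replicates per sample
    grand = sum(sum(row) for row in data) / (n * k)
    row_means: List[float] = [sum(row) / k for row in data]
    col_means: List[float] = [sum(row[j] for row in data) / n for j in range(k)]

    # Sums of squares from a two-way ANOVA without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Toy values for one CpG (or one clock) across samples with 2 replicates each.
perfect = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
noisy = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]

icc_perfect = icc_2_1(perfect)
icc_noisy = icc_2_1(noisy)
```

Identical replicates give an ICC of 1, and replicate noise that is large relative to between-sample variation pulls the ICC down, which is why CpGs with a narrow range of values can show low reliability.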

Extended Data Fig. 2 Contributions of CpG deviations to clock deviations between replicates.

a, Contribution of each CpG to overall clock measured in years (except DNAmTL which is measured in base pairs), calculated as weight in clock multiplied by 1 SD in beta value in GSE55763. Each point represents one CpG. b, Correlation of each CpG’s deviation with clock deviation between replicates. Each point represents one CpG. c, Deviation of each CpG multiplied by the CpG weight. Each point represents one CpG for one pair of replicates. d-h, Heatmap of clock deviations attributable to each CpG (CpG deviation multiplied by CpG weight in clock), separated by sample. Rows are CpGs and columns are samples. Clock deviations are measured in years (except DNAmTL which is measured in base pairs).
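For a clock that is a plain weighted sum of β-values, the quantities in panels a and c reduce to simple products. The sketch below assumes such a linear clock (clocks that apply a nonlinear age transformation would need that step added); the CpG labels, weights and β-values are made up for illustration.

```python
from typing import Dict


def sd_contribution(weights: Dict[str, float],
                    sds: Dict[str, float]) -> Dict[str, float]:
    """Panel a: clock weight multiplied by 1 SD of the CpG's beta value."""
    return {cpg: w * sds[cpg] for cpg, w in weights.items()}


def replicate_contributions(weights: Dict[str, float],
                            rep1: Dict[str, float],
                            rep2: Dict[str, float]) -> Dict[str, float]:
    """Panel c: CpG deviation between replicates multiplied by its weight."""
    return {cpg: w * (rep1[cpg] - rep2[cpg]) for cpg, w in weights.items()}


# Hypothetical three-CpG clock; weights are in years per unit of beta.
WEIGHTS = {"cpg_a": 20.0, "cpg_b": -10.0, "cpg_c": 5.0}
SDS = {"cpg_a": 0.05, "cpg_b": 0.10, "cpg_c": 0.02}
REP1 = {"cpg_a": 0.50, "cpg_b": 0.30, "cpg_c": 0.80}
REP2 = {"cpg_a": 0.48, "cpg_b": 0.33, "cpg_c": 0.80}

per_sd = sd_contribution(WEIGHTS, SDS)
per_cpg = replicate_contributions(WEIGHTS, REP1, REP2)

# For a linear clock the intercept cancels, so the per-CpG terms
# add up to the total difference in predicted age between replicates.
total_deviation = sum(per_cpg.values())
```

In this toy case two CpGs with deviations of only 0.02 and 0.03 in β account for a 0.7-year difference between replicates, which illustrates how small measurement errors can add up across a clock.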

Extended Data Fig. 3 Many CpGs show associations with age and mortality that could be used by clocks.

a, Filtering out CpGs by ICC leads to modest improvements in clock reliability. PhenoAge has a low ICC yet high mortality prediction, and thus we tested whether ICC could be improved without jeopardizing the latter. 100 models with ICC cutoff 0-0.99 were generated to predict PhenoAge in InCHIANTI when limiting CpGs to those above the ICC cutoff. The resulting epigenetic age ICCs (calculated in 36 pairs of technical replicates) and mortality prediction in test data (n = 3935 with 319 deaths) were visualized. b, Similar to a, except using a random CpG subset selection with an equivalent number of CpGs. c, Volcano plots showing the age associations in blood (GSE40279; 450K array). Red indicates CpGs present in any of 18 existing clocks. Significance was assessed with a two-sided t-test, and the dotted line indicates genome-wide significance calculated by Bonferroni correction (p = 1.057 × 10⁻⁷). d, ICCs for 78,464 CpGs present across all datasets and the 450K and EPIC arrays, listed in Supplementary Table 6. ICCs were calculated in 36 pairs of technical replicates. e-f, Age and mortality correlations for CpG ICCs for selected 78,464 CpGs. Age correlation was calculated in GSE40279, and mortality hazard ratio was calculated in the Framingham Heart Study after adjusting for age and sex. g, Comparison of the 78,464 CpG ICCs to previously published ICC values. Lehne 2015: 450K array, age range 37.3-74.6. Bose 2014: 450K array, age range 45-64. Sugden 2020: 450K and EPIC, age range 18-18. Logue 2018: EPIC array, mean age 31.8 and SD 8.4. Since Bose 2014 published ICCs with floor value of 0, we changed all Lehne 2015 CpGs with ICC < 0 to ICC = 0 to make comparisons consistent. For Sugden 2020 or Logue 2018, we adjusted the floor to −0.3 for presentation purposes. Correlation test p-value is based on Student’s t distribution (two-tailed).
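A Bonferroni threshold is the family-wise significance level divided by the number of tests. The legend gives the threshold for panel c but not the number of CpGs tested or the significance level, so the short sketch below assumes a conventional level of 0.05 and works backwards; both the 0.05 and the function name are assumptions.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value cutoff that controls the family-wise error at alpha."""
    return alpha / n_tests


ALPHA = 0.05                  # assumed; not stated in the legend
REPORTED_THRESHOLD = 1.057e-7  # value given for panel c

# Number of tests implied by the reported threshold under the assumed alpha.
implied_tests = ALPHA / REPORTED_THRESHOLD

# Round-trip check: applying Bonferroni to the implied count
# should reproduce the reported threshold.
recomputed = bonferroni_threshold(ALPHA, round(implied_tests))
```

Under that assumption the threshold corresponds to roughly 473,000 tests, which is on the order of the number of probes one would expect from a 450K array analysis.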

Extended Data Fig. 4 Additional reliability data on PC clocks in blood.

a, Reliability of GrimAge and PCGrimAge components calculated using 36 pairs of technical replicates (GSE55763). Data are presented as ICC estimates with 95% confidence interval. b, Reliability of epigenetic age and age acceleration in an independent blood DNAm dataset with 37 pairs of technical replicates (Elysium Dataset 1). Data are presented as ICC estimates with 95% confidence interval. c, PC clocks allow for correction for systemic offsets in epigenetic age across batches. Epigenetic age acceleration is shown for 8 individuals with 18 measurements (across 3 batches, 2 scans, and 3 replicates per batch) in Elysium Dataset 2.

Extended Data Fig. 5 Enhanced reliability of PC clocks does not depend on new training data.

a-b, Age acceleration ICC and replicate differences (n = 36 pairs of technical replicates) for Horvath1, Horvath2, and PhenoAge in blood trained using new data (including substitute datasets). Data are presented as ICC estimate with 95% confidence interval. c-d, Same as a-b, for cerebellum (n = 34 pairs of technical replicates). Data are presented as ICC estimate with 95% confidence interval. e-f, Age acceleration reliability in GSE55763 (n = 36 pairs of technical replicates) and mortality prediction in FHS (n = 3935 with 319 deaths) for variations of PhenoAge (e) and Hannum (f) calculated using different CpG sets, sample sizes, and different methods (elastic net, ridge regression, supervised PCA, PC clocks). Data are presented as ICC or HR (1 SD change) estimates with 95% confidence interval. g, PCs from one dataset can be projected to a second dataset for elastic net regression and used to construct reliable PC clocks. PCA was performed in the Hannum GSE40279 dataset then projected to the PhenoAge HRS/InCHIANTI dataset for elastic net regression, and vice versa. These “borrowed” PCs could still be used to construct reliable age predictors. We plotted age acceleration reliability in GSE55763 (n = 36 pairs of technical replicates) and mortality prediction in FHS (n = 3935 with 319 deaths). Data are presented as ICC or HR (1 SD change) estimates with 95% confidence interval.

Extended Data Fig. 6 Contribution of CpGs and PCs to PC clocks.

a, The effect of a 1 SD change in beta for each CpG on the PC clocks. This was calculated by multiplying the CpG loadings for each PC by the PC weight in the clock, summing these products for each CpG, and multiplying by the CpG standard deviation from GSE55763. Effects are shown on a log base 10 scale. Note that results were similar using standard deviations from the PC clock training data. CpGs present in the original clock are denoted in red. b, Effect of 1 SD change in PC score for each PC on the overall clock. c, Cumulative sum of 1 SD changes in PC scores for each PC (black), plotted against cumulative variance explained for each PC in the original training data (grey).
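The calculation described for panel a collapses a PC clock back to one effective weight per CpG. A minimal sketch of that arithmetic follows; the function name and all numbers are hypothetical, and the real loadings, PC weights and standard deviations are not given in this excerpt.

```python
from typing import List, Sequence


def cpg_effects(loadings: Sequence[Sequence[float]],
                pc_weights: Sequence[float],
                cpg_sds: Sequence[float]) -> List[float]:
    """Effect on the clock of a 1 SD change in each CpG.

    loadings[k][j] is the loading of CpG j on PC k, and pc_weights[k]
    is the weight of PC k in the clock.
    """
    n_pcs = len(pc_weights)
    effects = []
    for j, sd in enumerate(cpg_sds):
        # Sum over PCs of (loading of CpG j on PC k) * (weight of PC k).
        effective_weight = sum(loadings[k][j] * pc_weights[k] for k in range(n_pcs))
        effects.append(effective_weight * sd)
    return effects


# Toy, made-up inputs with 2 PCs and 4 CpGs (illustration only).
LOADINGS = [[0.5, 0.5, 0.5, 0.5],
            [0.5, -0.5, 0.5, -0.5]]
PC_WEIGHTS = [40.0, 10.0]
CPG_SDS = [0.10, 0.02, 0.05, 0.01]

effects = cpg_effects(LOADINGS, PC_WEIGHTS, CPG_SDS)
```

Every CpG with a nonzero loading on a selected PC receives an effective weight, so a PC clock spreads its signal across far more CpGs than a clock trained on individual CpGs.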

Extended Data Fig. 7 Low-variance PCs capture aging heterogeneity in physiological systems.

a, Scree plots showing variance explained by PC for PCPhenoAge in training data (black) compared to variance explained for a randomized matrix of the same size as PCPhenoAge training data (red), for the top 150 PCs (split into two graphs for visualization purposes). b-c, Number of new driver CpGs introduced by each PC for all PCs (b) and PCs included in the model (c). d, Cumulative variance plot for PCPhenoAge. e, Plot showing significant univariate linear associations between PhenoAge components and PCPhenoAge PCs, with PCs ordered from highest to lowest variance explained. These were not adjusted for multiple testing as the PCs are meant to be combined by elastic net regression. For d and e, the horizontal lines delineate the selected cutoffs for high-, medium-, and low-variance PCs. f-g, Histograms of the association significance for selected PCPhenoAge PCs (f) and unselected PCs (g), with values reported as -log10(p-value), with significance determined by two-sided t-test, not adjusted for multiple testing. Vertical lines denote p = 0.05. For each PC, we selected the most significant p-value out of the 10 PhenoAge components. h-i, PCPhenoAge was divided into components corresponding to the signal from high-, medium-, and low-variance PCs in both HRS training data (h) and FHS test data (i). Multivariate associations between biomarkers and disease status are shown. Biomarkers were standardized (Z-scores) and modeled using linear regression. Disease status was binary and modeled with logistic regression. PCPhenoAge components were in units of 1 year. For example, a 1-year increase in PCPhenoAge due to medium-variance PCs was associated with a 0.1 SD increase in creatinine in training data and a 0.06 SD increase in test data. Non-significant correlations are denoted by “X”. j, Mortality hazard ratios for a 1-year change in PCPhenoAge components from high-, medium-, and low-variance PCs are shown (n = 3935 with 319 deaths). Data are presented as HR estimate with 95% confidence interval.

Extended Data Fig. 8 PC clocks show improved agreement in cerebellum technical replicates and increased stability in longitudinal blood DNAm data.

a, Ridge plot demonstrating the distributions of clock values for cerebellum technical replicates (GSE43414). b, Biweight midcorrelation between longitudinal changes in clocks for SATSA. c, Repeated measures correlations in longitudinal change in clocks for clozapine dataset. d, Short-term longitudinal blood DNAm data was measured with up to 300 days follow-up after initiation of clozapine. Each line shows the trajectory of an individual’s epigenetic age relative to their baseline during the follow-up period.

Extended Data Fig. 9 PC clocks allow for correction for short-term cell composition shifts.

a, Repeated measures correlations in longitudinal change in clocks for PRISMO dataset. b, Short-term longitudinal blood DNAm data was measured with up to 500 days follow-up in the PRISMO dataset. Each line shows the trajectory of an individual’s epigenetic age relative to their baseline during the follow-up period. Cell-adjusted trajectories were adjusted based on proportions of 5 cell types imputed from DNAm data most correlated with the clocks (granulocytes, plasmablasts, B, CD4T, and CD8T cells). c, Power analysis for a trial evaluating an intervention in a young population to protect from stress-induced pathological aging, based on parameters estimated from the PRISMO study. The red line indicates epigenetic age adjusted for longitudinal changes in granulocytes, plasmablasts, B, CD4T, and CD8T cells.

Supplementary information

Supplementary Information

Supplementary Results

Reporting Summary

Supplementary Table 1

Supplementary Tables 1–13

Cite this article

Higgins-Chen, A.T., Thrush, K.L., Wang, Y. et al. A computational solution for bolstering reliability of epigenetic clocks: implications for clinical trials and longitudinal tracking. Nat Aging 2, 644–661 (2022). https://doi.org/10.1038/s43587-022-00248-2

  • DOI: https://doi.org/10.1038/s43587-022-00248-2
