
Critical evaluation of linear regression models for cell-subtype specific methylation signal from mixed blood cell DNA



Abstract

Epigenome-wide association studies (EWASs) seek to identify DNA methylation sites associated with clinical outcomes. Differences in methylation between specific cell-subtypes are often of interest; however, available samples often comprise a mixture of cells. To date, cell-subtype specific estimates have been obtained from mixed-cell DNA data using linear regression models, but the accuracy of such estimates has not been critically assessed. We evaluated the performance of linear regression for cell-subtype specific methylation estimation using a 450K methylation array dataset of both mixed-cell and cell-subtype sorted samples from six healthy males. CpGs associated with each cell-subtype were first identified using t-tests between groups of cell-subtype sorted samples. Reduced panels of reliably accurate CpGs were then identified from mixed-cell samples using an accuracy heuristic (D). Performance was assessed by comparing cell-subtype specific estimates from mixed-cell samples with the corresponding cell-sorted means, using the mean absolute error (MAE) and the coefficient of determination (R²). At the cell-subtype level, methylation levels at 3272 CpGs could be estimated to within a MAE of 5% of the expected value. The cell-subtypes with the highest accuracy were CD56+ NK (R² = 0.56) and CD8+ T (R² = 0.48), where 23% of sites were accurately estimated. Hierarchical clustering and pathway enrichment analysis confirmed the biological relevance of the panels. Our results suggest that linear regression for cell-subtype specific methylation estimation is accurate only for some cell-subtypes and at a small fraction of cell-associated sites, but may be applicable to EWASs of disease traits with a blood-based pathology. Although sample size was a limitation in this study, we suggest that alternative statistical methods will provide the greatest performance improvements.
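The evaluation summarized above can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes that mixed-cell methylation at each CpG is a proportion-weighted sum of cell-subtype specific methylation levels, that cell proportions per sample are known, and that the estimates come from ordinary least squares. All data are synthetic, and the function names, the three subtypes, and the 200 CpGs are hypothetical choices made for illustration.

```python
import numpy as np


def estimate_subtype_methylation(mixed_beta, proportions):
    """Least-squares estimate of cell-subtype specific methylation.

    mixed_beta:  (n_samples, n_cpgs) methylation of mixed-cell samples
    proportions: (n_samples, n_subtypes) cell-subtype fractions per sample
    returns:     (n_subtypes, n_cpgs) estimated subtype-specific methylation
    """
    # No intercept is fitted because the proportions already sum to one.
    estimates, *_ = np.linalg.lstsq(proportions, mixed_beta, rcond=None)
    return estimates


def mean_absolute_error(estimate, reference):
    """MAE between estimated and cell-sorted mean methylation."""
    return float(np.mean(np.abs(estimate - reference)))


def r_squared(estimate, reference):
    """Coefficient of determination of the estimates against the reference."""
    ss_res = np.sum((reference - estimate) ** 2)
    ss_tot = np.sum((reference - np.mean(reference)) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Synthetic stand-in data (assumed sizes: 6 samples, 3 subtypes, 200 CpGs).
rng = np.random.default_rng(0)
n_samples, n_subtypes, n_cpgs = 6, 3, 200
sorted_means = rng.uniform(0.05, 0.95, size=(n_subtypes, n_cpgs))
proportions = rng.dirichlet(np.ones(n_subtypes), size=n_samples)

# Noise-free mixture: the regression should recover the sorted means.
mixed_clean = proportions @ sorted_means
est_clean = estimate_subtype_methylation(mixed_clean, proportions)

# Noisy mixture: accuracy degrades, as it would with measurement error.
mixed_noisy = mixed_clean + rng.normal(0.0, 0.02, size=mixed_clean.shape)
est_noisy = estimate_subtype_methylation(mixed_noisy, proportions)

# Per-subtype accuracy, comparing estimates with the "cell-sorted" means.
for k in range(n_subtypes):
    mae = mean_absolute_error(est_noisy[k], sorted_means[k])
    r2 = r_squared(est_noisy[k], sorted_means[k])
    print(f"subtype {k}: MAE = {mae:.3f}, R2 = {r2:.3f}")
```

With only six samples, the regression has few degrees of freedom per CpG, which is consistent with the sample-size limitation noted in the abstract; the sketch makes it easy to vary `n_samples` or the noise level and observe how MAE and R² respond.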
