Annual International Conference of the IEEE Engineering in Medicine and Biology Society

Exploration of Unsupervised Feature Selection Methods to Predict Chronological Age of Individuals by Utilising CpG dinucleotics from Whole Blood


Abstract

Identifying the age of individuals from epigenetic biomarkers can reveal vital information for criminal investigation, disease prevention, and life extension. DNA methylation changes are highly associated with chronological age and with the process of disease development. Computational methods such as clustering, feature selection and regression can be utilised to construct quantitative models of aging. In this study, we utilised 473,034 CpG biomarkers from the whole blood of 656 individuals aged 19 to 101 to construct predictive models, and we treated the development of this age-predictive model as an extremely high-dimensional regression problem, a setting that is relatively understudied. Unlike semi-supervised and supervised feature selection methods, unsupervised feature selection methods are generally good at removing irrelevant features that can act as noise. Therefore, along with the entire feature set, four different unsupervised feature selection methods (USFSMs) are considered for the quantitative prediction of human age. Since USFSMs are independent of any predictive method, support vector regression is then used to evaluate the prediction performance of each unsupervised feature selection method. We propose a novel k-means-based unsupervised feature selection method to predict human age from CpG dinucleotides. Experimental results validate the effectiveness of the proposed method: the optimum number of CpG dinucleotides is found to be only 41, which corresponds to just 0.0087% of the entire feature space. To the best of our knowledge, this is the first study to present an exploration and comprehensive comparison of USFSMs in very high-dimensional regression problems, particularly in the epigenetic biomedical domain for the prediction of chronological age from changes in DNA methylation.
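The abstract does not spell out how the k-means-based selection works, so the following is only a minimal sketch of one plausible reading: cluster the CpG features (not the individuals) with k-means, keep the feature closest to each centroid, and then score the reduced feature set with support vector regression. The function names, the synthetic data, the nearest-to-centroid rule and all hyperparameters are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVR


def make_synthetic_methylation(n_samples=120, n_cpgs=300, n_age_related=10, seed=0):
    """Hypothetical stand-in for methylation beta values in [0, 1]; not real data."""
    rng = np.random.default_rng(seed)
    age = rng.uniform(19, 101, size=n_samples)
    X = rng.uniform(0.0, 1.0, size=(n_samples, n_cpgs))
    # Make the first few columns drift linearly with age, plus noise.
    trend = 0.2 + 0.6 * (age - 19) / (101 - 19)
    for j in range(n_age_related):
        X[:, j] = np.clip(trend + rng.normal(0, 0.05, n_samples), 0, 1)
    return X, age


def kmeans_select_features(X, n_features, seed=0):
    """Assumed selection rule: one representative column per k-means cluster.

    Age labels are never used here, which is what makes the step unsupervised.
    """
    std = X.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant columns
    Z = ((X - X.mean(axis=0)) / std).T  # rows are features, columns are samples
    km = KMeans(n_clusters=n_features, n_init=10, random_state=seed).fit(Z)
    selected = []
    for c in range(n_features):
        members = np.flatnonzero(km.labels_ == c)
        if members.size == 0:  # defensive: skip a cluster that ended up empty
            continue
        dists = np.linalg.norm(Z[members] - km.cluster_centers_[c], axis=1)
        selected.append(members[np.argmin(dists)])
    return np.sort(np.array(selected))


def evaluate_with_svr(X, age, columns, folds=5):
    """Cross-validated mean absolute error (in years) of SVR on the chosen columns."""
    scores = cross_val_score(
        SVR(), X[:, columns], age, cv=folds, scoring="neg_mean_absolute_error"
    )
    return -scores


X, age = make_synthetic_methylation()
selected = kmeans_select_features(X, n_features=20)
mae_selected = evaluate_with_svr(X, age, selected)
mae_all = evaluate_with_svr(X, age, np.arange(X.shape[1]))
print("selected columns:", selected)
print("mean MAE, selected features:", mae_selected.mean())
print("mean MAE, all features:", mae_all.mean())

# Fraction of the feature space reported in the abstract: 41 of 473,034 CpGs.
print("percent of features kept:", round(100 * 41 / 473034, 4))
```

Because the selection step never sees the ages, it can be paired with any regressor; SVR is used here only because the abstract names it as the evaluator. On real data, the number of clusters would be tuned rather than fixed at 20.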
