Conference: Annual International Conference of the IEEE Engineering in Medicine and Biology Society

An identification and prediction methods for feature-subsets of CpG islands methylation based on human peripheral blood leukocytes of chromosome 21q


Abstract

Advances in technology have made it possible to classify methylated and unmethylated CpG islands using feature subsets derived from DNA sequence properties. Because methylation of CpG islands is involved in various biological phenomena, and DNA methylation is correlated with human diseases such as cancer, analysis of CpG islands has become important and useful for characterizing and modelling biological phenomena and for understanding the mechanisms of such diseases. However, analysis of the data associated with CpG islands is a fairly new and challenging subject in bioinformatics, systems biology and epigenetics. This paper addresses the problems associated with predicting methylated and unmethylated CpG islands on human chromosome 21q. To carry out the prediction, a data set of 132 CpG island samples from human peripheral blood leukocytes of chromosome 21q is used, and 4 different feature subsets totalling 44 attributes that characterise the methylated and unmethylated groups are extracted for each sample. Because the data set is unbalanced, and to avoid the disadvantages of the traditional leave-one-out (LOO) and m-fold cross-validation methods, the LOO method is modified by incorporating the m-fold cross-validation approach. A K-nearest neighbour classifier is then adapted for the prediction. The results of 440 different comprehensive analyses show that methylated CpG islands can be distinguished from unmethylated CpG islands with a predictive accuracy of between 75% and 80%. More importantly, the modified LOO gives clearer and more reliable identification when the feature subsets are combined. Another interesting observation is that, under the modified LOO analysis, the CpGI-specific feature set achieves the highest predictive accuracy when combined with the other feature sets, which is not the case with the traditional LOO. This further supports the robustness of the modified LOO cross-validation approach, as the CpGI-specific feature set has been shown in other studies to be one of the most important and effective attributes.
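
The abstract does not say exactly how the LOO procedure is modified, so the sketch below illustrates only the traditional LOO baseline with a K-nearest neighbour classifier. It is not the authors' code: the function names, the choice of k = 3, and the synthetic unbalanced toy data are all assumptions made for illustration.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Squared Euclidean distance is sufficient for ranking neighbours.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, k=3):
    """Traditional leave-one-out: hold out each sample once, train on the rest."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)


def make_toy_data(n_pos=20, n_neg=60, n_features=4, seed=0):
    """Hypothetical unbalanced two-class data (NOT the paper's CpG island data)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_pos):  # minority class, centred at 3.0
        X.append([rng.gauss(3.0, 1.0) for _ in range(n_features)])
        y.append(1)
    for _ in range(n_neg):  # majority class, centred at 0.0
        X.append([rng.gauss(0.0, 1.0) for _ in range(n_features)])
        y.append(0)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("LOO accuracy on toy data:", loo_accuracy(X, y, k=3))
```

On an unbalanced set like this, each LOO training fold keeps roughly the same class imbalance as the full data, which is the kind of limitation the abstract gives as motivation for combining LOO with m-fold cross-validation.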
