Journal: Analytical Methods

The prediction of methylation states in human DNA sequences based on hexanucleotide composition and feature selection

Abstract

DNA methylation is an important epigenetic modification that plays a crucial role in the regulation of gene expression and in the occurrence of cancer. Although various experimental methods have been used to detect DNA methylation states, they are time-consuming and laborious. With the rapid accumulation of DNA sequence data, the gap between the number of known sequences and the number of known methylation annotations is widening rapidly. It is therefore necessary to develop a computational method for predicting methylation states. In this study, hexanucleotide composition is used to characterize DNA sequences. Maximum relevance minimum redundancy (mRMR) is adopted to preselect a feature subset that carries discriminative information, and an improved genetic algorithm is employed to obtain both the optimal feature subset (from the preselected subset) and the parameters of the support vector machine (SVM). Finally, a model based on the optimal feature subset and parameters is constructed and used to predict methylation states. Under 5-fold cross-validation, the proposed method achieves an accuracy of 92.42%, a Matthews correlation coefficient of 0.8484, and an area under the receiver operating characteristic curve of 0.9326. The predictive performance of hexanucleotide composition is evaluated by comparing it with trinucleotide and nonanucleotide composition. The results indicate that the method has high potential to become a useful tool in research on predicting DNA methylation states. The Matlab source code is freely available from the authors on request.
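
To make the feature representation concrete, here is a minimal Python sketch of a hexanucleotide composition vector. It is not the authors' Matlab code. The function name `kmer_composition`, the use of overlapping windows, the normalization by the number of valid windows, and the skipping of windows that contain ambiguous bases are all assumptions made for illustration; the abstract does not specify these details.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_composition(seq: str, k: int = 6) -> list[float]:
    """Return the relative frequency of every possible k-mer in `seq`.

    The vector has 4**k entries (4096 for hexanucleotides), ordered
    lexicographically over A, C, G, T.
    Assumptions (not taken from the paper): windows overlap, windows
    containing symbols outside ACGT are skipped, and counts are divided
    by the number of valid windows.
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    valid_windows = 0

    seq = seq.upper()
    for start in range(len(seq) - k + 1):
        position = index.get(seq[start:start + k])
        if position is not None:  # ignore windows with N or other symbols
            counts[position] += 1
            valid_windows += 1

    if valid_windows == 0:
        return [0.0] * len(kmers)
    return [count / valid_windows for count in counts]
```

Calling `kmer_composition(sequence)` with the default `k=6` yields a 4096-dimensional vector; `k=3` and `k=9` give the trinucleotide (64 features) and nonanucleotide (262,144 features) counterparts that the abstract uses for comparison. A feature selection step such as mRMR would then operate on these vectors.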