Source: International Journal of Molecular Sciences (U.S. National Institutes of Health literature collection)

A Novel Computational Method for Detecting DNA Methylation Sites with DNA Sequence Information and Physicochemical Properties

Abstract

DNA methylation is an important biochemical process that is closely connected with many types of cancer. Research on DNA methylation helps us understand the regulation mechanism and epigenetic reprogramming, so recognizing methylation sites in a DNA sequence is very important. Over the past several decades, and particularly since high-throughput sequencing technology became widely used in research and industry, many computational methods, especially machine learning methods, have been developed. To accurately identify whether a nucleotide residue is methylated in a specific DNA sequence context, we propose a novel method that overcomes the shortcomings of previous methods for predicting methylation sites. We use k-gram, multivariate mutual information, discrete wavelet transform, and pseudo amino acid composition to extract features, and train a sparse Bayesian learning model to predict DNA methylation. Five criteria are used to evaluate the prediction results: area under the receiver operating characteristic curve (AUC), Matthews correlation coefficient (MCC), accuracy (ACC), sensitivity (SN), and specificity. On the benchmark dataset, our method reached 0.8632 AUC, 0.8017 ACC, 0.5558 MCC, and 0.7268 SN. Additionally, the best AUC values on two scBS-seq profiled mouse embryonic stem cell datasets were 0.8896 and 0.9511, respectively. Compared with other outstanding methods, our method achieved higher prediction accuracy, improving AUC by at least 0.0399. For the convenience of other researchers, our code has been uploaded to a file hosting service, and can be downloaded from: .
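
To make two of the ingredients named in the abstract concrete, the following is a minimal Python sketch, not the authors' released code. It shows one common way to compute normalized k-gram frequencies from a DNA sequence and how ACC, SN, specificity, and MCC follow from confusion-matrix counts. The function names `kgram_frequencies` and `binary_metrics`, the `ACGT` alphabet, the default `k=2`, and the choice to skip windows containing ambiguous bases are all assumptions made for illustration; the abstract does not specify these details. AUC is omitted because it requires ranked prediction scores rather than counts.

```python
from itertools import product
from math import sqrt


def kgram_frequencies(seq, k=2, alphabet="ACGT"):
    """Return normalized k-gram frequencies in lexicographic k-gram order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k across the sequence.
    for i in range(len(seq) - k + 1):
        gram = seq[i:i + k]
        if gram in counts:  # assumption: skip windows with bases such as N
            counts[gram] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def binary_metrics(tp, fp, tn, fn):
    """Compute ACC, SN, SP, and MCC from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    sn = tp / (tp + fn) if tp + fn else 0.0   # sensitivity (recall)
    sp = tn / (tn + fp) if tn + fp else 0.0   # specificity
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SN": sn, "SP": sp, "MCC": mcc}
```

For a sequence window centered on a candidate site, `kgram_frequencies(window, k=2)` yields a 16-dimensional vector that could be concatenated with other feature groups before training a classifier. `binary_metrics` takes the counts of true positives, false positives, true negatives, and false negatives obtained on a held-out set.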
