Source: NIH literature, Oncotarget

DHSpred: support-vector-machine-based human DNase I hypersensitive sites prediction using the optimal features selected by random forest

Abstract

DNase I hypersensitive sites (DHSs) are genomic regions that provide important information about the presence of transcriptional regulatory elements and the state of chromatin. Identifying DHSs in uncharacterized DNA sequences is therefore crucial for understanding their biological functions and mechanisms. Although many experimental methods have been proposed to identify DHSs, they have proven expensive for genome-wide application, so computational methods for DHS prediction are needed. In this study, we proposed a support vector machine (SVM)-based method for predicting DHSs, called DHSpred (DNase I Hypersensitive Site predictor in human DNA sequences), which was trained with 174 optimal features. This optimal feature combination was selected with a random forest algorithm from a large feature set comprising nucleotide composition and di- and trinucleotide physicochemical properties. DHSpred achieved a Matthews correlation coefficient (MCC) of 0.660 and an accuracy of 0.871, both 3% higher than those of control SVM predictors trained with non-optimized features, indicating the effectiveness of the feature selection method. Furthermore, DHSpred outperformed state-of-the-art predictors. An online prediction server has been developed to assist the scientific community, and is freely available at:
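The abstract names the feature families and evaluation metrics but gives no implementation details. The following is a minimal sketch under stated assumptions, not DHSpred's actual code. It uses standard-library Python to compute mono-, di- and trinucleotide composition vectors (4 + 16 + 64 = 84 values) and the two reported metrics, accuracy and MCC. The function names are hypothetical. The physicochemical property features, the random forest ranking that selected the 174 features, and the SVM itself are omitted, because they depend on property tables and learning libraries the abstract does not specify.

```python
import math
from itertools import product
from typing import List, Sequence, Tuple


def kmer_composition(seq: str, k: int) -> List[float]:
    """Hypothetical helper: normalized frequencies of all 4**k k-mers over ACGT,
    listed in lexicographic order (AA, AC, AG, AT, CA, ... for k = 2)."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows containing N or other non-ACGT symbols
            counts[j] += 1
            total += 1
    return [c / total if total else 0.0 for c in counts]


def composition_features(seq: str) -> List[float]:
    """Hypothetical helper: concatenated mono- (4), di- (16) and
    trinucleotide (64) composition, 84 features in total."""
    return kmer_composition(seq, 1) + kmer_composition(seq, 2) + kmer_composition(seq, 3)


def accuracy_and_mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Accuracy and Matthews correlation coefficient for binary labels
    (1 = DHS, 0 = non-DHS)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # MCC is 0 by convention if undefined
    return acc, mcc
```

In a full pipeline, vectors like these (plus physicochemical features) would presumably be ranked by random forest importance and the top-ranked subset passed to an SVM. With scikit-learn, for example, that could use `RandomForestClassifier.feature_importances_` and `SVC`. This is an assumption about tooling, not a description of the authors' code.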