Adapting support vector machines to predict translation initiation sites in the human genome

Abstract

This study addresses the prediction of translation initiation sites (TIS) in the human genome that start with the nucleotide sequence ATG. This sequence occurs 104 million times in the entire genome, yet current estimates put the number of TIS in the human genome at only about 30,000, giving an imbalance ratio of about 1:3500 for TIS ATG vs. non-TIS ATG sites. Algorithms designed using datasets with a low imbalance ratio may not be well suited to predicting TIS at the genomic level. In this paper, we modify the SVM algorithm so that it can handle a moderately high imbalance ratio. The F-measures obtained were: linear discriminant 0%, SVM with under-sampling 4.1%, SVM with over-sampling 8.2%, neural network 13.3%, decision tree 20%, and our approach 44%. These results show how poorly standard approaches perform at the genomic level because of the high imbalance ratio, and that our approach improves performance significantly.
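
The imbalance arithmetic in the abstract can be checked with a short sketch. It is illustrative only: the function names are hypothetical, the F-measure is assumed to be the balanced F1 score (the abstract does not say which variant was used), and the all-negative classifier is a made-up baseline showing one way an F-measure of 0% can coexist with very high accuracy, not a description of any method evaluated in the paper.

```python
def imbalance_ratio(n_positive, n_negative):
    """Number of negative examples per positive example."""
    return n_negative / n_positive


def f_measure(tp, fp, fn):
    """Balanced F-measure (F1, assumed here); 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Counts quoted in the abstract.
TOTAL_ATG = 104_000_000   # ATG occurrences in the genome
TIS_ATG = 30_000          # estimated number of true TIS

non_tis_atg = TOTAL_ATG - TIS_ATG
ratio = imbalance_ratio(TIS_ATG, non_tis_atg)  # roughly 3466, i.e. about 1:3500

# Hypothetical baseline: label every ATG as non-TIS.
accuracy_all_negative = non_tis_atg / TOTAL_ATG            # above 99.9%
f_all_negative = f_measure(tp=0, fp=0, fn=TIS_ATG)         # 0.0

print(f"imbalance ratio ~ 1:{ratio:.0f}")
print(f"all-negative accuracy = {accuracy_all_negative:.5f}")
print(f"all-negative F-measure = {f_all_negative}")
```

Accuracy is nearly uninformative at this imbalance ratio, which is presumably why the abstract reports F-measures instead.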