Journal: Mathematical Biosciences: An International Journal

SVM classification of human intergenic and gene sequences


Abstract

Despite constant improvement in prediction accuracy, gene-finding programs are still unable to provide automatic gene discovery with the desired correctness. This paper presents an analysis of gene and intergenic sequences from the point of view of language analysis, where gene and intergenic regions are regarded as two different subjects written in the four-letter alphabet {A,C,G,T}, and high frequency simple sequences are taken as keywords. A measurement alpha(l(tau)) was introduced to describe the relative repeat ratio of simple sequences. Threshold values were found for keyword selections. After eliminating 'noise', 178 short sequences were selected as keywords. DNA sequences are mapped to 178-dimensional Euclidean space, and SVM was used for prediction of gene regions. We showed by cross-validation that the program we developed could predict 93% of gene sequences with 7% false positives. When tested on a long genomic multi-gene sequence, our method improved nucleotide level specificity by 21%, and over 60% of predicted genes corresponded to actual genes. (c) 2005 Elsevier Inc. All rights reserved.
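The mapping from a DNA sequence to a keyword-based vector can be illustrated with a minimal sketch. The abstract does not list the 178 keywords or say how each coordinate is computed, so the `KEYWORDS` list, the function names, and the normalization (occurrences divided by the number of possible start positions) below are all assumptions made for illustration, not the authors' method. Vectors of this kind would then be the input to an SVM classifier.

```python
from typing import List, Sequence

# Hypothetical keyword list for illustration only; the paper selects 178
# short sequences, which are not given in the abstract.
KEYWORDS = ["AAAA", "CG", "TATA", "GCC"]


def count_overlapping(seq: str, word: str) -> int:
    """Count occurrences of `word` in `seq`, allowing overlaps."""
    count = 0
    start = seq.find(word)
    while start != -1:
        count += 1
        start = seq.find(word, start + 1)
    return count


def keyword_vector(seq: str, keywords: Sequence[str] = KEYWORDS) -> List[float]:
    """Map a DNA sequence to one coordinate per keyword.

    Each coordinate is the keyword's overlapping occurrence count divided by
    the number of positions where it could start (an assumed normalization).
    """
    seq = seq.upper()
    vec = []
    for word in keywords:
        positions = len(seq) - len(word) + 1
        if positions > 0:
            vec.append(count_overlapping(seq, word) / positions)
        else:
            # Sequence is shorter than the keyword, so it cannot occur.
            vec.append(0.0)
    return vec
```

With the full keyword set, `keyword_vector` would return a 178-dimensional point for each sequence; labeled gene and intergenic sequences mapped this way could be passed to any standard SVM implementation for training and cross-validation.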