Conference: Tools with Artificial Intelligence, 2009. ICTAI '09

Gradient-Based Feature Selection for Conditional Random Fields and its Applications in Computational Genetics


Abstract

Gene prediction is one of the first and most important steps in understanding the genome of a species, and different approaches have been proposed. In 2007, CONTRAST, a de novo gene predictor based on Conditional Random Fields (CRFs), was introduced and shown to substantially outperform previous predictors. However, the oversized feature set used in the model poses several problems, such as overfitting and excessive computational demand. To resolve these issues, we conducted a thorough survey of two existing feature selection methods for CRFs, the gain-based and gradient-based methods, and applied the latter to CONTRAST. The results show that, with the gradient-based feature selection scheme, we achieve comparable or even better prediction accuracy on test data using only a very small fraction of the features in the candidate pool. The feature selection method also helps researchers better understand the underlying structure of genomic sequences and provides further insight into the function and evolutionary dynamics of genomes.
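To make the idea of gradient-based selection concrete, here is a minimal sketch. It is not the paper's algorithm or CONTRAST's code. It assumes a plain (non-sequential) log-linear classifier as a stand-in for a CRF, and it scores each candidate feature by the gradient of the conditional log-likelihood with respect to that feature's weight, evaluated at weight zero. For log-linear models this gradient is the empirical feature count minus the expected count under the current model. All names (`label_probs`, `candidate_gradients`, `select_top`, the toy features and data) are hypothetical.

```python
import math

def label_probs(weights, feats, x, labels):
    """p(y | x) for a log-linear model; `weights` holds only the active features."""
    scores = [sum(w * feats[k](x, y) for k, w in weights.items()) for y in labels]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def candidate_gradients(weights, feats, candidates, data, labels):
    """Gradient of the conditional log-likelihood for each candidate at weight 0:
    sum over examples of (observed feature value - expected feature value)."""
    grads = {k: 0.0 for k in candidates}
    for x, y_true in data:
        probs = label_probs(weights, feats, x, labels)
        for k in candidates:
            f = feats[k]
            expected = sum(p * f(x, y) for p, y in zip(probs, labels))
            grads[k] += f(x, y_true) - expected
    return grads

def select_top(grads, n):
    """Keep the n candidates with the largest absolute gradient."""
    return sorted(grads, key=lambda k: abs(grads[k]), reverse=True)[:n]

# Toy setup (assumed for illustration): x is one nucleotide, y is a 0/1 label.
labels = [0, 1]
feats = {
    "is_G_and_1": lambda x, y: 1.0 if x == "G" and y == 1 else 0.0,
    "is_A_and_1": lambda x, y: 1.0 if x == "A" and y == 1 else 0.0,
    "bias_1": lambda x, y: 1.0 if y == 1 else 0.0,
}
data = [("A", 0), ("A", 0), ("G", 1), ("G", 1)]

weights = {}  # no features selected yet, so the model is uniform over labels
grads = candidate_gradients(weights, feats, list(feats), data, labels)
selected = select_top(grads, 2)
print(grads, selected)
```

In this toy case the bias feature gets a zero gradient because the labels are balanced, so the two nucleotide-specific features are selected. A real CRF would compute the expected counts with forward-backward over label sequences rather than a per-example softmax.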
