Conference proceedings: Bioinformatics Research and Applications; Lecture Notes in Bioinformatics; 4463

A Feature Selection Algorithm Based on Graph Theory and Random Forests for Protein Secondary Structure Prediction

Abstract

Protein secondary structure prediction is one of the most widely studied problems in bioinformatics. Predicting the secondary structure of a protein is an important step toward determining its tertiary structure and thus its function. This paper explores the protein secondary structure problem using a novel feature selection algorithm combined with a machine learning approach based on random forests. For feature reduction, we propose a graph-theoretical algorithm that finds cliques in the non-position-specific evolutionary profiles of proteins obtained from BLOSUM62. The features selected by this algorithm are then used to condense the position-specific evolutionary information obtained from PSI-BLAST. Our results show that the approach saves a significant amount of space and time while still achieving high accuracy, even when the number of features is reduced by 25%.
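
The abstract does not say how the graph is built or how the cliques are used, so the following is only a minimal sketch of one possible reading. It assumes that amino acids are nodes, that two nodes are joined when their substitution score reaches a threshold, and that each maximal clique is merged into a single feature by averaging the corresponding profile columns. The score table, the threshold, and the names `build_graph`, `maximal_cliques`, and `condense_row` are all hypothetical and are not taken from the paper. The clique search is a plain Bron-Kerbosch enumeration.

```python
# Toy similarity scores for a five-letter alphabet. These are illustrative
# values only; they are not claimed to reproduce BLOSUM62.
TOY_SCORES = {
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1,
    ("D", "E"): 2,
    ("I", "D"): -3, ("I", "E"): -3, ("L", "D"): -4,
    ("L", "E"): -3, ("V", "D"): -3, ("V", "E"): -2,
}


def build_graph(scores, threshold):
    """Return an adjacency dict linking letters whose score >= threshold."""
    adj = {}
    for (a, b), score in scores.items():
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if score >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already-processed vertices
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in sorted(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


def condense_row(row, cliques):
    """Assumed merge rule: average the profile scores within each clique."""
    return [sum(row[a] for a in clique) / len(clique) for clique in cliques]


adj = build_graph(TOY_SCORES, threshold=1)
cliques = maximal_cliques(adj)

# One hypothetical position-specific profile row (one score per letter).
row = {"I": 4, "L": 2, "V": 3, "D": -2, "E": -4}
condensed = condense_row(row, cliques)
print(cliques)    # [['D', 'E'], ['I', 'L', 'V']]
print(condensed)  # [-3.0, 3.0]
```

In this toy case five profile columns collapse to two. In a full pipeline of the kind the abstract describes, the condensed rows would then be passed to a random forest classifier; that step is omitted here.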
