
SINGLE-SEQUENCE PROTEIN SECONDARY STRUCTURE PREDICTION BY NEAREST-NEIGHBOR CLASSIFICATION OF PROTEIN WORDS


Abstract

Predicting protein secondary structure is the process by which, given a sequence of amino acids as input, the secondary structure class of each position in the sequence is predicted. Our approach is built on the extraction of protein words of a fixed length from protein sequences, followed by nearest-neighbor classification in order to predict the secondary structure class of the middle position in each word. We present a new formulation for learning a distance function on protein words based on position-dependent substitution scores on amino acids. These substitution scores are learned by solving a large linear programming problem on examples of words with known secondary structures. We evaluated this approach by using a database of 5519 proteins with a total amino acid length of approximately 3,000,000. Presently, a test scheme using words of length 23 achieved a uniform average accuracy over word positions of 65.2%. The average accuracy for alpha-classified words in the test was 63.1%, for beta-classified words was 56.6%, and for coil-classified words was 71.6%.
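The pipeline described in the abstract (fixed-length words, a position-dependent distance, nearest-neighbor prediction of the middle position) can be illustrated with a minimal Python sketch. Everything named here is hypothetical and not taken from the thesis: the function names, the H/E/C labels for alpha, beta, and coil, the choice to skip positions too close to the sequence ends, and above all `mismatch_score`, which is a simple 0/1 placeholder standing in for the substitution scores that the thesis learns by linear programming.

```python
from typing import Callable, List, Tuple

Score = Callable[[int, str, str], float]


def extract_words(sequence: str, labels: str, length: int = 23) -> List[Tuple[str, str]]:
    """Return (word, label of the middle residue) for every full-length window.

    Assumption: positions closer than length // 2 to either end are skipped.
    """
    if length % 2 == 0:
        raise ValueError("word length must be odd so that a middle position exists")
    if len(sequence) != len(labels):
        raise ValueError("sequence and labels must have the same length")
    half = length // 2
    return [
        (sequence[i - half : i + half + 1], labels[i])
        for i in range(half, len(sequence) - half)
    ]


def mismatch_score(position: int, a: str, b: str) -> float:
    """Placeholder score: 0 for identical residues, 1 otherwise.

    A learned score would depend on `position` as well as on the residue pair.
    """
    return 0.0 if a == b else 1.0


def word_distance(u: str, v: str, score: Score = mismatch_score) -> float:
    """Sum the position-dependent scores over aligned positions of two words."""
    return sum(score(p, a, b) for p, (a, b) in enumerate(zip(u, v)))


def predict_middle(word: str, training: List[Tuple[str, str]],
                   score: Score = mismatch_score) -> str:
    """1-nearest-neighbor: return the label of the closest training word."""
    nearest = min(training, key=lambda item: word_distance(word, item[0], score))
    return nearest[1]


# Toy data (made up for illustration), using words of length 3.
training = extract_words("ACDEFGHIK", "CHHHHEEEC", length=3)
prediction = predict_middle("FGH", training)
```

Passing a different `score` callable is where learned, position-dependent substitution scores would plug in; the nearest-neighbor step itself does not change.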

Bibliographic details

  • Author

    PORFIRIO DAVID JONATHAN;

  • Year 2016
  • Original format PDF
  • Text language en_US
