
State-of-the-art protein secondary-structure prediction using a novel two-stage alignment and machine-learning method.

Abstract

While the complexity of biological systems often appears intractable, living organisms possess an underlying correlation derived from their hierarchical association. This notion enables methods such as machine-learning techniques, Bayesian statistics, nearest-neighbor methods, and known sequence-to-structure exploration to discover and predict biological patterns.

As proteins are the direct expression of DNA, they are the center of all biological activity. Thousands of new protein sequences are discovered each year, and knowledge of their biological importance relies on the determination of their folded, or tertiary, structure. Secondary-structure prediction plays an important role in protein tertiary-structure prediction, as well as in the characterization of general protein structure and function.

The protein secondary-structure prediction problem is defined as a three-state classification problem. Given any linear sequence of one-letter-coded amino acids, the goal is to predict the secondary-structure membership of each amino acid.

Machine-learning-based techniques are commonly and increasingly used for secondary-structure prediction. Over the past few decades, several algorithms and their variations have been used to predict protein secondary structure, including multi-layered neural networks and ensembles of support vector machines.

DARWIN is a new protein secondary-structure prediction server that uses a novel two-stage system unlike any current state-of-the-art method. DARWIN specifically addresses the decline in accuracy caused by a lack of known homologous sequences by balancing and maximizing PSI-BLAST information, by using a new method termed fixed-size fragment analysis (FFA), and by filling in gaps, ends, and missing information with an ensemble of support vector machines. DARWIN comprises a unique combination of homology consensus modeling, fragment consensus modeling, and support vector machine learning.

DARWIN has been tested against several leading prediction servers, and the results show that DARWIN exceeds current state-of-the-art accuracy on all explored test sets.
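The three-state formulation in the abstract can be made concrete with a small sketch. It is not DARWIN's method; it only illustrates the shape of the problem: one state label per residue, scored by per-residue accuracy. The abstract does not name the three states, so the labels `H` (helix), `E` (strand), and `C` (coil) are an assumption based on common convention, and the function names and example strings are hypothetical.

```python
# Minimal sketch of the three-state prediction problem (illustrative only).
# Assumption: the three states are labeled H (helix), E (strand), C (coil).

AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard one-letter codes
STATES = set("HEC")


def check_prediction(sequence: str, states: str) -> None:
    """Raise ValueError unless `states` assigns one valid state per residue."""
    if len(sequence) != len(states):
        raise ValueError("one state label is required per residue")
    if not set(sequence) <= AMINO_ACIDS:
        raise ValueError("sequence contains non-standard residue codes")
    if not set(states) <= STATES:
        raise ValueError("state string contains labels outside H/E/C")


def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state matches the observed one."""
    if not observed or len(predicted) != len(observed):
        raise ValueError("strings must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical 10-residue example; these are made-up strings, not real data.
sequence = "MKTAYIAKQR"
observed = "CHHHHHHECC"
predicted = "CHHHHHCECC"

check_prediction(sequence, predicted)
accuracy = per_residue_accuracy(predicted, observed)  # 9 of 10 positions agree
```

Any predictor for this problem, whatever its internals, maps a string like `sequence` to a string like `predicted`; comparing servers on a test set then reduces to comparing values such as `accuracy`.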

Bibliographic record

  • Author: Gates, Ami M.
  • Author affiliation: University of Florida
  • Degree-granting institution: University of Florida
  • Subject: Computer Science; Biology, Bioinformatics
  • Degree: Ph.D.
  • Year: 2008
  • Pages: 113 p.
  • Original format: PDF
  • Language of text: English
