Source: 8th World Multi-Conference on Systemics, Cybernetics and Informatics (SCI 2004), vol. 13: Industrial Systems

Sequence/Structure Similarity and Support Vector Machine for Protein Secondary Structure Prediction

Abstract

The majority of human coding regions have been sequenced, and several genome sequencing projects have been completed. With the large-scale growth of sequencing data, an efficient approach to protein analysis has become more important. Protein function and structure are the foundations of drug design and protein-based products. However, it is difficult to predict protein function and three-dimensional structure directly from the amino acid sequence. Analyzing protein secondary structure is therefore an indispensable intermediate step. Previous work has focused on classifying residues into three secondary structure states: helix, strand, and coil. Secondary structure prediction is thus a common classification problem. Many studies that apply machine learning methods to this problem, however, ignore the local sequence/structure properties of proteins. This affects prediction accuracy, because a large number of proteins are homologous although their sequences are only remotely related. In this paper, we propose to use sequence similarity and Support Vector Machines (SVMs) to predict protein secondary structure. First, we adopt RS126 and CB513 as the experimental datasets, encode the amino acid sequences, and transform sequence segments into vectors for training. Second, we construct SVM classifiers that assign each residue of each sequence to one of the three secondary structure classes (i.e., H, E, or C). SVMs have been successfully applied to pattern recognition problems. They are learning systems that use a hypothesis space of linear functions in a high-dimensional feature space, trained with a learning algorithm from optimisation theory that implements a learning bias derived from statistical learning theory. They are well suited to computation on large-scale protein sequence data. Our approach achieves better accuracy than traditional machine learning methods for protein secondary structure prediction.
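The abstract does not say how sequence segments are turned into training vectors. As a minimal sketch of one common choice, the Python below slides a fixed-length window along the chain and one-hot encodes each position, producing one vector per residue paired with its H/E/C label. The window length of 13, the extra padding slot, and the function names `encode_windows` and `pair_with_labels` are all assumptions made for illustration, not details taken from the paper.

```python
# Illustrative sketch only: the encoding scheme and window size are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
SLOTS = len(AMINO_ACIDS) + 1  # one extra slot for padding or unknown residues


def encode_windows(sequence, window=13):
    """Return one flat one-hot vector per residue, built from a centred window."""
    if window % 2 == 0:
        raise ValueError("window must be odd so that it has a central residue")
    half = window // 2
    vectors = []
    for centre in range(len(sequence)):
        vec = [0] * (window * SLOTS)
        for offset in range(-half, half + 1):
            pos = centre + offset
            if 0 <= pos < len(sequence):
                # Unknown letters (e.g. 'X') share the padding slot.
                slot = AA_INDEX.get(sequence[pos], SLOTS - 1)
            else:
                slot = SLOTS - 1  # window position falls off the end of the chain
            vec[(offset + half) * SLOTS + slot] = 1
        vectors.append(vec)
    return vectors


def pair_with_labels(sequence, structure, window=13):
    """Pair each residue's window vector with its secondary structure label."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    return list(zip(encode_windows(sequence, window), structure))
```

The resulting (vector, label) pairs could then be passed to any SVM implementation, for example as three one-versus-rest binary classifiers for H, E, and C. That decomposition is also an assumption, since the abstract states only that SVM classifiers are constructed for the three classes.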
