IEEE International Conference on Bioinformatics and Biomedicine

A multi-stage protein secondary structure prediction system using machine learning and information theory



Abstract

This paper evaluates the performance of a multi-stage protein secondary structure (PSS) prediction model. The proposed classifier uses statistical information and protein profiles. The statistical information is derived from protein sequences and structures using k-means clustering and information theory. In the first stage, a feed-forward artificial neural network maps a sequence fragment to a region in the Ramachandran plot (2D-plot). A score vector is then constructed from the mapped region using the clustering and statistical information; it represents the tendency of a residue's identified 2D-plot region to pair with each secondary structure. The score vectors used in the second stage have fewer dimensions than the input vectors commonly derived from protein sequences or profile information. In the second stage, a two-tier classifier based on an artificial neural network and a genetic programming (GP) method is employed; the GP method uses IF rules for three-state classification. The two-tier classifier's performance is compared with that of two-tier artificial neural networks (ANNs) and support vector machines (SVMs). The method is tested on a common protein dataset, RS126, and its performance is measured by Q3 and segment overlap (SOV) scores. Compared with the two-tier ANN and SVM architectures, the proposed PSS prediction model improves the Q3 score by more than 3% and the SOV score by 2%.
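The abstract does not give formulas, so the following is only a minimal sketch of two ideas it mentions. `q3` follows the usual definition of the Q3 score (the percentage of residues whose three-state label is predicted correctly). `score_vectors` is an assumption: it illustrates one plausible information-theoretic score, the log-odds of each secondary structure state given a Ramachandran-plot region, and is not claimed to be the authors' formula. The function names, the `H`/`E`/`C` alphabet, the integer region ids, and the pseudocount are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

STATES = ("H", "E", "C")  # assumed three-state alphabet: helix, strand, coil


def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label is predicted correctly."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def score_vectors(regions, structures, pseudocount=1.0):
    """Map each region id to a 3-element vector of log2(P(s | region) / P(s)).

    Assumption: a log-odds score stands in for the paper's statistical
    information; the abstract does not specify the actual formula.
    """
    if len(regions) != len(structures):
        raise ValueError("regions and structures must have equal length")
    pair_counts = Counter(zip(regions, structures))
    region_counts = Counter(regions)
    state_counts = Counter(structures)
    total = len(structures)
    k = len(STATES)
    vectors = {}
    for region, n_region in region_counts.items():
        vec = []
        for state in STATES:
            # Pseudocounts keep unseen (region, state) pairs finite.
            p_state_given_region = (pair_counts[(region, state)] + pseudocount) / (
                n_region + k * pseudocount
            )
            p_state = (state_counts[state] + pseudocount) / (total + k * pseudocount)
            vec.append(math.log2(p_state_given_region / p_state))
        vectors[region] = vec
    return vectors
```

For instance, `q3("HHEC", "HHCC")` counts three matching residues out of four. In `score_vectors`, a positive entry means the state is seen more often in that region than overall, which is one way a three-dimensional vector per residue could replace a much wider sequence or profile window as second-stage input.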
