Conference: IEEE International Conference on Bioinformatics and Biomedicine

Machine Learning Approach to Assign Protein Secondary Structure Elements from Cα Trace

Abstract

Secondary structure elements in protein molecules are local sub-conformational regions stabilized by hydrogen bonding, and they can be classified as helix, sheet, or loop. They underpin the folding and topology of the protein and are important for structural bioinformatics tasks such as protein modeling and functional analysis, so assigning the types of secondary structure in proteins is crucial. Many methods have been developed for this problem, and they fall into two approaches: one uses hydrogen-bonding and energy information, while the other uses the geometry of the protein trace. When information on some atoms is missing, the second approach is more feasible. In this paper, we develop a machine learning method of the second kind: a 3-state classifier that uses only the protein's Cα information. The classifier is an ensemble of four machine learning models: Random Forest, Support Vector Machine, Multilayer Perceptron, and eXtreme Gradient Boosting. It was trained on 600K amino acids and tested on two different data sets. On one data set, containing 150K amino acids, the accuracy of our system was 94.6%. The classifier was also tested on a set of 20 protein structures and compared with PCASSO, a method from the same category, using the information from the Protein Data Bank (PDB) as the reference. The comparison shows that our method produces assignments that agree more closely with the PDB, at 93% accuracy, while PCASSO achieved 84% accuracy.
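The abstract does not list the geometric features or say how the four models are combined, so the following is only a minimal sketch of the general idea under stated assumptions. It computes two commonly used Cα-trace descriptors (inter-residue distances and the pseudo-torsion angle of four consecutive Cα atoms) and merges per-residue labels from several models by majority vote. The feature set, the H/E/C label alphabet, the tie-breaking rule, and every function name here are hypothetical illustrations, not the authors' method.

```python
import math
from collections import Counter

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def distance(a, b):
    """Euclidean distance between two 3D points (same unit as the input)."""
    d = _sub(a, b)
    return math.sqrt(_dot(d, d))

def pseudo_torsion(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four consecutive C-alpha atoms."""
    b0, b1, b2 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b0, b1), _cross(b1, b2)
    y = math.sqrt(_dot(b1, b1)) * _dot(b0, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def ca_features(coords, i):
    """Assumed feature set for the 4-residue window starting at index i."""
    return {
        "d_i_i2": distance(coords[i], coords[i + 2]),
        "d_i_i3": distance(coords[i], coords[i + 3]),
        "torsion": pseudo_torsion(*coords[i:i + 4]),
    }

def majority_vote(labels):
    """Most frequent label; ties go to the earliest-listed model (an assumption)."""
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label

def ensemble_assign(per_model_predictions):
    """Combine equal-length label strings (one per model) residue by residue."""
    return "".join(majority_vote(column) for column in zip(*per_model_predictions))

def ideal_helix(n, radius=2.3, rise=1.5, turn_deg=100.0):
    """C-alpha positions of an idealized right-handed alpha helix (textbook values)."""
    step = math.radians(turn_deg)
    return [(radius * math.cos(k * step), radius * math.sin(k * step), k * rise)
            for k in range(n)]

# Usage: features of an ideal helix, then a vote over four made-up model outputs
# (H = helix, E = sheet, C = loop).
helix = ideal_helix(6)
feats = ca_features(helix, 0)
votes = ["HHHEC", "HHEEC", "HCEEC", "HHHCC"]
consensus = ensemble_assign(votes)
```

For the idealized helix, the i to i+3 distance comes out at roughly 5 Å and the pseudo-torsion at roughly 50°, which is the kind of signal a trace-geometry classifier can exploit without any hydrogen-bond information. In a real pipeline the four label strings would come from trained models rather than hand-written examples, and probability averaging would be an equally plausible way to combine them.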
