Scientific Reports

A random forest learning assisted divide and conquer approach for peptide conformation search

Abstract

Computational determination of peptide conformations is challenging because it requires finding minima in a high-dimensional space. The “divide and conquer” approach is promising for reliably reducing the size of the search space. A random forest learning model is proposed here to expand the scope of applicability of the “divide and conquer” approach. A random forest classification algorithm is used to characterize the distributions of the backbone φ-ψ units (“words”). A random forest supervised learning model is developed to analyze the combinations of the φ-ψ units (“grammar”). It is found that amino acid residues may be grouped as equivalent “words”, while the φ-ψ combinations in low-energy peptide conformations follow a distinct “grammar”. The finding of equivalent words gives the “divide and conquer” method the flexibility of fragment substitution. The learnt grammar is used to improve the efficiency of the “divide and conquer” method by removing unfavorable φ-ψ combinations without the need for dedicated human effort. The machine learning assisted search method is illustrated by efficiently searching the conformations of GGG/AAA/GGGG/AAAA/GGGGG by assembling the structures of GFG/GFGG. Moreover, the computational cost of the new method is shown to increase rather slowly with the peptide length.
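
The pruning idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the basin labels, the `FORBIDDEN_PAIRS` set, and the function names are hypothetical, and a fixed lookup table stands in for the trained random forest "grammar" model. It only shows how rejecting unfavorable neighbouring φ-ψ combinations during assembly shrinks the candidate set compared with full enumeration.

```python
# Hypothetical coarse labels for backbone phi-psi basins ("words").
# These labels are illustrative and are not taken from the paper.
BASINS = ["beta", "ppII", "alphaR", "alphaL"]

# Stand-in for the learnt "grammar": adjacent basin pairs ASSUMED unfavorable.
# In the paper this role is played by a trained random forest model.
FORBIDDEN_PAIRS = {("alphaL", "alphaL"), ("alphaR", "alphaL")}


def is_allowed(prev_basin, next_basin):
    """Return True if two neighbouring phi-psi units may be combined."""
    return (prev_basin, next_basin) not in FORBIDDEN_PAIRS


def assemble(n_residues, basins=BASINS, allowed=is_allowed):
    """Build candidate conformations one residue at a time, pruning early.

    Each candidate is a tuple of basin labels, one per residue. A candidate
    is extended only if the new neighbouring pair passes the grammar check.
    """
    candidates = [(b,) for b in basins]
    for _ in range(n_residues - 1):
        candidates = [
            conf + (b,)
            for conf in candidates
            for b in basins
            if allowed(conf[-1], b)
        ]
    return candidates


def brute_force_count(n_residues, basins=BASINS):
    """Number of candidates when every combination is enumerated."""
    return len(basins) ** n_residues


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, len(assemble(n)), brute_force_count(n))
```

Because pruning happens as each residue is added, rejected partial conformations are never extended, so the saving relative to brute-force enumeration grows with peptide length under these assumed rules.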
