A machine learning approach for ranking clusters of docked protein‐protein complexes by pairwise cluster comparison


Abstract

Reliable identification of near‐native poses of docked protein–protein complexes is still an unsolved problem. The intrinsic heterogeneity of protein–protein interactions is challenging for traditional biophysical or knowledge‐based potentials, and the identification of many false positive binding sites is not unusual. Ranking protocols are often based on an initial clustering of docked poses, followed by the application of an energy function that ranks each cluster according to its lowest‐energy member. Here, we present an approach to cluster ranking that relies not on a single molecular descriptor (e.g., an energy function) but on a large number of descriptors integrated in a machine learning model: an extremely randomized tree classifier trained on 109 molecular descriptors. The protocol first locally enriches clusters with additional poses. The clusters are then characterized using features that describe the distribution of molecular descriptors within each cluster, and these features are combined into a pairwise cluster comparison model to discriminate near‐native from incorrect clusters. The results show that our approach is able to identify clusters containing near‐native protein–protein complexes. In addition, we present an analysis of the descriptors with respect to their power to discriminate near‐native from incorrect clusters, and of how data transformations and recursive feature elimination can improve the ranking performance. Proteins 2017; 85:528–543. © 2016 Wiley Periodicals, Inc.
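The ranking idea described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. The abstract does not say which summary statistics describe a cluster, how a pair of clusters is encoded, or how pairwise outcomes are turned into a ranking, so the choices below (minimum, mean and standard deviation per descriptor, an elementwise difference for each pair, and counting pairwise wins) are assumptions made for illustration. The function names and the toy data are hypothetical, and `toy_comparator` is a stand-in for the trained extremely randomized tree classifier.

```python
from itertools import combinations
from statistics import mean, pstdev


def cluster_features(descriptor_rows):
    """Summarize the distribution of each descriptor within one cluster.

    descriptor_rows holds one list of descriptor values per pose.
    Assumption: min, mean and population std are used as summary statistics.
    """
    features = []
    for values in zip(*descriptor_rows):  # iterate descriptor by descriptor
        features.extend([min(values), mean(values), pstdev(values)])
    return features


def pair_features(feat_a, feat_b):
    """Encode the ordered pair (A, B) as an elementwise difference (assumption)."""
    return [a - b for a, b in zip(feat_a, feat_b)]


def rank_clusters(clusters, prefers_first):
    """Rank clusters by the number of pairwise comparisons each one wins.

    prefers_first takes a pair feature vector and returns True when the
    first cluster of the pair is judged more likely to be near-native.
    Win counting is an assumed aggregation scheme.
    """
    feats = {name: cluster_features(rows) for name, rows in clusters.items()}
    wins = {name: 0 for name in clusters}
    for a, b in combinations(clusters, 2):
        winner = a if prefers_first(pair_features(feats[a], feats[b])) else b
        wins[winner] += 1
    return sorted(wins, key=wins.get, reverse=True)


def toy_comparator(diff):
    """Hypothetical stand-in for a trained classifier.

    Prefers the cluster whose first descriptor (an energy-like score)
    has the lower mean; index 1 is that mean in the feature layout above.
    """
    return diff[1] < 0


# Toy data: three clusters, each pose described by two made-up descriptors.
clusters = {
    "c1": [[-5.0, 0.2], [-4.0, 0.3]],
    "c2": [[-9.0, 0.8], [-8.0, 0.7], [-10.0, 0.9]],
    "c3": [[-1.0, 0.1], [-2.0, 0.1]],
}
ranking = rank_clusters(clusters, toy_comparator)
print(ranking)
```

In a fuller setup, `prefers_first` would wrap the prediction of a classifier trained on labeled cluster pairs, for example scikit-learn's `ExtraTreesClassifier`, with one pair feature vector per training example.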
