
A novel structure-based encoding for machine-learning applied to the inference of SH3 domain specificity



Abstract

MOTIVATION: Unravelling the rules underlying protein-protein and protein-ligand interactions is a crucial step in understanding cell machinery. Peptide recognition modules (PRMs) are globular protein domains that bind short protein sequences and play a key role in protein-protein interactions. High-throughput techniques permit proteome-wide scanning for each domain, but they are characterized by a high incidence of false positives. In this context, there is a pressing need for in silico experiments to validate experimental results and for computational tools to infer domain-peptide interactions.

RESULTS: We focused on the SH3 domain family and developed a machine-learning approach for inferring interaction specificity. SH3 domains are well-studied PRMs which typically bind short proline-rich sequences characterized by the PxxP consensus. The binding information is known to be held in the conformation of the domain surface and in the short sequence of the peptide. Our method relies on interaction data from high-throughput techniques and benefits from the integration of sequence and structure data of the interacting partners. Here, we propose a novel encoding technique that represents binding information on the basis of the domain-peptide contact residues in complexes of known structure. Remarkably, the new encoding requires few variables to represent an interaction, thus avoiding the 'curse of dimensionality'. Our results display an accuracy >90% in detecting new binders of known SH3 domains, thus outperforming neural models on standard binary encodings, profile methods and recent statistical predictors. The method also generalizes, inferring the specificity of uncharacterized SH3 domains that show some degree of similarity to the known data.
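To make the dimensionality argument concrete, the following minimal sketch contrasts a standard sparse binary (one-hot) encoding of a domain-peptide pair with an encoding restricted to contact residue pairs. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the encoding details, so the toy sequences, the contact list, the function names, and the use of the Kyte-Doolittle hydropathy scale as the per-residue descriptor are all hypothetical choices made here.

```python
# Illustrative sketch only: the sequences, contact map, and descriptor below
# are assumptions for demonstration, not values taken from the paper.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale, used here as a stand-in residue descriptor.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def one_hot_encode(sequence):
    """Standard sparse binary encoding: 20 variables per residue."""
    vector = []
    for residue in sequence:
        vector.extend(1 if residue == aa else 0 for aa in AMINO_ACIDS)
    return vector


def contact_encode(domain_seq, peptide_seq, contacts):
    """Encode only residue pairs in contact: 2 variables per contact.

    `contacts` is a list of (domain_index, peptide_index) pairs, assumed to
    come from a domain-peptide complex of known structure.
    """
    features = []
    for domain_idx, peptide_idx in contacts:
        features.append(HYDROPATHY[domain_seq[domain_idx]])
        features.append(HYDROPATHY[peptide_seq[peptide_idx]])
    return features


# Hypothetical toy inputs (not a real SH3 domain or a validated binder).
domain = "ALYDYEARTEDDLSFHKGEK"
peptide = "RPLPPLPAPR"          # contains a PxxP motif
contacts = [(2, 3), (4, 6), (14, 1)]

binary_features = one_hot_encode(domain) + one_hot_encode(peptide)
contact_features = contact_encode(domain, peptide, contacts)

print("one-hot variables:", len(binary_features))
print("contact-based variables:", len(contact_features))
```

The one-hot vector grows with the full length of both sequences, while the contact-based vector grows only with the number of contact pairs, which is the property the abstract credits with keeping the number of variables small.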
