Conference: The 7th Asia-Pacific Bioinformatics Conference

Finding motif pairs in the interactions between heterogeneous proteins via bootstrapping and boosting


Abstract

Background: Supervised learning and many stochastic methods for predicting protein-protein interactions require both positive and negative interactions in the training data set. Unlike positive interactions, negative interactions cannot be readily obtained from interaction data, so they must be generated. In protein-protein interactions, as in other molecular interactions, treating all non-positive interactions as negative yields far more negative interactions than positive ones. Random selection from the non-positive interactions is also unsuitable, since the selected data may not reflect the original distribution of the data.

Results: We developed a bootstrapping algorithm for generating a negative data set of arbitrary size from protein-protein interaction data. We also developed an efficient boosting algorithm for finding interacting motif pairs in human and virus proteins. The boosting algorithm performed best (84.4% sensitivity and 75.9% specificity) with balanced positive and negative data sets. The boosting algorithm was also used to find potential motif pairs in complexes of human and virus proteins, for which structural data was not used to train the algorithm. Interacting motif pairs common to multiple folds of structural data for the complexes were shown to be statistically significant. The data set for interactions between human and virus proteins was extracted from BOND and is available at http://virus.hpid.org/interactions.aspx. The complexes of human and virus proteins were extracted from PDB and their identifiers are available at http://virus.hpid.org/PDB_IDs.html.

Conclusions: When the positive and negative training data sets are unbalanced, the predictions of the resulting model tend to be biased. Bootstrapping is effective for generating a negative data set whose size and distribution are easily controlled. Our boosting algorithm, trained with the balanced data sets generated via the bootstrapping method, could efficiently predict interacting motif pairs from protein interaction and sequence data.
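
The abstract does not spell out the bootstrapping procedure, so the following is only an illustrative sketch of one plausible reading, not the authors' algorithm. It assumes that negatives are built by resampling proteins, with replacement, from the positive interaction list (so that frequently occurring proteins stay frequent) and rejecting any pair already known to interact. The function names `bootstrap_negatives` and `sensitivity_specificity`, the `max_tries` guard, and the toy protein identifiers are all hypothetical. The second function shows how the two reported metrics are conventionally computed from binary labels.

```python
import random


def bootstrap_negatives(positives, n, seed=0, max_tries=100_000):
    """Draw n candidate negative (human, virus) pairs.

    Assumption (not from the paper): each side of a pair is resampled with
    replacement from the positive interactions, so protein frequencies in the
    negative set follow those in the positive set. Known positives are rejected.
    """
    rng = random.Random(seed)
    positive_set = set(positives)
    # Duplicates are kept on purpose: frequent proteins are drawn more often.
    humans = [h for h, _ in positives]
    viruses = [v for _, v in positives]
    negatives = []
    tries = 0
    while len(negatives) < n:
        if tries >= max_tries:
            raise RuntimeError("could not draw enough non-positive pairs")
        tries += 1
        pair = (rng.choice(humans), rng.choice(viruses))
        if pair not in positive_set:
            negatives.append(pair)
    return negatives


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = interacting)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    return tp / (tp + fn), tn / (tn + fp)


# Toy, made-up identifiers; a balanced set has as many negatives as positives.
positives = [("H1", "V1"), ("H2", "V2"), ("H1", "V2")]
negatives = bootstrap_negatives(positives, n=len(positives))
print(negatives)
print(sensitivity_specificity([1, 1, 0, 0], [1, 0, 0, 0]))  # (0.5, 1.0)
```

Because the requested size `n` is a free parameter, the same routine can produce balanced or deliberately unbalanced training sets, which is the property the Conclusions paragraph highlights.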
