Discovering variable-length patterns in protein sequences for protein-protein interaction prediction

Abstract

Among computational approaches to predicting protein-protein interactions (PPIs), sequence-based approaches are often preferred because they require no prior knowledge about the proteins. However, in deciding whether two proteins may interact, existing sequence-based approaches consider only fixed-length segments. We believe that interactions between proteins can be predicted more accurately if variable-length segments are also considered. To this end, we have developed an algorithm called VLASPD. Given a database of protein sequences, VLASPD proceeds in several steps. The database is first searched to identify frequent sequence segments (FSSs) of different lengths. Different combinations of the presence and absence of these FSSs are then used to form associative sequential patterns (ASPs). Based on a statistical measure, the ASPs that occur significantly frequently among proteins in the training set are identified as significant associative sequential patterns (SASPs). An SASP found in a protein pair can be considered as evidence supporting or refuting the existence of an interaction between the two proteins. The amount of evidence provided is quantified with an information-theoretic measure, and how likely two proteins are to interact is then decided by the total amount of evidence provided by the SASPs found in the pair. To test the effectiveness of VLASPD, we used several sets of real data. The experimental results show that VLASPD can be a promising approach for PPI prediction. VLASPD is available for use and testing at http://www.comp.polyu.edu.hk/~cslhu/resources/vlaspd/.
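The steps in the abstract can be illustrated with a minimal Python sketch. This is not the authors' VLASPD implementation, and the abstract does not specify its statistical or information-theoretic measures. The sketch makes several assumptions: a plain support threshold stands in for the significance test, patterns record only the presence of one segment in each protein of a pair (absence is ignored), and the evidence measure is a smoothed log-likelihood ratio. All function names and parameters (`frequent_segments`, `min_support`, and so on) are hypothetical.

```python
from collections import Counter
from math import log2


def frequent_segments(sequences, min_len=2, max_len=4, min_support=0.5):
    """Segments of length min_len..max_len that occur in at least
    min_support (a fraction) of the sequences."""
    counts = Counter()
    for seq in sequences:
        seen = set()
        for k in range(min_len, max_len + 1):
            for i in range(len(seq) - k + 1):
                seen.add(seq[i:i + k])
        counts.update(seen)  # count each segment once per sequence
    threshold = min_support * len(sequences)
    return {seg for seg, c in counts.items() if c >= threshold}


def pair_patterns(seq_a, seq_b, segments):
    """Simplified patterns for a pair: (segment found in A, segment found in B)."""
    in_a = [s for s in segments if s in seq_a]
    in_b = [s for s in segments if s in seq_b]
    return {(x, y) for x in in_a for y in in_b}


def train_weights(pairs, labels, segments):
    """Assumed evidence measure: log2 of the Laplace-smoothed ratio
    P(pattern | interacting) / P(pattern | non-interacting)."""
    pos, neg = Counter(), Counter()
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    for (a, b), y in zip(pairs, labels):
        (pos if y else neg).update(pair_patterns(a, b, segments))
    weights = {}
    for p in set(pos) | set(neg):
        p_pos = (pos[p] + 1) / (n_pos + 2)
        p_neg = (neg[p] + 1) / (n_neg + 2)
        weights[p] = log2(p_pos / p_neg)
    return weights


def score(seq_a, seq_b, segments, weights):
    """Total evidence for a pair; positive values favour interaction."""
    return sum(weights.get(p, 0.0)
               for p in pair_patterns(seq_a, seq_b, segments))
```

A pair would then be predicted to interact when `score` exceeds a chosen cutoff (for example 0). Both the cutoff and the support threshold are tuning choices that the abstract does not state.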

Bibliographic record

  • Authors

    Hu L; Chan KCC

  • Year 2015
  • Original format PDF
  • Language eng
