Journal of Bioinformatics and Computational Biology

Discovering novel protein-protein interactions by measuring the protein semantic similarity from the biomedical literature

Abstract

Protein-protein interactions (PPIs) are involved in the majority of biological processes. Identification of PPIs is therefore one of the key aims of biological research. Although there are many databases of PPIs, many other unidentified PPIs could be buried in the biomedical literature. Therefore, automated identification of PPIs from biomedical literature repositories could be used to discover otherwise hidden interactions. Search engines, such as Google, have been successfully applied to measure the relatedness among words. Inspired by such approaches, we propose a novel method to identify PPIs through semantic similarity measures among protein mentions. We define six semantic similarity measures as features based on the page counts retrieved from the MEDLINE database. A machine learning classifier, Random Forest, is trained using the above features. The proposed approach achieves an averaged micro-F of 71.28% and an averaged macro-F of 64.03% over five PPI corpora, an improvement over the results of using only the conventional co-occurrence feature (averaged micro-F of 68.79% and averaged macro-F of 60.49%). A relation-word reinforcement further improves the averaged micro-F to 71.3% and the averaged macro-F to 65.12%. Comparing the results of the current work with other studies on the AIMed corpus (ranging from 77.58% to 85.1% in micro-F and from 62.18% to 76.27% in macro-F), we show that the proposed approach achieves a micro-F of 81.88% and a macro-F of 64.01% without the use of sophisticated feature extraction. Finally, we manually examine the newly discovered PPI pairs based on a literature review, and the results suggest that our approach could extract novel protein-protein interactions.
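
The abstract does not name its six similarity measures, so the following is only a rough sketch of how page-count features of this general kind might be computed for one protein pair. It uses measures that are common in the search-engine relatedness literature (Jaccard, Dice, overlap, and pointwise mutual information) alongside the raw co-occurrence count; these are an assumption and may differ from the paper's actual features. The function name, the feature names, and the example counts are hypothetical. In practice the counts would come from MEDLINE queries for each protein alone and for the pair together.

```python
import math


def page_count_features(h_a, h_b, h_ab, n_docs):
    """Illustrative page-count similarity features for a protein pair.

    h_a, h_b : number of documents mentioning protein A / protein B
    h_ab     : number of documents mentioning both proteins
    n_docs   : total number of documents in the collection (assumed known)

    These are generic measures, not necessarily the six used in the paper.
    """
    # With no co-occurrence, every measure is defined as 0 to avoid log(0)
    # and division by zero.
    if h_a == 0 or h_b == 0 or h_ab == 0:
        return {
            "cooccurrence": 0.0,
            "jaccard": 0.0,
            "dice": 0.0,
            "overlap": 0.0,
            "pmi": 0.0,
        }
    return {
        # Raw co-occurrence count (the conventional baseline feature).
        "cooccurrence": float(h_ab),
        # Shared documents over documents mentioning either protein.
        "jaccard": h_ab / (h_a + h_b - h_ab),
        # Shared documents relative to the mean of the two counts.
        "dice": 2 * h_ab / (h_a + h_b),
        # Shared documents relative to the rarer protein.
        "overlap": h_ab / min(h_a, h_b),
        # log2 of observed joint frequency over the frequency expected
        # if the two mentions were independent.
        "pmi": math.log2((h_ab * n_docs) / (h_a * h_b)),
    }


# Hypothetical counts, for illustration only.
features = page_count_features(h_a=100, h_b=50, h_ab=25, n_docs=1000)
```

A feature vector like this, computed for each candidate pair, is the kind of input a Random Forest classifier could be trained on to label pairs as interacting or non-interacting.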
