Published in: BMC Bioinformatics

Exploiting likely-positive and unlabeled data to improve the identification of protein-protein interaction articles



Abstract

Background: Experimentally verified protein-protein interactions (PPI) cannot be easily retrieved by researchers unless they are stored in PPI databases. The curation of such databases can be made faster by ranking newly published articles by their relevance to PPI, a task that we approach here by designing a machine-learning-based PPI classifier. All classifiers require labeled data, and the more labeled data are available, the more reliable they become. Although many PPI databases with large numbers of labeled articles are available, incorporating these databases into the base training data may actually reduce classification performance, since the supplementary databases may not annotate exactly the same PPI types as the base training data. Our first goal in this paper is to find a method of selecting likely-positive data from such supplementary databases. Extracting only likely-positive data, however, will bias the classification model unless sufficient negative data are also added. Unfortunately, negative data are very hard to obtain because no resources compile such information. Therefore, our second aim is to select such negative data from unlabeled PubMed data. Thirdly, we explore how to exploit these likely-positive and likely-negative data. Lastly, we look at the somewhat unrelated question of which term-weighting scheme is most effective for identifying PPI-related articles.
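The abstract does not say how the likely-positive and likely-negative articles are selected, or which term-weighting scheme performs best. The following is therefore only a minimal sketch of the general idea, under stated assumptions: documents are weighted with a smoothed TF-IDF scheme, each candidate is scored by its similarity to the centroid of the base positive articles minus its similarity to the centroid of the base negative articles, and a hypothetical threshold decides what is kept. The function names, the toy texts, and the threshold value are all illustrative and do not come from the paper.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def idf_table(docs):
    """Smoothed inverse document frequency over the base training set."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf(doc, idf):
    """L2-normalised TF-IDF vector; terms unseen in the base data get weight 0."""
    tf = Counter(tokenize(doc))
    vec = {t: c * idf.get(t, 0.0) for t, c in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}

def centroid(vectors):
    total = Counter()
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}

def dot(a, b):
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def score(doc, idf, pos_c, neg_c):
    """Positive score: closer to base positives; negative: closer to base negatives."""
    vec = tfidf(doc, idf)
    return dot(vec, pos_c) - dot(vec, neg_c)

# Toy base training data (illustrative only, not from the paper).
base_pos = ["protein a binds protein b in yeast two hybrid assay",
            "kinase interacts with receptor complex"]
base_neg = ["patient survey of hospital admission rates",
            "clinical trial of drug dosage in patients"]

idf = idf_table(base_pos + base_neg)
pos_c = centroid([tfidf(d, idf) for d in base_pos])
neg_c = centroid([tfidf(d, idf) for d in base_neg])

THRESHOLD = 0.1  # hypothetical cut-off; would need tuning on held-out data

# Articles labeled positive by a supplementary database (assumed input).
supplementary = ["receptor binds kinase in assay",
                 "hospital survey of patients"]
# Unlabeled articles, standing in for unlabeled PubMed abstracts.
unlabeled = ["hospital admission rates survey",
             "yeast protein binds receptor complex",
             "weather forecast today"]

# Keep supplementary positives that also look positive to the base model.
likely_pos = [d for d in supplementary
              if score(d, idf, pos_c, neg_c) >= THRESHOLD]
# Keep unlabeled articles that look clearly negative to the base model.
likely_neg = [d for d in unlabeled
              if score(d, idf, pos_c, neg_c) <= -THRESHOLD]

print(likely_pos)
print(likely_neg)
```

In this sketch an unlabeled article that shares no vocabulary with the base data scores zero and is selected for neither set, so only candidates with clear evidence are added to the training data. A real system would likely replace the centroid scorer with a trained classifier and compare several term-weighting schemes rather than fixing one.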
