
Exploiting likely-positive and unlabeled data to improve the identification of protein-protein interaction articles


Abstract

Background: Experimentally verified protein-protein interactions (PPIs) cannot be easily retrieved by researchers unless they are stored in PPI databases. Curation of such databases can be accelerated by ranking newly published articles by their relevance to PPI, a task we approach here by designing a machine-learning-based PPI classifier. All classifiers require labeled data, and the more labeled data available, the more reliable they become. Although many PPI databases with large numbers of labeled articles are available, incorporating these databases into the base training data may actually reduce classification performance, since the supplementary databases may not annotate exactly the same PPI types as the base training data. Our first goal in this paper is to find a method of selecting likely-positive data from such supplementary databases. Extracting only likely-positive data, however, will bias the classification model unless sufficient negative data is also added. Unfortunately, negative data is very hard to obtain because no resources compile such information. Our second aim is therefore to select such negative data from unlabeled PubMed data. Third, we explore how to exploit these likely-positive and negative data. Finally, we look at the somewhat unrelated question of which term-weighting scheme is most effective for identifying PPI-related articles.
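The abstract states the goals but not the selection procedure itself. Purely as an illustration, and not as the authors' method, the sketch below shows one simple way such a selection could work: weight terms with a smoothed TF-IDF scheme, build positive and negative centroids from the base training data, and keep supplementary or unlabeled articles whose similarity margin clears a threshold. The function names (`fit_idf`, `weigh`, `margin_scorer`), the centroid classifier, the 0.2 threshold, and the toy token lists are all assumptions introduced for this example.

```python
import math
from collections import Counter


def fit_idf(docs):
    """Smoothed inverse document frequency for each term in the tokenised docs."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def weigh(doc, idf):
    """TF-IDF vector (as a dict) for one doc; terms unseen when fitting are ignored."""
    tf = Counter(term for term in doc if term in idf)
    return {term: count * idf[term] for term, count in tf.items()}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Component-wise mean of a list of sparse vectors."""
    total = {}
    for vec in vectors:
        for term, w in vec.items():
            total[term] = total.get(term, 0.0) + w
    return {term: w / len(vectors) for term, w in total.items()}


def margin_scorer(base_pos, base_neg):
    """Return a function scoring a doc as sim(positive centroid) - sim(negative centroid)."""
    idf = fit_idf(base_pos + base_neg)
    pos_c = centroid([weigh(d, idf) for d in base_pos])
    neg_c = centroid([weigh(d, idf) for d in base_neg])

    def margin(doc):
        vec = weigh(doc, idf)
        return cosine(vec, pos_c) - cosine(vec, neg_c)

    return margin


# Toy, hypothetical data: each article is a list of tokens.
base_pos = [["protein", "binds", "kinase"], ["interaction", "protein", "complex"]]
base_neg = [["patient", "cohort", "survey"], ["clinical", "trial", "patient"]]
supplementary = [["kinase", "protein", "complex"], ["survey", "trial", "results"]]
unlabeled = [["patient", "survey", "outcome"], ["protein", "interaction", "assay"]]

margin = margin_scorer(base_pos, base_neg)
THRESHOLD = 0.2  # assumed cut-off, chosen only for this toy example
likely_pos = [d for d in supplementary if margin(d) > THRESHOLD]
likely_neg = [d for d in unlabeled if margin(d) < -THRESHOLD]
```

In this toy run, only the supplementary article that shares vocabulary with the base positives is kept as likely positive, and only the unlabeled article that resembles the base negatives is kept as likely negative. Swapping `fit_idf` and `weigh` for another weighting function is where a comparison of term-weighting schemes would plug in.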


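Bibliographic record

Journal: BMC Bioinformatics
Authors: Richard Tzong-Han Tsai; Hsi-Chuan Hung; Hong-Jie Dai; Yi-Wen Lin; Wen-Lian Hsu
Year (volume), issue: 2008 (9), Suppl 1
Year: 2008
Pages: S3
Total pages: 10
Original format: PDF
Chinese Library Classification: applied microbiology; biochemical genetics; biochemical pharmacology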

