Weakly supervised learning of biomedical information extraction from curated data

Suvir Jain; Kashyap R.; Tsung-Ting Kuo; Shitij Bhargava; Gordon Lin; Chun-Nan Hsu

Abstract

Numerous publicly available biomedical databases derive their data by curating the literature. The curated data can be useful as training examples for information extraction, but they usually lack the exact mentions and their locations in the text that supervised machine learning requires. This paper describes a general approach to information extraction using curated data as training examples. The idea is to formulate the problem as cost-sensitive learning from noisy labels, where the cost is estimated by a committee of weak classifiers that consider both the curated data and the text. We test the idea on two information extraction tasks for Genome-Wide Association Studies (GWAS). The first task is to extract the target phenotypes (diseases or traits) of a study, and the second is to extract the ethnic backgrounds of study subjects at different stages (initial or replication). Experimental results show that our approach achieves a Precision-at-2 (P@2) of 87% for disease/trait extraction and an F1-score of 0.83 for stage-ethnicity extraction, both outperforming their cost-insensitive baseline counterparts. The results show that curated biomedical databases can potentially be reused as training examples to train information extractors without expert annotation or refinement, opening an unprecedented opportunity to use “big data” in biomedical text mining.
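
The abstract's central idea, cost-sensitive learning from noisy labels with costs estimated by a committee of weak classifiers, can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the two toy features, the three heuristic committee members, the use of the committee's agreement fraction as the per-example cost, and the weighted logistic regression learner.

```python
import math

# Hypothetical toy data: each candidate mention has two features, e.g.
# [overlaps_curated_term, appears_in_title]. The last two labels are
# deliberately wrong to simulate noise in curated-data labels.
examples = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
noisy_labels = [1, 1, 0, 0, 0, 1]

# Hypothetical committee of weak classifiers (simple threshold rules).
committee = [
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[0] + x[1] > 1.5),
    lambda x: int(x[0] + x[1] > 0.5),
]

def committee_weights(xs, ys, voters):
    """Weight each example by the fraction of voters that agree with its label."""
    weights = []
    for x, y in zip(xs, ys):
        agree = sum(1 for vote in voters if vote(x) == y)
        weights.append(agree / len(voters))
    return weights

def train_weighted_logreg(xs, ys, costs, epochs=200, lr=0.5):
    """Logistic regression trained by SGD, with each update scaled by its cost."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y, c in zip(xs, ys, costs):
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            g = c * (p - y)  # cost-scaled gradient of the log loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict(model, x):
    """Return 1 if the linear score is positive, else 0."""
    w, b = model
    return 1 if b + sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0

weights = committee_weights(examples, noisy_labels, committee)
model = train_weighted_logreg(examples, noisy_labels, weights)
print(weights)
print([predict(model, x) for x in [[1.0, 0.0], [0.0, 1.0]]])
```

In this toy setup, the two mislabeled examples receive zero weight because no committee member agrees with their labels, so they do not influence training. A cost-insensitive baseline would correspond to passing a weight of 1.0 for every example.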

Bibliographic information

Source: BMC Bioinformatics, 2016, Issue 1
Authors: Suvir Jain; Kashyap R.; Tsung-Ting Kuo; Shitij Bhargava; Gordon Lin; Chun-Nan Hsu
Original format: PDF
Chinese Library Classification: Biological sciences

Similar literature

1. Mining relational data from text: From strictly supervised to weakly supervised learning [J]. Zhu Zhang. Information Systems, 2008, Issue 3.
2. Comparing a knowledge-driven approach to a supervised machine learning approach in large-scale extraction of drug-side effect relationships from free-text biomedical literature [J]. Rong Xu, QuanQiu Wang. BMC Bioinformatics, 2015, Supplement 5.
3. A semi-supervised learning framework for biomedical event extraction based on hidden topics [J]. Zhou Deyu, Zhong Dayou. Artificial Intelligence in Medicine, 2015, Issue 1.
4. Weakly supervised learning of biomedical information extraction from curated data [C]. Suvir Jain, Kashyap R, Tsung-Ting Kuo. Asia-Pacific Bioinformatics Conference, 2016.
5. Learning with Limited Labeled Data in Biomedical Domain by Disentanglement and Semi-Supervised Learning [D]. Gyawali, Prashnna Kumar. 2021.
6. Weakly supervised learning of biomedical information extraction from curated data [O]. Suvir Jain, Kashyap R., Tsung-Ting Kuo. 2016.
7. Erratum to: Weakly supervised learning of biomedical information extraction from curated data [O]. Suvir Jain, Kashyap R. Tumkur, Tsung-Ting Kuo. 2016.
