BMC Bioinformatics

Weakly supervised learning of biomedical information extraction from curated data

Abstract

Numerous publicly available biomedical databases derive their data by curating the literature. Curated data can be useful as training examples for information extraction, but they usually lack the exact mentions, and the locations of those mentions in the text, that supervised machine learning requires. This paper describes a general approach to information extraction that uses curated data as training examples. The idea is to formulate the problem as cost-sensitive learning from noisy labels, where the cost is estimated by a committee of weak classifiers that consider both the curated data and the text. We test the idea on two information extraction tasks for Genome-Wide Association Studies (GWAS). The first task is to extract the target phenotypes (diseases or traits) of a study, and the second is to extract the ethnic backgrounds of study subjects at different stages (initial or replication). Experimental results show that our approach achieves a Precision-at-2 (P@2) of 87% for disease/trait extraction and an F1-score of 0.83 for stage-ethnicity extraction, both outperforming their cost-insensitive baseline counterparts. The results show that curated biomedical databases can potentially be reused as training examples to train information extractors without expert annotation or refinement, opening an unprecedented opportunity to use “big data” in biomedical text mining.
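
The general idea of cost-sensitive learning from noisy labels can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the classifiers, features, or cost formula, so everything below is an assumption made for illustration. The two toy features (whether a candidate appears in the title, and its normalised mention frequency), the three threshold rules standing in for the committee, the agreement-fraction cost, and the function names `committee_confidence`, `predict_proba`, and `train_cost_sensitive` are all hypothetical. Noisy labels are assumed to come from string-matching curated values against the text.

```python
import math

def committee_confidence(features, noisy_label, committee):
    """Fraction of committee members whose vote agrees with the noisy label."""
    votes = [member(features) for member in committee]
    return sum(1 for v in votes if v == noisy_label) / len(votes)

def predict_proba(x, w, b):
    """Logistic model: probability that candidate x is a true mention."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))

def train_cost_sensitive(X, y, costs, lr=0.5, epochs=2000):
    """Batch gradient descent on log-loss, each example scaled by its cost."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, label, cost in zip(X, y, costs):
            # A low cost shrinks the gradient from a label we do not trust.
            err = cost * (predict_proba(x, w, b) - label)
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

# Hypothetical candidates: [appears_in_title, normalised_mention_frequency].
X = [[1.0, 0.9], [1.0, 0.7], [0.0, 0.1], [0.0, 0.2], [0.0, 0.1], [1.0, 0.8]]
# Noisy labels from matching curated values; the last two are assumed wrong
# (an incidental match, and a true mention missed because of a synonym).
noisy_y = [1, 1, 0, 0, 1, 0]

# Hypothetical committee of weak classifiers: simple threshold rules.
committee = [
    lambda f: 1 if f[0] >= 0.5 else 0,
    lambda f: 1 if f[1] >= 0.5 else 0,
    lambda f: 1 if f[0] + f[1] >= 1.0 else 0,
]

costs = [committee_confidence(x, y, committee) for x, y in zip(X, noisy_y)]
w, b = train_cost_sensitive(X, noisy_y, costs)

print("costs:", costs)
print("P(true mention | title, frequent):", round(predict_proba([1.0, 0.9], w, b), 3))
print("P(true mention | body, rare):", round(predict_proba([0.0, 0.1], w, b), 3))
```

In this toy setup the committee unanimously disagrees with the two mislabeled candidates, so their cost drops to zero and they no longer pull the model toward the wrong answer. A cost-insensitive baseline would correspond to setting every cost to 1.0.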