Journal: PLoS Computational Biology

Imitating Manual Curation of Text-Mined Facts in Biomedicine

Abstract

Text-mining algorithms make mistakes in extracting facts from natural-language texts. In biomedical applications, which rely on the use of text-mined data, it is critical to assess the quality of individual facts (the probability that the message is correctly extracted) in order to resolve data conflicts and inconsistencies. Using a large set of almost 100,000 manually produced evaluations (most facts were reviewed more than once, producing independent evaluations), we implemented and tested a collection of algorithms that mimic human evaluation of facts provided by an automated information-extraction system. The performance of our best automated classifiers closely approached that of our human evaluators (ROC score close to 0.95). Our hypothesis is that, were we to use a larger number of human experts to evaluate any given sentence, we could implement an artificial-intelligence curator that would perform the classification job at least as accurately as an average individual human evaluator. We illustrated our analysis by visualizing the predicted accuracy of the text-mined relations involving the term cocaine.
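
The ROC score mentioned in the abstract can be read as the probability that a classifier assigns a higher score to a correctly extracted fact than to an incorrectly extracted one. The following is a minimal sketch of that calculation, not the authors' implementation: the function name `roc_auc` and the toy scores and labels are assumptions made up for illustration, and the labels stand in for human evaluator verdicts.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve, computed by pairwise comparison.

    Equals the fraction of (correct, incorrect) fact pairs in which the
    correct fact (label 1) gets the higher score; ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the paper): classifier scores for six
# extracted facts and the evaluator verdict for each (1 = correctly extracted).
scores = [0.95, 0.80, 0.70, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
print(f"ROC score: {auc:.3f}")
```

A score of 1.0 means every correct fact outranks every incorrect one, and 0.5 corresponds to random ranking. The pairwise loop is quadratic in the number of facts, which is fine for a toy example; a rank-based formulation would be preferable for a data set of the size described in the abstract.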
