Journal of Biomedical Informatics

Detecting hedge cues and their scope in biomedical text with conditional random fields.

Abstract

OBJECTIVE: Hedging is frequently used in both the biological literature and clinical notes to denote uncertainty or speculation. Text-mining applications need to detect hedge cues and their scope; otherwise, uncertain events are incorrectly identified as factual events. However, because of the complexity of natural language, identifying hedge cues and their scope in a sentence is not a trivial task. Our objective was to develop an algorithm that automatically detects hedge cues and their scope in biomedical literature.

METHODOLOGY: We used conditional random fields (CRFs), a supervised machine-learning algorithm, to train models that detect hedge cue phrases and their scope in biomedical literature. The models were trained on the publicly available BioScope corpus. We evaluated how well the CRF models identify hedge cue phrases and their scope by calculating recall, precision and F1-score, and we compared our models with three competitive baseline systems.

RESULTS: Our best CRF-based model performed statistically significantly better than the baseline systems. It achieved F1-scores of 88% and 86% for detecting hedge cue phrases and their scope, respectively, in biological literature, and F1-scores of 93% and 90%, respectively, in clinical notes.

CONCLUSIONS: Our approach is robust: it identifies hedge cues and their scope in both biological and clinical text. To benefit text-mining applications, our system is publicly available as a Java API and as an online application at http://hedgescope.askhermes.org. To our knowledge, this is the first publicly available system that detects hedge cues and their scope in biomedical literature.
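Because a CRF labels a sequence of tokens, its output for cue detection is one tag per token, and recall, precision and F1 are then computed over the phrases those tags mark. The Python below is a minimal sketch of that evaluation step only, assuming a B/I/O tagging scheme and exact-match span scoring; the abstract states neither the tag set nor the matching criterion, and it does not train a CRF. The function names, example sentence and annotations are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# A span is (start, end) in token offsets, end exclusive.
Span = Tuple[int, int]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a B/I/O tag sequence into a set of (start, end) spans.

    "B" opens a new span, "I" continues the current one (or opens one if
    no span is open), and "O" closes any open span.
    """
    spans: Set[Span] = set()
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            if start is not None:
                spans.add((start, i))  # close the previous span
            start = i
        elif tag == "O":
            if start is not None:
                spans.add((start, i))
            start = None
        # "I" with an open span simply extends it.
    if start is not None:
        spans.add((start, len(tags)))  # span running to the end of the sentence
    return spans


def precision_recall_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match span scores: a predicted span counts only if it equals a gold span."""
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of P and R
    return precision, recall, f1


if __name__ == "__main__":
    tokens = ["These", "results", "suggest", "that", "IL-2", "may", "activate", "NF-kB", "."]
    # Hypothetical gold annotation: "suggest" and "may" are the hedge cues.
    gold_cues = ["O", "O", "B", "O", "O", "B", "O", "O", "O"]
    # Hypothetical tagger output that finds "suggest" but misses "may".
    pred_cues = ["O", "O", "B", "O", "O", "O", "O", "O", "O"]

    print([tokens[s:e] for s, e in sorted(bio_to_spans(gold_cues))])  # [['suggest'], ['may']]
    p, r, f = precision_recall_f1(bio_to_spans(gold_cues), bio_to_spans(pred_cues))
    print(f"P={p:.2f} R={r:.2f} F1={f:.2f}")  # P=1.00 R=0.50 F1=0.67
```

Scope could be scored the same way, but because scopes in BioScope can overlap or nest, a single B/I/O sequence cannot hold all of them; tagging one scope sequence per detected cue is one plausible setup, and it is an assumption here rather than the paper's stated design.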