BMC Bioinformatics

Recognizing speculative language in biomedical research articles: a linguistically motivated perspective

Abstract

Background: Due to the nature of scientific methodology, research articles are rich in speculative and tentative statements, also known as hedges. We explore a linguistically motivated approach to recognizing such language in biomedical research articles. Our approach draws on prior linguistic work and existing lexical resources to build a dictionary of hedging cues, which it then extends with syntactic patterns. Furthermore, because hedging cues differ in speculative strength, we assign them weights in two ways: automatically, using the information gain (IG) measure, and semi-automatically, based on their types and their centrality to hedging. The weights of the hedging cues are then used to determine the speculative strength of sentences.

Results: We test our system on two publicly available hedging datasets. On the fruit-fly dataset, we achieve a precision-recall breakeven point (BEP) of 0.85 with the semi-automatic weighting scheme and a lower BEP of 0.80 with the information gain weighting scheme. These results are competitive with the best previously reported result (BEP of 0.85). On the BMC dataset, semi-automatic weighting yields a BEP of 0.82, a statistically significant improvement (p …).

Conclusion: Our results demonstrate that speculative language can be recognized successfully with a linguistically motivated approach, and confirm that the choice of hedging devices affects the speculative strength of a sentence, which can be captured reasonably well by weighting the hedging cues. The improvement obtained on the BMC dataset with the semi-automatic weighting scheme indicates that our linguistically oriented approach is more portable than machine-learning-based approaches. The lower performance obtained with the information gain weighting scheme suggests that this method may benefit from a larger, manually annotated corpus from which to induce the weights automatically.
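The abstract combines three ideas: sentences get a speculative-strength score computed from the weights of the hedging cues they contain, cue weights can be derived automatically with information gain, and systems are compared using the precision-recall breakeven point. The Python sketch below illustrates all three under stated assumptions. The cue dictionary, its weights, the toy sentences, and the rule that a sentence's score is the sum of its cue weights are hypothetical. The paper's syntactic patterns and cue types are not modeled. This is not the authors' implementation.

```python
import math
import re
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy cue dictionary with hand-assigned weights, standing in for the
# semi-automatic scheme; these cues and values are illustrative, not the paper's.
TOY_CUE_WEIGHTS: Dict[str, float] = {
    "suggest": 3.0,
    "may": 2.0,
    "possibly": 2.0,
    "whether": 1.0,
}

# Hypothetical labeled sentences: (sentence, is_speculative).
TOY_DATA: List[Tuple[str, bool]] = [
    ("These results suggest that the gene may regulate growth", True),
    ("The protein is expressed in the wing disc", False),
    ("It is possibly involved in apoptosis", True),
    ("Whether this interaction is direct remains unclear", True),
    ("The mutant flies were sterile", False),
]


def tokenize(sentence: str) -> List[str]:
    """Lowercase word tokens; no lemmatization and no syntactic patterns."""
    return re.findall(r"[a-z]+", sentence.lower())


def speculative_score(sentence: str, weights: Dict[str, float]) -> float:
    """Assumed rule: a sentence's strength is the sum of the weights of its cues."""
    return sum(weights.get(token, 0.0) for token in tokenize(sentence))


def entropy(pos: int, neg: int) -> float:
    """Binary entropy (in bits) of a positive/negative label split."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            result -= p * math.log2(p)
    return result


def information_gain(cue: str, data: Sequence[Tuple[str, bool]]) -> float:
    """IG of the binary feature 'cue occurs in the sentence' w.r.t. the label."""
    n = len(data)
    if n == 0:
        return 0.0
    # counts[cue_present][is_speculative]
    counts = {True: {True: 0, False: 0}, False: {True: 0, False: 0}}
    for sentence, is_spec in data:
        counts[cue in tokenize(sentence)][is_spec] += 1
    prior = entropy(counts[True][True] + counts[False][True],
                    counts[True][False] + counts[False][False])
    conditional = sum(
        (counts[present][True] + counts[present][False]) / n
        * entropy(counts[present][True], counts[present][False])
        for present in (True, False)
    )
    return prior - conditional


def ig_weights(cues: Sequence[str], data: Sequence[Tuple[str, bool]]) -> Dict[str, float]:
    """Weight every cue by its information gain on labeled data."""
    return {cue: information_gain(cue, data) for cue in cues}


def breakeven_point(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """Precision-recall BEP: at cutoff k = number of positives, precision equals
    recall, so both reduce to true positives / k (ties keep input order)."""
    n_pos = sum(labels)
    if n_pos == 0:
        return 0.0
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(1 for _, label in ranked[:n_pos] if label) / n_pos


if __name__ == "__main__":
    labels = [is_spec for _, is_spec in TOY_DATA]
    hand_scores = [speculative_score(s, TOY_CUE_WEIGHTS) for s, _ in TOY_DATA]
    print(hand_scores)  # -> [5.0, 0.0, 2.0, 1.0, 0.0]
    print(breakeven_point(hand_scores, labels))  # -> 1.0
    learned = ig_weights(list(TOY_CUE_WEIGHTS), TOY_DATA)
    ig_scores = [speculative_score(s, learned) for s, _ in TOY_DATA]
    print(breakeven_point(ig_scores, labels))
```

In this toy run, the IG weights are induced from the same five sentences they then score, so the result says nothing about real performance. In practice the weights would be learned from a separate annotated corpus. That is also where the abstract suggests a larger corpus would help the information gain scheme.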