Source journal: Journal of biomedical informatics

A controlled greedy supervised approach for co-reference resolution on clinical text

Abstract

Identification of co-referent entity mentions inside text has significant importance for other natural language processing (NLP) tasks (e.g. event linking). However, this task, known as co-reference resolution, remains a complex problem, partly because of the confusion over different evaluation metrics and partly because the well-researched existing methodologies do not perform well on new domains such as clinical records. This paper presents a variant of the influential mention-pair model for co-reference resolution. Using a series of linguistically and semantically motivated constraints, the proposed approach controls generation of less-informative/sub-optimal training and test instances. Additionally, the approach also introduces some aggressive greedy strategies in chain clustering. The proposed approach has been tested on the official test corpus of the recently held i2b2/VA 2011 challenge. It achieves an unweighted average F1 score of 0.895, calculated from multiple evaluation metrics (MUC, B3 and CEAF scores). These results are comparable to the best systems of the challenge. What makes our proposed system distinct is that it also achieves high average F1 scores for each individual chain type (Test: 0.897, Person: 0.852, Problem: 0.855, Treatment: 0.884). Unlike other works, it obtains good scores for each of the individual metrics rather than being biased towards a particular metric.
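The abstract describes three stages: constrained generation of mention pairs, pairwise classification, and greedy clustering into chains. The sketch below illustrates that general shape only. It is not the authors' implementation: the two filters (same entity type, a maximum distance of 10 mentions), the token-overlap scorer `toy_score` that stands in for a trained classifier, the 0.5 threshold, and the merge-every-positive-pair clustering rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Mention:
    idx: int    # position of the mention in the document
    text: str   # surface string
    etype: str  # e.g. "person", "problem", "treatment", "test"


def candidate_pairs(mentions, max_distance=10):
    """Yield (antecedent, anaphor) pairs that pass two simple filters.

    Both filters are hypothetical stand-ins for the paper's constraints.
    """
    ordered = sorted(mentions, key=lambda m: m.idx)
    for a, b in combinations(ordered, 2):
        if a.etype != b.etype:            # assumed: chains never mix types
            continue
        if b.idx - a.idx > max_distance:  # assumed: skip distant pairs
            continue
        yield a, b


def toy_score(a, b):
    """Placeholder for a trained pair classifier: Jaccard token overlap."""
    ta, tb = set(a.text.lower().split()), set(b.text.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def greedy_chains(mentions, score=toy_score, threshold=0.5):
    """Merge every pair scored at or above the threshold (union-find)."""
    parent = {m: m for m in mentions}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path compression
            m = parent[m]
        return m

    for a, b in candidate_pairs(mentions):
        if score(a, b) >= threshold:
            parent[find(b)] = find(a)

    chains = {}
    for m in sorted(mentions, key=lambda m: m.idx):
        chains.setdefault(find(m), []).append(m.text)
    # Singletons are not co-reference chains, so drop them.
    return [chain for chain in chains.values() if len(chain) > 1]


mentions = [
    Mention(0, "the chest pain", "problem"),
    Mention(1, "aspirin", "treatment"),
    Mention(2, "chest pain", "problem"),
    Mention(3, "aspirin", "treatment"),
    Mention(4, "fever", "problem"),
]
chains = greedy_chains(mentions)
```

Filtering pairs before classification means that the same rule prunes both training and test instances, which is the role the abstract assigns to its constraints. Merging every positively scored pair is only one possible greedy strategy. Linking each mention to its single best-scoring antecedent would be a more conservative alternative.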