Workshop on Scholarly Document Processing

Document-Level Definition Detection in Scholarly Documents: Existing Models, Error Analyses, and Future Directions

Abstract

Definition detection is important for scholarly papers because they often use technical terminology that may be unfamiliar to readers. Despite prior work on definition detection, current approaches are far from accurate enough for real-world applications. In this paper, we first perform an in-depth error analysis of the current best-performing definition detection system and identify the major causes of its errors. Based on this analysis, we develop a new definition detection system, HEDDEx, that combines syntactic features, transformer encoders, and heuristic filters, and we evaluate it on a standard sentence-level benchmark. Because current benchmarks evaluate randomly sampled sentences, we propose an alternative evaluation that assesses every sentence within a document, which makes it possible to evaluate recall in addition to precision. HEDDEx outperforms the leading system on both the sentence-level and the document-level tasks, by 12.7 and 14.4 F1 points, respectively. We note that performance on the high-recall document-level task is much lower than under the standard evaluation approach, because the document-level task requires incorporating document structure as features. We discuss the remaining challenges in document-level definition detection, ideas for improvement, and potential issues in developing reading-aid applications.
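
To make the document-level evaluation idea concrete, the following is a minimal sketch of scoring a detector on every sentence of a single document, so that missed definitions (false negatives) count against recall. It is not the authors' evaluation script: the function name `precision_recall_f1` and the example labels are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    gold: Sequence[bool], pred: Sequence[bool]
) -> Tuple[float, float, float]:
    """Score binary per-sentence predictions against gold labels.

    Each position corresponds to one sentence of the document;
    True means "this sentence contains a definition".
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same sentences")

    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Hypothetical labels for all six sentences of one document.
gold = [True, False, False, True, False, True]
pred = [True, False, True, False, False, True]

precision, recall, f1 = precision_recall_f1(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Because every sentence is labeled, the definition missed at the fourth position lowers recall; an evaluation over a random sample of sentences would not account for such misses across the whole document.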