Conference: International Conference on Theory and Practice of Digital Libraries

Coner: A Collaborative Approach for Long-Tail Named Entity Recognition in Scientific Publications


Abstract

Named Entity Recognition (NER) for rare long-tail entities, such as those often found in domain-specific scientific publications, is a challenging task because the extensive training and test data needed to fine-tune NER algorithms is typically lacking. Recent approaches have presented promising solutions that train NER algorithms in an iterative, weakly supervised fashion, limiting human interaction to providing a small set of seed terms. Such approaches rely heavily on heuristics to cope with the limited size of the training data. As these heuristics are prone to failure, the overall achievable performance is limited. In this paper, we therefore introduce a collaborative approach that incrementally incorporates human feedback on the relevance of extracted entities into the training cycle of such iterative NER algorithms. This approach, called Coner, still allows new domain-specific extractors for rare long-tail entities to be trained at low cost, but with ever-increasing performance while the algorithm is actively used in an application.
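The general idea of feeding human relevance judgements back into an iterative, seed-based extraction loop can be illustrated with a toy sketch. This is not the Coner implementation: the co-occurrence heuristic, the function names `extract_candidates` and `train_with_feedback`, and the `votes` dictionary (term mapped to a net human score, negative meaning "not a relevant entity") are all assumptions made for illustration only.

```python
def extract_candidates(sentences, known_terms):
    """Toy heuristic (an assumption, not the paper's): in any sentence that
    mentions a known term, every other capitalised token is a candidate."""
    candidates = set()
    for sentence in sentences:
        tokens = sentence.split()
        if any(tok in known_terms for tok in tokens):
            candidates.update(
                tok for tok in tokens
                if tok[0].isupper() and tok not in known_terms
            )
    return candidates


def train_with_feedback(sentences, seed_terms, votes, iterations=3):
    """Grow the term set from the seeds, discarding any candidate that
    human feedback has marked as irrelevant (net vote below zero)."""
    known = set(seed_terms)
    rejected = {term for term, score in votes.items() if score < 0}
    for _ in range(iterations):
        # Feedback filters the heuristic's output before it is reused
        # as training signal in the next iteration.
        candidates = extract_candidates(sentences, known) - rejected
        if not candidates:
            break
        known |= candidates
    return known
```

In this sketch, a rejected term never enters the known set, so it cannot pull further false positives into later iterations. That is the kind of heuristic failure the abstract says limits purely weakly supervised training.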
