Conference: Language technology resources and tools for digital humanities

Challenges and Solutions for Latin Named Entity Recognition

Abstract

Although it spans thousands of years and genres as diverse as liturgy, historiography, lyric, and other forms of prose and poetry, the body of Latin texts is still relatively sparse compared to English. Data sparsity in Latin presents a number of challenges for traditional Named Entity Recognition techniques. Solving such challenges and enabling reliable Named Entity Recognition in Latin texts can facilitate many downstream applications, from machine translation to digital historiography, enabling classicists, historians, and archaeologists, for instance, to track the relationships of historical persons, places, and groups on a large scale. This paper presents the first annotated corpus for evaluating Named Entity Recognition in Latin, as well as a fully supervised model that achieves over 90% F-score on a held-out test set, significantly outperforming a competitive baseline. We also present a novel active learning strategy that predicts how many and which sentences need to be annotated for named entities in order to attain a specified degree of accuracy when recognizing named entities automatically in a given text. This maximizes the productivity of annotators while simultaneously controlling quality.
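For readers unfamiliar with the F-score reported above, the following is a minimal sketch of how an entity-level F-score can be computed from BIO-tagged sequences. It is an illustration only: the abstract does not say whether the paper scores at the entity or token level, and the tag names (`PER`, `LOC`), the example sentence, and the helper functions `bio_to_spans` and `f_score` are assumptions introduced here, not taken from the paper.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type); a hypothetical representation
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (stray I- tags are ignored)."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                spans.add((start, i, label))
            start, label = (i, tag[2:]) if tag.startswith("B-") else (None, None)
    return spans


def f_score(gold: List[str], pred: List[str]) -> float:
    """Harmonic mean of precision and recall over exactly matching spans."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical sentence: "Marcus Tullius Romam venit"
gold_tags = ["B-PER", "I-PER", "B-LOC", "O"]
pred_tags = ["B-PER", "I-PER", "O", "O"]  # the place name is missed
score = f_score(gold_tags, pred_tags)
```

In this example the prediction finds one of the two gold entities and proposes no spurious ones, so precision is 1.0, recall is 0.5, and the F-score is 2/3.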