Challenges and Solutions for Latin Named Entity Recognition

Abstract

Although spanning thousands of years and genres as diverse as liturgy, historiography, lyric and other forms of prose and poetry, the body of Latin texts is still relatively sparse compared to English. Data sparsity in Latin presents a number of challenges for traditional Named Entity Recognition techniques. Solving such challenges and enabling reliable Named Entity Recognition in Latin texts can facilitate many downstream applications, from machine translation to digital historiography, enabling classicists, historians, and archaeologists, for instance, to track the relationships of historical persons, places, and groups on a large scale. This paper presents the first annotated corpus for evaluating Named Entity Recognition in Latin, as well as a fully supervised model that achieves over 90% F-score on a held-out test set, significantly outperforming a competitive baseline. We also present a novel active learning strategy that predicts how many and which sentences need to be annotated for named entities in order to attain a specified degree of accuracy when recognizing named entities automatically in a given text. This maximizes the productivity of annotators while simultaneously controlling quality.
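
The abstract reports an F-score above 90% but this record does not state which evaluation convention was used. A common convention in NER is entity-level exact-match scoring, where a predicted entity counts as correct only if both its span and its type match a gold entity. The minimal sketch below assumes BIO-tagged sentences and uses hypothetical entity labels `PRS` (person) and `GEO` (place); the function names `bio_to_spans` and `entity_f1` are illustrative and are not taken from the paper.

```python
from typing import List, Set, Tuple

# A typed entity span: (start index, end index exclusive, entity type).
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence (e.g. hypothetical B-PRS, I-PRS, O) into typed spans."""
    spans: Set[Span] = set()
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" sentinel closes a final entity
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if tag == "O" or starts_new:
            if label is not None:
                spans.add((start, i, label))  # close the entity in progress
                start, label = None, None
            if starts_new:
                # A B- tag, or an I- tag that does not continue the current entity,
                # opens a new entity at position i.
                start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Micro-averaged entity-level precision, recall and F1 (exact span and type match)."""
    tp = fp = fn = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)  # predicted entities that exactly match a gold entity
        fp += len(p - g)  # predicted entities with no exact gold match
        fn += len(g - p)  # gold entities the model missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    gold = [["B-PRS", "I-PRS", "O", "B-GEO"]]  # a person and a place
    pred = [["B-PRS", "I-PRS", "O", "O"]]      # the model finds only the person
    print(entity_f1(gold, pred))
```

In this example precision is 1.0, recall is 0.5, and F1 is about 0.67. Pooling counts over all sentences before computing the scores (micro-averaging) weights every entity equally, regardless of which sentence it occurs in.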

Bibliographic Record

Source: 《Workshop on language technology resources and tools for digital humanities》 | 2016 | xii, 195 p. | 9 pages
Authors: Alexander Erdmann; Christopher Brown; Brian Joseph; Mark Janse; Petra Ajaka; Micha Eisner; Marie-Catherine de Marneffe
Original format: PDF
Chinese Library Classification: Programming, software engineering

Similar Documents

1. Myanmar named entity corpus and its use in syllable-based neural named entity recognition [J]. Hsu Myat Mo, Khin Mar Soe. International Journal of Electrical and Computer Engineering. 2020, no. 2
2. Lessons learnt from the Named Entity rEcognition and Linking (NEEL) challenge series [J]. Rizzo Giuseppe, Pereira Bianca, Varga Andrea. Semantic web. 2017, no. 5
3. Challenges of Urdu Named Entity Recognition: A Scarce Resourced Language [J]. Saeeda Naz, Arif Iqbal Umar, Syed Hamad Shirazi. Research journal of applied science, engineering and technology. 2014, no. 10
4. Challenges and Solutions for Latin Named Entity Recognition [C]. Alexander Erdmann, Christopher Brown, Brian Joseph. Language technology resources and tools for digital humanities. 2016
5. Semi-supervised Named Entity Recognition: Learning to recognize 100 entity types with little supervision [D]. Nadeau, David. 2007
6. Precursor-induced conditional random fields: connecting separate entities by induction for improved clinical named entity recognition [O]. Wangjin Lee, Jinwook Choi. 2019
7. Named Entity Recognition and Named Entity Linking on Esports Contents [O]. Ziyu Liu, Yifan Leng, Meiqi Wang. 2020