An Automatically Generated Annotated Corpus for Albanian Named Entity Recognition

Klesti Hoxha; Artur Baxhaku

Abstract

Named Entity Recognition (NER) is an important task in many NLP pipelines. It has become especially important for the knowledge bases that power many of today's information retrieval systems. To cope with the high demand for annotated training corpora for supervised NER systems, automatic generation approaches have been proposed. In this paper we report on the first automatically generated NE-annotated corpus for Albanian. News articles from Albanian news media were used as the document source. They were automatically tagged using a custom gazetteer generated from the Albanian Wikipedia. Our evaluation results show that this corpus can be used as a baseline corpus for human-annotated ones, or as a training corpus where no other is available.
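The gazetteer-based tagging mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy gazetteer entries, the entity labels (PER, LOC, ORG), the BIO tag scheme, the greedy longest-match strategy, and the names GAZETTEER and tag_tokens are all assumptions made for illustration. The abstract only states that a gazetteer generated from the Albanian Wikipedia was used to tag news articles.

```python
from typing import Dict, List, Tuple

# Hypothetical toy gazetteer: token sequences mapped to entity types.
# A real one would be generated from Albanian Wikipedia, as the abstract says;
# these three entries are made up for the example.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("Tirana",): "LOC",
    ("Ismail", "Kadare"): "PER",
    ("Universiteti", "i", "Tiranës"): "ORG",
}

# Longest entry length, so the matcher knows how far ahead to look.
MAX_LEN = max(len(entry) for entry in GAZETTEER)


def tag_tokens(tokens: List[str]) -> List[Tuple[str, str]]:
    """Tag tokens with BIO labels using greedy longest-match lookup."""
    tagged: List[Tuple[str, str]] = []
    i = 0
    while i < len(tokens):
        match_len, label = 0, None
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in GAZETTEER:
                match_len, label = n, GAZETTEER[candidate]
                break
        if label is None:
            # No gazetteer entry starts here: mark the token as outside.
            tagged.append((tokens[i], "O"))
            i += 1
        else:
            # First token of the match gets B-, the rest get I-.
            tagged.append((tokens[i], "B-" + label))
            for tok in tokens[i + 1:i + match_len]:
                tagged.append((tok, "I-" + label))
            i += match_len
    return tagged


if __name__ == "__main__":
    sentence = "Ismail Kadare studioi në Universiteti i Tiranës".split()
    for token, tag in tag_tokens(sentence):
        print(token, tag)
```

Exact-match lookup of this kind misses inflected forms and cannot disambiguate names that belong to more than one entity type, which is one reason an automatically generated corpus is usually treated as a baseline rather than a gold standard.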

Bibliographic details

Source: Cybernetics and information technologies: CIT, 2017, No. 1, 14 pages
Authors: Klesti Hoxha; Artur Baxhaku
Original format: PDF
Chinese Library Classification: automatic information theory

Similar literature

1. Development of a Hindi Named Entity Recognition System without Using Manually Annotated Training Corpus [J]. Saha Sujan Kumar, Majumder Mukta. The International Arab Journal of Information Technology. 2018, No. 6
2. Assessment of disease named entity recognition on a corpus of annotated sentences [J]. Antonio Jimeno, Ernesto Jimenez-Ruiz, Vivian Lee. BMC Bioinformatics. 2008, Supplement 3
3. Myanmar named entity corpus and its use in syllable-based neural named entity recognition [J]. Hsu Myat Mo, Khin Mar Soe. International Journal of Electrical and Computer Engineering. 2020, No. 2
4. Named Entity Recognition for Icelandic: Annotated Corpus and Models [C]. Svanhvít L. Ingólfsdóttir, Ásmundur A. Guðjónsson, Hrafn Loftsson. International Conference on Statistical Language and Speech Processing. 2020
5. Arabic Named Entity Recognition: A Corpus-Based Study [D]. Algahtani, Shabib. 2012
6. Assessment of disease named entity recognition on a corpus of annotated sentences [O]. Antonio Jimeno, Ernesto Jimenez-Ruiz, Vivian Lee. 2008
7. Automatic Creation of Arabic Named Entity Annotated Corpus Using Wikipedia [O]. Althobaiti Maha, Kruschwitz Udo, Poesio Massimo. 2014
