Journal: Cybernetics and information technologies: CIT

An Automatically Generated Annotated Corpus for Albanian Named Entity Recognition


Abstract

Named Entity Recognition (NER) is an important task in many NLP pipelines. It has become especially important for the knowledge bases that power many of today's information retrieval systems. To cope with the high demand for annotated training corpora for supervised NER systems, automatic generation approaches have been proposed. In this paper we report on the first automatically generated NE-annotated corpus for Albanian. News articles from Albanian news media were used as the document source. They were automatically tagged using a custom gazetteer generated from the Albanian Wikipedia. Our evaluation results show that this corpus can be used as a baseline corpus for human-annotated ones, or as a training corpus where no other is available.
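
The abstract does not say how gazetteer entries are matched against the news text, so the following is only a minimal sketch of one common approach: greedy longest-match lookup over tokens, emitting BIO tags. The gazetteer entries, the entity types (PER, LOC, ORG), and the function name `tag_tokens` are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical toy gazetteer (an assumption, not the paper's data):
# each key is a tuple of tokens, each value an entity type.
GAZETTEER: Dict[Tuple[str, ...], str] = {
    ("Tirana",): "LOC",
    ("Ismail", "Kadare"): "PER",
    ("Universiteti", "i", "Tiranës"): "ORG",
}


def tag_tokens(
    tokens: List[str],
    gazetteer: Dict[Tuple[str, ...], str] = GAZETTEER,
) -> List[Tuple[str, str]]:
    """Tag tokens with BIO labels using greedy longest-match lookup."""
    # Longest entry in the gazetteer bounds the span length we try.
    max_len = max((len(entry) for entry in gazetteer), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = gazetteer.get(tuple(tokens[i:i + n]))
            if label is not None:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return list(zip(tokens, tags))


if __name__ == "__main__":
    for token, tag in tag_tokens(["Ismail", "Kadare", "jeton", "në", "Tirana"]):
        print(token, tag)
```

Exact string matching like this misses inflected forms, which matter for a morphologically rich language such as Albanian (in the toy data above, "Tirana" and "Tiranës" are separate strings), so a real pipeline would likely need some normalization on top of this sketch.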
