Journal: Cybernetics and Information Technologies (CIT)

An Automatically Generated Annotated Corpus for Albanian Named Entity Recognition


Abstract

Named Entity Recognition (NER) is an important task in many NLP pipelines. It has become especially important for the knowledge bases that power many of today's information retrieval systems. To cope with the high demand for annotated training corpora for supervised NER systems, automatic generation approaches have been proposed. In this paper we report on the first automatically generated NE-annotated corpus for Albanian. News articles from Albanian news media were used as the document source. They were automatically tagged using a custom gazetteer generated from the Albanian Wikipedia. Our evaluation results show that this corpus can be used as a baseline corpus for human-annotated ones or as a training corpus where no other is available.
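To make the gazetteer-based tagging step concrete, the following is a minimal sketch of how a gazetteer lookup could label news text. It is not the authors' implementation: the abstract does not describe the matching strategy or the tag set, so the greedy longest-match rule, the BIO tags, the PER/LOC/ORG labels, and the names `GAZETTEER`, `tokenize`, and `tag` are all assumptions made for illustration, and the three gazetteer entries are toy data rather than entries from the actual resource.

```python
import re

# Hypothetical toy gazetteer: token sequence -> entity type.
# The paper derives its gazetteer from the Albanian Wikipedia;
# these entries and the PER/LOC/ORG labels are illustrative only.
GAZETTEER = {
    ("Gjirokastër",): "LOC",
    ("Ismail", "Kadare"): "PER",
    ("Universiteti", "i", "Tiranës"): "ORG",
}
MAX_LEN = max(len(key) for key in GAZETTEER)


def tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def tag(tokens):
    """Greedy longest-match gazetteer lookup, emitting BIO tags (an assumed scheme)."""
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            label = GAZETTEER.get(tuple(tokens[i:i + n]))
            if label:
                tags[i] = "B-" + label
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + label
                i += n
                break
        else:
            # No gazetteer entry starts at this token.
            i += 1
    return list(zip(tokens, tags))


if __name__ == "__main__":
    for token, label in tag(tokenize("Ismail Kadare lindi në Gjirokastër.")):
        print(token, label)
```

Exact surface matching like this misses inflected forms, which matter in Albanian, so a practical system would likely need normalization or additional surface variants in the gazetteer.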
