Venue: Workshop on Arabic Natural Language Processing

EDRAK: Entity-Centric Data Resource for Arabic Knowledge

Abstract

Online Arabic content is growing very rapidly, but the growth of structured Arabic resources has not kept pace. Systems that perform standard Natural Language Processing (NLP) tasks such as Named Entity Disambiguation (NED) struggle to deliver decent quality because rich Arabic entity repositories are lacking. In this paper, we introduce EDRAK, an automatically generated, comprehensive, entity-centric Arabic resource. EDRAK contains more than two million entities together with their Arabic names and contextual keyphrases. Manual evaluation confirmed the quality of the generated data. We are making EDRAK publicly available as a resource to help advance research in Arabic NLP and IR tasks such as dictionary-based Named Entity Recognition, entity classification, and entity summarization.
