Published in: Advances in Natural Language Processing (conference proceedings)

Arabic Named Entity Recognition from Diverse Text Types



Abstract

Name identification has been studied intensively over the past few years and has been incorporated into several products. Many researchers have tackled this problem in a variety of languages. However, only limited research has focused on Named Entity Recognition (NER) for Arabic text. This is because resources for Arabic named entities are lacking and progress in Arabic natural language processing in general has been limited. In this paper, we present the results of our attempt to recognize and extract the 10 most important named entity types in Arabic script: person name, location, company, date, time, price, measurement, phone number, ISBN, and file name. We developed the system, Name Entity Recognition for Arabic (NERA), using a rule-based approach. The system consists of two components that together recognize the named entities: a whitelist representing a dictionary of names, and a grammar in the form of regular expressions. NERA is evaluated on our own corpora, which were tagged in a semi-automated way. The performance achieved was satisfactory in terms of precision, recall, and F-measure.
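The abstract names the two components of NERA (a name whitelist and a regular-expression grammar) and its evaluation measures, but it does not give the actual rules. The block below is a minimal sketch of that general technique, not the authors' implementation. It rests on several assumptions:

- `PERSON_WHITELIST` is a tiny hypothetical gazetteer.
- `PATTERNS` holds illustrative regexes for only some of the ten types. The dd/mm/yyyy date format, the `0X-XXXXXXX` phone format and the ISBN-10 pattern are assumptions.
- The `tag` and `evaluate` helpers are hypothetical names.

Scoring uses exact span-and-label matching. This is a common convention for NER evaluation, but it may not be the one used in the paper.

```python
import re
from typing import Dict, List, Set, Tuple

# A span is (start offset, end offset, entity label).
Span = Tuple[int, int, str]

# Hypothetical whitelist (dictionary) of person names; the real NERA
# dictionary is not described in the abstract.
PERSON_WHITELIST: Set[str] = {"محمد", "أحمد", "فاطمة"}

# Hypothetical regex "grammar" for some pattern-like entity types.
# Formats (dd/mm/yyyy, 0X-XXXXXXX phones, ISBN-10) are assumptions.
PATTERNS: Dict[str, re.Pattern] = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
    "TIME": re.compile(r"\b\d{1,2}:\d{2}\b"),
    "PHONE": re.compile(r"\b0\d{1,2}-\d{7}\b"),
    "ISBN": re.compile(r"\bISBN[- ]?(?:\d[- ]?){9}[\dX]\b"),
    "FILENAME": re.compile(r"\b\w+\.(?:txt|pdf|docx?)\b"),
}


def tag(text: str) -> List[Span]:
    """Tag entities with a whitelist lookup plus regex rules."""
    candidates: List[Span] = []
    # Whitelist pass: any word token found in the dictionary is a PERSON.
    # (\w is Unicode-aware for str patterns, so it matches Arabic letters.)
    for m in re.finditer(r"\w+", text):
        if m.group() in PERSON_WHITELIST:
            candidates.append((m.start(), m.end(), "PERSON"))
    # Grammar pass: every regex match becomes a candidate span.
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            candidates.append((m.start(), m.end(), label))
    # Resolve overlaps left to right; on equal starts the longer span wins.
    candidates.sort(key=lambda s: (s[0], -(s[1] - s[0])))
    result: List[Span] = []
    last_end = -1
    for span in candidates:
        if span[0] >= last_end:
            result.append(span)
            last_end = span[1]
    return result


def evaluate(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F-measure over entity spans."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


if __name__ == "__main__":
    sample = "اتصل محمد على 02-1234567 الساعة 10:30 يوم 05/03/2010 بخصوص report.pdf"
    for start, end, label in tag(sample):
        print(label, sample[start:end])
```

The overlap policy (earliest start, then longest span) is one simple choice among many. The sketch also ignores Arabic clitics: a name with an attached prefix such as و ("and") or ب ("with") would not match the whitelist as written.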
