
Combined Classification for Extracting Named Entities from Arabic Texts



Abstract

In this paper, we describe an approach for extracting named entities (NEs) from Arabic texts. Arabic is hard to process because of linguistic characteristics that also affect NE extraction. We treat named entity extraction as a typical classification problem: the task consists of searching for text portions that can be assigned to an NE class (Person, Locality or Organization). We therefore use a supervised learning approach and employ the BIO tagging format, which addresses the twin problems of segmentation and categorization. Because a single classifier cannot give good results for all types of contexts, we adopt a set of weighted classifiers, which we combine through a voting procedure. To assess the performance of our system properly, we perform two types of tests: with and without morphological attributes. We consider the results highly satisfactory, in particular an accuracy that exceeds 89% for both the Person and Locality classes.
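As a rough illustration of the two ideas named in the abstract, the sketch below combines per-token BIO tags from several classifiers by weighted voting and then reads entity spans off the combined tags. It is a minimal sketch under stated assumptions, not the authors' system: the function names `weighted_vote` and `bio_to_spans`, the tag labels `PER` and `LOC`, the example tokens, the classifier outputs and the weights are all hypothetical, and the abstract does not say how the weights are set or how ties are resolved.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Combine per-token tags from several classifiers by weighted voting.

    predictions: list of tag sequences, one per classifier (equal lengths).
    weights: one non-negative weight per classifier (hypothetical values).
    """
    combined = []
    for i in range(len(predictions[0])):
        scores = defaultdict(float)
        for tags, weight in zip(predictions, weights):
            scores[tags[i]] += weight
        # Highest total weight wins; ties go to the alphabetically first tag
        # (an arbitrary choice made here only for determinism).
        combined.append(max(sorted(scores), key=scores.get))
    return combined


def bio_to_spans(tokens, tags):
    """Turn a BIO tag sequence into (entity text, entity class) pairs."""
    spans = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always opens a new entity.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-") and tag[2:] == current_type:
            # An I- tag continues the entity opened by the preceding B- tag.
            current_tokens.append(token)
        else:
            # "O", or an I- tag with no matching open entity, closes any span.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Hypothetical example: three classifiers tag the same five tokens.
tokens = ["Ahmed", "Ali", "visited", "Tunis", "yesterday"]
predictions = [
    ["B-PER", "I-PER", "O", "B-LOC", "O"],
    ["B-PER", "O", "O", "B-LOC", "O"],
    ["B-PER", "I-PER", "O", "O", "O"],
]
weights = [0.5, 0.3, 0.2]

combined = weighted_vote(predictions, weights)
entities = bio_to_spans(tokens, combined)
print(combined)
print(entities)
```

The B- tag marks where an entity starts and the suffix gives its class, which is why a single tag per token covers both segmentation and categorization. Voting per token keeps the combination simple, but it can in principle produce an I- tag with no preceding B- tag; this sketch drops such tokens, which is only one of several possible repair strategies.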
