Conference: 2015 5th International Conference on Information & Communication Technology and Accessibility

Evaluation of the ambiguity caused by the absence of diacritical marks in Arabic texts: Statistical study

Abstract

This work falls within the field of natural language processing. Its objective is to assess the level of ambiguity that the absence of diacritical marks in Arabic texts causes during information extraction. We carried out a statistical study based on four indicators: the root, the lemma, the stem, and the part-of-speech (POS) tag of each word. For this, we used a large vowelized corpus of more than 80 million words collected from several sources. The study showed that the absence of diacritical marks is the main cause of the ambiguity observed in the information extraction process. Based on this study, we conclude that using a vowelized corpus considerably reduces this ambiguity.
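To make the idea of diacritic-induced ambiguity concrete, the following is a minimal sketch and not the authors' method. It strips the Arabic diacritics from a tiny hand-picked word list and counts how many distinct vowelized forms collapse onto each unvowelized form. The word list, the function names, and the "vowelized forms per unvowelized form" ratio are all assumptions made for illustration; the study itself measures ambiguity on the root, lemma, stem, and POS tag over a corpus of more than 80 million words.

```python
import re
from collections import defaultdict

# Arabic short-vowel and related marks: U+064B (fathatan) through U+0652 (sukun),
# plus U+0670 (superscript alef). This range is an illustrative choice.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(word: str) -> str:
    """Return the word with Arabic diacritical marks removed."""
    return DIACRITICS.sub("", word)


def group_by_unvowelized(vowelized_words):
    """Map each unvowelized form to the set of vowelized forms that produce it."""
    groups = defaultdict(set)
    for word in vowelized_words:
        groups[strip_diacritics(word)].add(word)
    return groups


def average_ambiguity(groups) -> float:
    """Average number of distinct vowelized forms per unvowelized form."""
    if not groups:
        return 0.0
    return sum(len(forms) for forms in groups.values()) / len(groups)


# Hypothetical toy sample (not taken from the paper's corpus).
SAMPLE = [
    "\u0643\u064E\u062A\u064E\u0628\u064E",  # kataba, "he wrote"
    "\u0643\u064F\u062A\u064F\u0628\u064C",  # kutubun, "books"
    "\u0643\u064F\u062A\u0650\u0628\u064E",  # kutiba, "it was written"
    "\u0639\u0650\u0644\u0652\u0645\u064C",  # 'ilmun, "knowledge"
    "\u0639\u064E\u0644\u064E\u0645\u064C",  # 'alamun, "flag"
    "\u0642\u064E\u0644\u064E\u0645\u064C",  # qalamun, "pen"
]

groups = group_by_unvowelized(SAMPLE)
for bare, forms in groups.items():
    print(bare, len(forms))
print("average vowelized forms per unvowelized form:", average_ambiguity(groups))
```

In this toy sample, six vowelized words collapse onto three unvowelized forms, so the average is 2.0. A study along the lines of the abstract would presumably group by analysis (root, lemma, stem, POS tag) rather than by surface form, which requires a morphological analyzer that this sketch does not include.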
