
System and method for disambiguating non diacritized arabic words in a text


Abstract

The present invention proposes a solution to the problem of lexical word disambiguation in Arabic texts. The solution relies on knowledge specific to the domain of the text, which facilitates automatic vowel restoration in Modern Standard Arabic text. Texts that are similar in content, restricted to a specific field, or that share common knowledge can be grouped into a specific category or domain (examples of specific domains: sport, art, economics, science, ...). The present invention discloses a method, system and computer program for lexically disambiguating non-diacritized Arabic words in a text, based on a learning approach that exploits Arabic lexical look-up and Arabic morphological analysis to train the system on a corpus of diacritized Arabic text pertaining to a specific domain. The contextual relationships of the words related to that domain are thereby identified, based on the valid assumption that the use of words and their morphological variants shows less lexical variability within a domain than in unrestricted text.
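The core idea of the abstract, that a diacritized corpus from one domain can guide vowel restoration for text in the same domain, can be illustrated with a deliberately simplified sketch. It is not the patented method: it performs no lexical look-up or morphological analysis, and simply picks the diacritized form seen most often in the training corpus, which is an assumption made here for illustration. The function names, the two toy corpora, and their domain labels are all hypothetical.

```python
from collections import Counter, defaultdict

# Arabic diacritic marks: fathatan (U+064B) through sukun (U+0652),
# plus the superscript alef (U+0670).
DIACRITICS = {chr(c) for c in range(0x064B, 0x0653)} | {"\u0670"}

# One bare spelling, two readings (written with escapes to keep the marks explicit).
BARE = "\u0643\u062A\u0628"                      # k-t-b, no diacritics
KATABA = "\u0643\u064E\u062A\u064E\u0628\u064E"  # "he wrote"
KUTUB = "\u0643\u064F\u062A\u064F\u0628"         # "books"


def strip_diacritics(word):
    """Return the word with all diacritic marks removed."""
    return "".join(ch for ch in word if ch not in DIACRITICS)


def train(domain_corpus):
    """Count, for each bare form, how often each diacritized form occurs."""
    counts = defaultdict(Counter)
    for word in domain_corpus.split():
        counts[strip_diacritics(word)][word] += 1
    return counts


def restore(text, counts):
    """Replace each word with its most frequent diacritized form in the domain.

    Words never seen in training are returned unchanged.
    """
    restored = []
    for word in text.split():
        candidates = counts.get(strip_diacritics(word))
        restored.append(candidates.most_common(1)[0][0] if candidates else word)
    return " ".join(restored)


# Hypothetical toy corpora standing in for two domains.
publishing_model = train(" ".join([KUTUB, KUTUB, KATABA]))
biography_model = train(" ".join([KATABA, KATABA, KUTUB]))

publishing_reading = restore(BARE, publishing_model)
biography_reading = restore(BARE, biography_model)
```

With these toy corpora the same bare word resolves to `KUTUB` under the first model and to `KATABA` under the second, which is the effect of restricting training to a domain. A fuller system would, as the abstract states, combine lexical look-up and morphological analysis rather than rely on raw frequency alone.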

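Bibliographic data

  • Publication number: EP1675019B1

  • Publication date: 2007-08-01

  • Original format: PDF

  • Applicant/patentee: INTERNATIONAL BUSINESS MACHINES CORPORATION

  • Application number: EP20050110694

  • Inventor: EL-SHISHINY HISHAM

  • Filing date: 2005-11-14

  • Classification: G06F17/27

  • Country: EP

  • Database entry time: 2022-08-21 20:48:13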
