System and method for disambiguating non-diacritized Arabic words in a text
Abstract
The present invention proposes a solution to the problem of lexical word disambiguation in Arabic texts. The solution relies on domain-specific knowledge of the text, which facilitates automatic vowel restoration in Modern Standard Arabic script. Texts that are similar in content, restricted to a specific field, or that share common knowledge can be grouped into a specific category or domain (for example sport, art, economics, or science). The invention discloses a method, system, and computer program for lexically disambiguating non-diacritized Arabic words in a text. It uses a learning approach that exploits Arabic lexical look-up and Arabic morphological analysis to train the system on a corpus of diacritized Arabic text pertaining to a specific domain. The contextual relationships of the words related to that domain are thereby identified, based on the valid assumption that words and their morphological variants show less lexical variability within a domain than in unrestricted text.
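The domain-restricted idea in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the lexical look-up and morphological analysis steps are replaced here by plain frequency counting over a toy diacritized corpus. The function names (`strip_diacritics`, `train`, `restore`), the domain labels, and the toy corpus are all hypothetical and chosen only for illustration.

```python
import re
from collections import Counter, defaultdict

# Arabic diacritic marks: fathatan (U+064B) through sukun (U+0652),
# plus superscript alef (U+0670).
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")

# Two diacritized readings of the same bare spelling (illustrative only).
ILM = "\u0639\u0650\u0644\u0652\u0645"   # 'ilm, "knowledge / science"
ALAM = "\u0639\u064E\u0644\u064E\u0645"  # 'alam, "flag"
BARE = "\u0639\u0644\u0645"              # the shared non-diacritized form


def strip_diacritics(word: str) -> str:
    """Return the word with all diacritic marks removed."""
    return DIACRITICS.sub("", word)


def train(corpus_by_domain):
    """Count, per domain, how often each diacritized form realizes a bare form."""
    model = defaultdict(lambda: defaultdict(Counter))
    for domain, sentences in corpus_by_domain.items():
        for sentence in sentences:
            for word in sentence.split():
                model[domain][strip_diacritics(word)][word] += 1
    return model


def restore(model, domain, text):
    """Replace each bare word with its most frequent diacritized form in the domain."""
    restored = []
    for word in text.split():
        candidates = model[domain].get(word)
        # Words never seen in this domain are left unchanged.
        restored.append(candidates.most_common(1)[0][0] if candidates else word)
    return " ".join(restored)


# Toy corpus (hypothetical): the same bare word leans toward a different
# reading in each domain.
toy_corpus = {
    "science": [ILM, ILM, ALAM],
    "sport": [ALAM, ALAM],
}
model = train(toy_corpus)
science_reading = restore(model, "science", BARE)
sport_reading = restore(model, "sport", BARE)
```

In this sketch the same non-diacritized input resolves to a different vocalized form depending on the domain label, which is the effect the abstract attributes to reduced lexical variability within a domain. A realistic system would also need surrounding-word context and morphological analysis, which the sketch omits.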