Journal: Information Retrieval

Towards enhancing retrieval effectiveness of search engines for diacritisized Arabic documents

Abstract

Most Arabic text on the web is written without short vowels (diacritics). Diacritics are commonly used in religious texts such as the Holy Quran (the book of Islam) and Al-Hadith (the teachings of Prophet Mohammad (PBUH)), in children's literature, and in some words whose pronunciation might otherwise be ambiguous. Arabic internet users may fail to retrieve credible sources of Arabic text if their queries do not match the diacritical marks attached to the words in the collection. Typing diacritical marks, however, is tedious and time-consuming, while omitting them introduces ambiguity. Previous work suggested pre-processing Arabic text to remove diacritical marks before indexing. Consequently, noticeable discrepancies arise when searching the web for Arabic text with international search engines such as Google and Yahoo. In this article, we propose a framework that enhances the retrieval effectiveness of search engines for both diacritized and diacritic-less Arabic text through query expansion techniques. We used a rule-based stemmer and a semantic relational database compiled in an experimental thesaurus to perform the expansion. We tested our approach on the text of the Quran. We found that query expansion for searching Arabic text is promising, and that its efficiency can likely be further improved with advanced natural language processing tools.
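The two ideas contrasted in the abstract, stripping diacritics before indexing versus expanding a query so that it reaches both diacritized and bare forms, can be illustrated with a small sketch. This is not the authors' framework: their expansion relies on a rule-based stemmer and a thesaurus, neither of which is reproduced here. The function names (`strip_diacritics`, `build_variant_index`, `expand_query`) and the choice of code points treated as diacritics (U+064B to U+0652 plus U+0670) are assumptions made for illustration only.

```python
import re
from collections import defaultdict

# Assumption: treat the Arabic harakat/tanwin/shadda/sukun range
# (U+064B..U+0652) and the superscript alef (U+0670) as "diacritics".
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text: str) -> str:
    """Return text with the diacritical marks above removed."""
    return _DIACRITICS.sub("", text)


def build_variant_index(tokens):
    """Map each bare (diacritic-less) form to every surface form seen.

    Hypothetical helper: stands in for whatever index a real system
    would build over the document collection.
    """
    index = defaultdict(set)
    for token in tokens:
        index[strip_diacritics(token)].add(token)
    return index


def expand_query(term: str, index) -> set:
    """Expand a query term to its bare form plus all known variants."""
    bare = strip_diacritics(term)
    return {bare} | index.get(bare, set())


# Example tokens: two vocalizations of the letters kaf-ta-ba, plus the bare form.
KATABA = "\u0643\u064E\u062A\u064E\u0628\u064E"  # "he wrote"
KUTUB = "\u0643\u064F\u062A\u064F\u0628"         # "books"
BARE = "\u0643\u062A\u0628"                      # undiacritized spelling

index = build_variant_index([KATABA, KUTUB, BARE])
variants = expand_query(BARE, index)  # bare query reaches both vocalized forms
```

The sketch also shows the ambiguity the abstract mentions: once diacritics are dropped, two different words collapse onto the same bare spelling, so a bare query cannot tell them apart without further linguistic processing.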