
On Arabic search: The effectiveness of monolingual and bidirectional information retrieval.



Abstract

The foremost goal of this research is to develop algorithms that improve the retrieval effectiveness of Arabic monolingual and cross-language Information Retrieval (IR) systems. The inflectional structure of words strongly affects the performance of IR systems, in particular their retrieval precision. We present two stemming algorithms for Arabic IR systems. We first empirically investigate the effectiveness of surface-based retrieval, which indexes words as they appear in the text. This approach degrades retrieval precision because Arabic is a highly inflected language. Accordingly, we propose root-based retrieval, which yields a significant improvement over the surface-based approach. However, many distinct word senses share an identical root; thus, the root-based algorithm creates invalid conflation classes that result in ambiguous queries. To resolve this ambiguity, we propose a light-stemming algorithm for Arabic texts and show that it significantly outperforms the root-based algorithm. We also investigate automatic relevance feedback in Arabic IR systems and find that it achieves superior retrieval effectiveness.

In Cross-Language Information Retrieval (CLIR), queries in one language retrieve relevant documents in another language. We investigate Machine Translation (MT) and Machine-Readable Dictionaries (MRDs) for bidirectional Arabic-English CLIR; the key problem with both resources is translation ambiguity. We present three methods of query translation using a bilingual dictionary for Arabic-English CLIR. First, we present the Every-Match (EM) method, which yields ambiguous translations. We therefore present the First-Match (FM) method, which takes the first match in the dictionary as the candidate term. Finally, we present a novel translation model called Two-Phase (TP).
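A minimal sketch of light stemming may make the contrast with root-based conflation more concrete. The affix lists below are illustrative assumptions loosely modeled on published Arabic light stemmers; they are not the lists used in this thesis, and the names `PREFIXES`, `SUFFIXES`, and `light_stem` are hypothetical. The idea is to strip a small set of frequent prefixes (such as the definite article) and suffixes without reducing the word to its root.

```python
# Hypothetical affix lists for illustration only; NOT the thesis's lists.
# Prefixes are ordered so longer forms are tried before shorter ones.
PREFIXES = ["وال", "بال", "كال", "فال", "لل", "ال", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]


def light_stem(word: str, min_stem: int = 2) -> str:
    """Strip at most one prefix and at most one suffix from an Arabic word.

    An affix is removed only if at least `min_stem` characters remain,
    so short words are not reduced to nothing.
    """
    for p in PREFIXES:
        if word.startswith(p) and len(word) - len(p) >= min_stem:
            word = word[len(p):]
            break  # remove a single prefix
    for s in SUFFIXES:
        if word.endswith(s) and len(word) - len(s) >= min_stem:
            word = word[: -len(s)]
            break  # remove a single suffix
    return word


if __name__ == "__main__":
    for w in ["المكتبة", "والمكتبات", "كاتب", "كتاب"]:
        print(w, "->", light_stem(w))
```

With these lists, المكتبة ("the library") and والمكتبات ("and the libraries") both reduce to مكتب, while كاتب ("writer") and كتاب ("book") are left intact. A root-based stemmer would map all four words to the root كتب, which is the kind of over-broad conflation class the abstract describes.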
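The difference between the Every-Match and First-Match strategies can be illustrated with a toy dictionary. The dictionary contents, their ordering, and the function names below are hypothetical; the sketch only assumes, as stated above, that EM keeps every listed translation while FM keeps the first one. The Two-Phase model is not sketched because its mechanics are not described here.

```python
from typing import Dict, List

# Hypothetical toy Arabic-to-English dictionary. The order of each list
# stands in for the order in which an MRD entry lists its senses.
TOY_DICT: Dict[str, List[str]] = {
    "كتاب": ["book", "letter", "writing"],
    "مكتبة": ["library", "bookstore"],
    "عربي": ["arabic", "arab"],
}


def every_match(query_terms: List[str]) -> List[str]:
    """EM: replace each source term with ALL of its dictionary translations."""
    out: List[str] = []
    for term in query_terms:
        # Assumption: terms missing from the dictionary are kept unchanged.
        out.extend(TOY_DICT.get(term, [term]))
    return out


def first_match(query_terms: List[str]) -> List[str]:
    """FM: replace each source term with its FIRST listed translation only."""
    return [TOY_DICT[t][0] if t in TOY_DICT else t for t in query_terms]


if __name__ == "__main__":
    q = ["كتاب", "عربي"]
    print("EM:", every_match(q))
    print("FM:", first_match(q))
```

EM turns the two-term query into five English terms, including the less likely senses "letter" and "writing", whereas FM yields just "book" and "arabic". FM reduces ambiguity at the risk of discarding the correct sense whenever it is not listed first.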
We also empirically evaluate the effectiveness of the Arabic-English MT approach using short, medium, and long queries. English-Arabic CLIR is evaluated via an MRD and an English-Arabic MT system. Post-translation expansion is used to de-emphasize the extraneous terms that the MRD and MT introduce in English-Arabic CLIR.
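Automatic relevance feedback and post-translation expansion can both be approximated by pseudo-relevance feedback: the top-ranked documents for a (translated) query are assumed relevant, query terms that recur in them are boosted, and a few frequent new terms are added. The following is a minimal Rocchio-style sketch under that assumption; the function `expand_query`, the parameters `k`, `m`, and `beta`, and the weighting formula are illustrative choices, not the thesis's settings. It works on pre-tokenized documents and is language-agnostic; English tokens are used only for readability.

```python
from collections import Counter
from typing import Dict, List


def expand_query(
    query: List[str],
    ranked_docs: List[List[str]],
    k: int = 10,
    m: int = 5,
    beta: float = 0.5,
) -> Dict[str, float]:
    """Pseudo-relevance feedback over the top-k ranked documents.

    Original terms start at weight 1.0 and are boosted in proportion to how
    often they occur in the feedback documents; the m most frequent new
    terms are added with weight beta * (count / max_count).
    """
    counts: Counter = Counter()
    for doc in ranked_docs[:k]:  # assume the top-k documents are relevant
        counts.update(doc)
    if not counts:
        return dict.fromkeys(query, 1.0)
    max_c = max(counts.values())
    # Terms absent from the feedback documents (counts[t] == 0) keep 1.0,
    # so they lose weight relative to well-supported terms.
    weights = {t: 1.0 + beta * counts[t] / max_c for t in query}
    new_terms = [t for t, _ in counts.most_common() if t not in weights][:m]
    for t in new_terms:
        weights[t] = beta * counts[t] / max_c
    return weights


if __name__ == "__main__":
    docs = [
        ["arabic", "library", "manuscripts", "library"],
        ["library", "arabic", "manuscripts", "catalog"],
    ]
    print(expand_query(["library", "bookstore", "arabic"], docs, k=2, m=2))
```

In this example, "bookstore" (a spurious translation that never appears in the feedback documents) stays at weight 1.0, while "library" rises to 1.5 and "arabic" to about 1.33, and "manuscripts" and "catalog" are added as expansion terms. The extraneous term therefore carries relatively less weight in the expanded query.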

Bibliographic record

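  • Author

    Aljlayl, Mohammed A.

  • Author affiliation

    Illinois Institute of Technology

  • Degree-granting institution: Illinois Institute of Technology
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2002
  • Pages: 120
  • Original format: PDF
  • Language: English
  • Chinese Library Classification: Automation technology, computer technology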