Journal: Information Retrieval

Leveraging comparable corpora for cross-lingual information retrieval in resource-lean language pairs

Abstract

Cross-language information retrieval (CLIR) has so far been studied under the assumption that rich linguistic resources, such as bilingual dictionaries or parallel corpora, are available. However, creating such high-quality resources is labor-intensive, and they are not always at hand. In this paper, we investigate the feasibility of using only comparable corpora for CLIR, without relying on other linguistic resources. Comparable corpora are text documents in different languages that cover similar topics and are often naturally attainable (e.g., news articles published in different languages during the same time period). We adapt an existing cross-lingual word association mining method and incorporate it into a language modeling approach to cross-language retrieval. We investigate different strategies for estimating the target query language models. Our evaluation results on the TREC Arabic–English cross-lingual data show that the proposed method is effective for the CLIR task, demonstrating that it is feasible to perform cross-lingual information retrieval with just comparable corpora.
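
The abstract does not give the formulas used, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes word associations have already been mined from comparable corpora and are available as a hypothetical table `assoc` mapping each source word to weighted target words. It builds a target-language query model as a mixture over those associations, then ranks documents by the cross-entropy between that query model and a Dirichlet-smoothed document model. The function names, the `mu` value, and the smoothing choice are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter


def translate_query_model(query_terms, assoc):
    """Build a target-language query model from source-language query terms.

    assoc[s] is a dict {target_word: weight} of association scores for
    source word s (hypothetical input; assumed to be mined beforehand).
    Weights are normalized per source word, then mixed by query term frequency.
    """
    source_tf = Counter(query_terms)
    total = sum(source_tf.values())
    target_model = Counter()
    for s, count in source_tf.items():
        candidates = assoc.get(s, {})
        norm = sum(candidates.values())
        if norm == 0:
            continue  # no known associations for this source word
        for t, weight in candidates.items():
            target_model[t] += (count / total) * (weight / norm)
    mass = sum(target_model.values())
    # Renormalize in case some source words had no associations.
    return {t: p / mass for t, p in target_model.items()} if mass else {}


def score(query_model, doc_terms, collection_tf, collection_len, mu=1000.0):
    """Negative cross-entropy of the query model against a
    Dirichlet-smoothed document language model (higher is better)."""
    doc_tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    total = 0.0
    for t, p_q in query_model.items():
        p_c = collection_tf.get(t, 0) / collection_len
        if p_c == 0:
            continue  # term never occurs in the collection; skip it
        p_d = (doc_tf.get(t, 0) + mu * p_c) / (doc_len + mu)
        total += p_q * math.log(p_d)
    return total


def rank(query_terms, assoc, docs, mu=1000.0):
    """Rank target-language documents for a source-language query.

    docs maps a document id to its list of target-language tokens.
    Returns (doc_id, score) pairs, best first.
    """
    collection_tf = Counter(t for terms in docs.values() for t in terms)
    collection_len = sum(collection_tf.values())
    q_model = translate_query_model(query_terms, assoc)
    scored = [
        (doc_id, score(q_model, terms, collection_tf, collection_len, mu))
        for doc_id, terms in docs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In this sketch, the "different strategies for estimating the target query language models" mentioned in the abstract would correspond to different ways of filling or normalizing `assoc` and of mixing the per-word distributions; the abstract does not say which strategies the paper compares.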