Source: Journal of the American Society for Information Science and Technology

Combining Lexical and Statistical Translation Evidence for Cross-Language Information Retrieval

Abstract

This article explores how best to use lexical and statistical translation evidence together for cross-language information retrieval (CLIR). Lexical translation evidence is assembled from Wikipedia and from a large machine-readable dictionary, statistical translation evidence is drawn from parallel corpora, and evidence from co-occurrence in the document language provides a basis for limiting the adverse effect of translation ambiguity. Coverage statistics for NII Testbeds and Community for Information Access Research (NTCIR) queries confirm that these resources have complementary strengths. Experiments with translation evidence from a small parallel corpus indicate that even rather rough estimates of translation probabilities can yield further improvements over a strong technique for translation weighting based on using Jensen-Shannon divergence as a term-association measure. Finally, a novel approach to posttranslation query expansion using a random walk over the Wikipedia concept link graph is shown to yield further improvements over alternative techniques for posttranslation query expansion. Evaluation results on the NTCIR-5 English-Korean test collection show statistically significant improvements over strong baselines.
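The abstract names Jensen-Shannon divergence as the term-association measure behind the baseline translation weighting, but it does not give the formulation. As a rough illustration only, the sketch below computes the standard Jensen-Shannon divergence between two co-occurrence distributions. The function names, the dictionary representation, and the toy counts are assumptions made for this example, not the authors' implementation or data.

```python
import math


def normalize(counts):
    """Turn raw co-occurrence counts (term -> count) into a probability distribution."""
    total = sum(counts.values())
    return {term: c / total for term, c in counts.items()}


def kl_divergence(p, q):
    """KL(p || q) in bits. Terms with p[x] == 0 contribute nothing.

    Assumes q[x] > 0 wherever p[x] > 0, which holds when q is the
    mixture distribution built in jensen_shannon_divergence below.
    """
    return sum(px * math.log2(px / q[x]) for x, px in p.items() if px > 0)


def jensen_shannon_divergence(p, q):
    """Standard JSD: 0.5 * KL(p || m) + 0.5 * KL(q || m), with m = (p + q) / 2.

    With base-2 logarithms the value lies in [0, 1]; 0 means identical
    distributions, 1 means distributions with no shared support.
    """
    vocab = set(p) | set(q)
    m = {x: 0.5 * (p.get(x, 0.0) + q.get(x, 0.0)) for x in vocab}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical co-occurrence counts for two candidate translations
# (made-up numbers, purely for illustration).
candidate_a = normalize({"bank": 8, "loan": 6, "interest": 4, "river": 1})
candidate_b = normalize({"bank": 7, "loan": 5, "interest": 5, "money": 2})
candidate_c = normalize({"river": 9, "water": 7, "shore": 3})

# A lower divergence suggests the two terms occur in more similar contexts.
print(jensen_shannon_divergence(candidate_a, candidate_b))
print(jensen_shannon_divergence(candidate_a, candidate_c))
```

How such a divergence would be turned into translation weights, and how it would be combined with the translation probabilities estimated from the parallel corpus, is not specified in the abstract, so the sketch stops at the association score itself.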
