Conference: International Conference on Computational Linguistics

Combining Statistical Translation Techniques for Cross-Language Information Retrieval

Abstract

Cross-language information retrieval today is dominated by techniques that rely principally on context-independent token-to-token mappings, despite the fact that state-of-the-art statistical machine translation systems now have far richer translation models available in their internal representations. This paper explores combination-of-evidence techniques using three types of statistical translation models: context-independent token translation, token translation using phrase-dependent contexts, and token translation using sentence-dependent contexts. Context-independent translation is performed using statistically aligned tokens in parallel text, phrase-dependent translation is performed using aligned statistical phrases, and sentence-dependent translation is performed using those same aligned phrases together with an n-gram language model. Experiments on retrieval of Arabic, Chinese, and French documents using English queries show that no one technique is optimal for all queries, but that statistically significant improvements in mean average precision over strong baselines can be achieved by combining translation evidence from all three techniques. The optimal combination is, however, found to be resource-dependent, indicating a need for future work on robust tuning to the characteristics of individual collections.
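The abstract does not state how the three kinds of translation evidence are combined or how the weights are tuned. As an illustration only, the sketch below assumes a simple linear interpolation of three token translation distributions and shows how mean average precision, the metric named in the abstract, is computed. The function names, the weights, and the toy probabilities are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def combine_translation_evidence(tables, weights):
    """Linearly interpolate several translation distributions for one query token.

    tables  -- list of dicts mapping target token -> probability
    weights -- one non-negative weight per table (normalised internally)

    Linear interpolation is an assumption made for this sketch; the paper's
    actual combination method is not described in the abstract.
    """
    if len(tables) != len(weights):
        raise ValueError("need exactly one weight per table")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    combined = defaultdict(float)
    for table, weight in zip(tables, weights):
        for target, prob in table.items():
            combined[target] += (weight / total) * prob
    return dict(combined)


def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant ids."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids)


def mean_average_precision(runs):
    """Mean of average precision over (ranked_ids, relevant_ids) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy, made-up distributions for the English query token "bank" (French targets).
context_independent = {"banque": 0.6, "rive": 0.4}
phrase_dependent = {"banque": 0.9, "rive": 0.1}
sentence_dependent = {"banque": 1.0}

combined = combine_translation_evidence(
    [context_independent, phrase_dependent, sentence_dependent],
    weights=[1.0, 1.0, 1.0],  # hypothetical equal weights
)
```

Because the abstract reports that the optimal combination is resource-dependent, the weights in a sketch like this would have to be tuned separately for each collection rather than fixed in advance.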
