Conference: International Conference on Computational Linguistics

Combining Statistical Translation Techniques for Cross-Language Information Retrieval


Abstract

Cross-language information retrieval today is dominated by techniques that rely principally on context-independent token-to-token mappings, despite the fact that state-of-the-art statistical machine translation systems now have far richer translation models available in their internal representations. This paper explores combination-of-evidence techniques using three types of statistical translation models: context-independent token translation, token translation using phrase-dependent contexts, and token translation using sentence-dependent contexts. Context-independent translation is performed using statistically aligned tokens in parallel text, phrase-dependent translation is performed using aligned statistical phrases, and sentence-dependent translation is performed using those same aligned phrases together with an n-gram language model. Experiments on retrieval of Arabic, Chinese, and French documents using English queries show that no one technique is optimal for all queries, but that statistically significant improvements in mean average precision over strong baselines can be achieved by combining translation evidence from all three techniques. The optimal combination is, however, found to be resource-dependent, indicating a need for future work on robust tuning to the characteristics of individual collections.
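The abstract does not state how the three sources of translation evidence are combined, so the following is only an illustrative sketch under an assumed scheme: a weighted linear interpolation of per-token translation distributions. The function name `combine_translations`, the example probabilities, and the equal weights are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def combine_translations(models, weights):
    """Linearly interpolate translation distributions for one query token.

    models  -- list of dicts mapping a target-language token to P(target | source)
    weights -- one non-negative weight per model (assumed tuning parameters)
    """
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per translation model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    combined = defaultdict(float)
    for dist, weight in zip(models, weights):
        for target, prob in dist.items():
            # Normalise weights so the result is still a distribution
            # whenever every input distribution sums to 1.
            combined[target] += (weight / total) * prob
    return dict(combined)


# Hypothetical distributions for the English query token "bank" (French targets).
token_based = {"banque": 0.6, "rive": 0.4}     # context-independent
phrase_based = {"banque": 0.8, "rive": 0.2}    # phrase-dependent context
sentence_based = {"banque": 1.0}               # sentence-dependent context

mixed = combine_translations(
    [token_based, phrase_based, sentence_based],
    [1.0, 1.0, 1.0],
)
```

Under this assumed scheme, the finding that the best combination is resource-dependent would correspond to the weights needing to be tuned separately for each collection rather than fixed in advance.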
