Exploiting Representations from Statistical Machine Translation for Cross-Language Information Retrieval

Abstract

This work explores how internal representations of modern statistical machine translation systems can be exploited for cross-language information retrieval. We tackle two core issues that are central to query translation: how to exploit context to generate more accurate translations and how to preserve ambiguity that may be present in the original query, thereby retaining a diverse set of translation alternatives. These two considerations are often in tension since ambiguity in natural language is typically resolved by exploiting context, but effective retrieval requires striking the right balance. We propose two novel query translation approaches: the grammar-based approach extracts translation probabilities from translation grammars, while the decoder-based approach takes advantage of n-best translation hypotheses. Both are context-sensitive, in contrast to a baseline context-insensitive approach that uses bilingual dictionaries for word-by-word translation. Experimental results show that by "opening up" modern statistical machine translation systems, we can access intermediate representations that yield high retrieval effectiveness. By combining evidence from multiple sources, we demonstrate significant improvements over competitive baselines on standard cross-language information retrieval test collections. In addition to effectiveness, the efficiency of our techniques is explored as well.
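To make the idea of preserving translation alternatives concrete, the following is a minimal sketch, not the authors' implementation, of how a set of weighted translation hypotheses might be collapsed into a per-word translation distribution. The function name `translation_distribution`, the input layout (a weight plus word-aligned source/target pairs for each hypothesis), and the toy English-French data are all assumptions made for illustration; the abstract does not specify how the decoder-based approach computes its probabilities.

```python
from collections import defaultdict


def translation_distribution(nbest):
    """Collapse weighted hypotheses into per-source-word distributions.

    nbest: list of (weight, alignment) pairs, where alignment is a list of
    (source_word, target_word) tuples. This layout is hypothetical.
    """
    total = sum(weight for weight, _ in nbest)
    mass = defaultdict(lambda: defaultdict(float))
    for weight, alignment in nbest:
        for src, tgt in alignment:
            # Each hypothesis votes for its word choices in proportion
            # to its normalized weight.
            mass[src][tgt] += weight / total

    # Renormalize so that the alternatives for each source word sum to 1.
    dist = {}
    for src, targets in mass.items():
        norm = sum(targets.values())
        dist[src] = {tgt: w / norm for tgt, w in targets.items()}
    return dist


# Toy 3-best list for the query "bank account" (invented data).
nbest = [
    (0.5, [("bank", "banque"), ("account", "compte")]),
    (0.3, [("bank", "rive"), ("account", "compte")]),
    (0.2, [("bank", "banque"), ("account", "histoire")]),
]
dist = translation_distribution(nbest)
```

In this toy setup the lower-ranked hypotheses keep "rive" and "histoire" alive as alternatives with reduced weight, whereas taking only the single best translation would discard them. That is the tension between context and ambiguity that the abstract describes.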

Bibliographic record

  • Source
    ACM Transactions on Information Systems | 2014, Issue 4 | pp. 19.1-19.32 | 32 pages
  • Authors

    FERHAN TURE; JIMMY LIN;

  • Affiliations

    Raytheon BBN Technologies, 10 Moulton St., Cambridge, MA 02138;

    The iSchool, College of Information Studies, University of Maryland at College Park;

  • Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA);
  • Original format: PDF
  • Language: English
  • Chinese Library Classification
  • Keywords

    Algorithms; Experimentation;
