Conference: International Joint Conference on Natural Language Processing; Conference on Empirical Methods in Natural Language Processing

The Challenges of Optimizing Machine Translation for Low Resource Cross-Language Information Retrieval

Abstract

When performing cross-language information retrieval (CLIR) for lower-resourced languages, a common approach is to retrieve over the output of machine translation (MT). However, there is no established guidance on how to optimize the resulting MT-IR system. In this paper, we examine the relationship between the performance of MT systems and both neural and term frequency-based IR models to identify how CLIR performance can be best predicted from MT quality. We explore performance at varying amounts of MT training data, byte pair encoding (BPE) merge operations, and across two IR collections and retrieval models. We find that the choice of IR collection can substantially affect the predictive power of MT tuning decisions and evaluation, potentially introducing dissociations between MT-only and overall CLIR performance.