Conference: International Joint Conference on Natural Language Processing
The Challenges of Optimizing Machine Translation for Low Resource Cross-Language Information Retrieval

Abstract

When performing cross-language information retrieval (CLIR) for lower-resourced languages, a common approach is to retrieve over the output of machine translation (MT). However, there is no established guidance on how to optimize the resulting MT-IR system. In this paper, we examine the relationship between the performance of MT systems and both neural and term frequency-based IR models to identify how CLIR performance can be best predicted from MT quality. We explore performance at varying amounts of MT training data, byte pair encoding (BPE) merge operations, and across two IR collections and retrieval models. We find that the choice of IR collection can substantially affect the predictive power of MT tuning decisions and evaluation, potentially introducing dissociations between MT-only and overall CLIR performance.
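
One way to picture the kind of analysis the abstract describes is to correlate an MT quality score with a retrieval score across several system configurations, separately for each IR collection. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the choice of Pearson and Spearman correlation is ours rather than something the abstract specifies, and the numbers are made-up placeholders, not results from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Placeholder values for illustration only (NOT taken from the paper).
# Each position is one hypothetical MT configuration, e.g. a different
# amount of training data or number of BPE merge operations.
mt_quality = [10.0, 15.0, 20.0, 25.0]            # hypothetical MT metric scores
clir_collection_a = [0.10, 0.14, 0.19, 0.25]     # hypothetical retrieval scores
clir_collection_b = [0.20, 0.19, 0.21, 0.20]     # hypothetical retrieval scores

corr_a = spearman(mt_quality, clir_collection_a)
corr_b = spearman(mt_quality, clir_collection_b)
```

In this toy setup, a high rank correlation on one collection and a weak one on the other would correspond to the situation the abstract warns about, where MT-only evaluation predicts CLIR performance on some collections but not on others.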
