Journal: Computer Speech and Language

Vector sentences representation for data selection in statistical machine translation

Abstract

One of the most popular approaches to machine translation consists in formulating the problem as a pattern recognition approach. Under this perspective, bilingual corpora are precious resources, as they allow for a proper estimation of the underlying models. In this framework, selecting the best possible corpus is critical, and data selection aims to find the best subset of the bilingual sentences from an available pool of sentences such that the final translation quality is improved. In this paper, we present a new data selection technique that leverages a continuous vector-space representation of sentences. Experimental results report improvements compared not only with a system trained only with in-domain data, but also compared with a system trained on all the available data. Finally, we compared our proposal with other state-of-the-art data selection techniques (Cross-entropy selection and Infrequent ngrams recovery) in two different scenarios, obtaining very promising results with our proposal: our data selection strategy is able to yield results that are at least as good as the best-performing strategy for each scenario. The empirical results reported are coherent across different language pairs. (C) 2018 Elsevier Ltd. All rights reserved.
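
The abstract does not say how sentence vectors are built or how sentences are scored, so the following is only a minimal sketch of the general idea of vector-space data selection, not the authors' method. It assumes that a sentence vector is the average of its word vectors and that pool sentences are ranked by cosine similarity to the centroid of the in-domain sentences. The word vectors in `WORD_VECS`, the example sentences, and the helper names `sentence_vector`, `cosine`, and `select` are all hypothetical.

```python
import math

# Hypothetical toy word vectors (assumption); a real system would learn
# continuous representations from data.
WORD_VECS = {
    "patient": [0.9, 0.1],
    "dose": [0.8, 0.2],
    "drug": [0.85, 0.15],
    "goal": [0.1, 0.9],
    "match": [0.2, 0.8],
    "team": [0.15, 0.85],
}
DIM = 2


def sentence_vector(sentence):
    """Average the vectors of known words; unknown words are skipped."""
    vecs = [WORD_VECS[w] for w in sentence.lower().split() if w in WORD_VECS]
    if not vecs:
        return [0.0] * DIM
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]


def cosine(a, b):
    """Cosine similarity; defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select(in_domain, pool, k):
    """Return the k (source, target) pairs from the pool whose source
    side is closest to the centroid of the in-domain sentences."""
    vecs = [sentence_vector(s) for s in in_domain]
    centroid = [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)]
    ranked = sorted(
        pool,
        key=lambda pair: cosine(sentence_vector(pair[0]), centroid),
        reverse=True,
    )
    return ranked[:k]


# Made-up in-domain sentences and out-of-domain bilingual pool.
in_domain = ["the patient took the drug", "reduce the dose"]
pool = [
    ("the team won the match", "el equipo ganó el partido"),
    ("the drug dose was high", "la dosis del fármaco era alta"),
    ("a late goal", "un gol tardío"),
    ("the patient recovered", "el paciente se recuperó"),
]
selected = select(in_domain, pool, 2)
for source, target in selected:
    print(source, "|||", target)
```

With this toy data the two medical sentence pairs are selected and the two sports pairs are left out. In practice the selected subset would be added to the in-domain data before training the translation system, and the selection size `k` would be tuned on held-out data.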