Journal: Computer Speech and Language

Vector sentences representation for data selection in statistical machine translation



Abstract

One of the most popular approaches to machine translation formulates the problem as a pattern recognition task. From this perspective, bilingual corpora are precious resources, as they allow the underlying models to be estimated properly. In this framework, selecting the best possible corpus is critical: data selection aims to find the subset of bilingual sentences from an available pool that best improves the final translation quality. In this paper, we present a new data selection technique that leverages a continuous vector-space representation of sentences. Experimental results show improvements not only over a system trained only on in-domain data, but also over a system trained on all the available data. Finally, we compared our proposal with other state-of-the-art data selection techniques (cross-entropy selection and infrequent n-gram recovery) in two different scenarios, obtaining very promising results: our data selection strategy yields results that are at least as good as the best-performing strategy in each scenario. The empirical results are consistent across different language pairs. © 2018 Elsevier Ltd. All rights reserved.
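The abstract does not spell out how the vector-space representation drives selection. The following is a minimal sketch of one plausible scheme, not necessarily the authors' method: each sentence is represented as the average of its word vectors, the centroid of the in-domain sentences is computed, and pool sentence pairs are kept when the cosine similarity of their source side to that centroid is at or above a threshold. The averaging scheme, the toy 2-dimensional word vectors, the `threshold` value, and the function names (`sentence_vector`, `cosine`, `select_pairs`) are all hypothetical; a real system would use embeddings trained on large corpora.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def sentence_vector(sentence: str, word_vectors: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of in-vocabulary tokens (an assumed representation)."""
    total = [0.0] * dim
    count = 0
    for token in sentence.lower().split():
        vec = word_vectors.get(token)
        if vec is not None:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    # Sentences with no known words map to the zero vector.
    return [t / count for t in total] if count else total


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_pairs(
    in_domain: List[str],
    pool: List[Tuple[str, str]],
    word_vectors: Dict[str, Vector],
    dim: int,
    threshold: float,
) -> List[Tuple[Tuple[str, str], float]]:
    """Keep (source, target) pairs whose source side is close to the in-domain centroid."""
    if not in_domain:
        raise ValueError("need at least one in-domain sentence")
    vectors = [sentence_vector(s, word_vectors, dim) for s in in_domain]
    # Component-wise mean of the in-domain sentence vectors.
    centroid = [sum(column) / len(vectors) for column in zip(*vectors)]
    scored = [
        (pair, cosine(sentence_vector(pair[0], word_vectors, dim), centroid))
        for pair in pool
    ]
    kept = [(pair, score) for pair, score in scored if score >= threshold]
    # Highest-similarity pairs first.
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept


# Hypothetical toy 2-d "embeddings": first axis ~ medical, second axis ~ sports.
word_vectors = {
    "patient": [1.0, 0.0],
    "dose": [0.9, 0.1],
    "goal": [0.0, 1.0],
    "match": [0.1, 0.9],
}
in_domain = ["the patient needs a dose"]
pool = [("patient dose", "dosis del paciente"), ("match goal", "gol del partido")]
selected = select_pairs(in_domain, pool, word_vectors, dim=2, threshold=0.5)
print(selected)
```

Here the medical pair is kept and the sports pair is discarded. Keeping the top-k pairs by similarity instead of applying a fixed threshold would be an equally simple variant; in either case the selected pairs would then be added to the in-domain training data.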
