Conference: International Conference on Language Resources and Evaluation

Handle with Care: A Case Study in Comparable Corpora Exploitation for Neural Machine Translation

Abstract

We present the results of a case study in the exploitation of comparable corpora for Neural Machine Translation. A large comparable corpus for Basque-Spanish was prepared, on the basis of independently-produced news by the Basque public broadcaster EITB, and we discuss the impact of various techniques to exploit the original data in order to determine optimal variants of the corpus. In particular, we show that filtering in terms of alignment thresholds and length-difference outliers has a significant impact on translation quality. The impact of tags identifying comparable data in the training datasets is also evaluated, with results indicating that this technique might be useful to help the models discriminate noisy information, in the form of informational imbalance between aligned sentences. The final corpus was prepared according to the experimental results and is made available to the scientific community for research purposes.
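The abstract names three corpus-preparation steps: filtering by alignment threshold, removing length-difference outliers, and tagging comparable data. The sketch below illustrates one plausible reading of those steps. It is not the authors' implementation: the function names, the `<comparable>` tag token, the default threshold values, the z-score outlier criterion, and the choice to prepend the tag to the source sentence are all assumptions made for illustration.

```python
from statistics import mean, stdev

# Hypothetical tag token; the abstract does not specify the actual tag used.
COMPARABLE_TAG = "<comparable>"


def filter_pairs(pairs, min_score=0.5, max_z=2.0):
    """Filter (source, target, alignment_score) triples.

    Step 1 drops pairs whose alignment score is below min_score.
    Step 2 drops pairs whose token-length difference is an outlier,
    measured here (as an assumption) by a z-score above max_z.
    """
    kept = [p for p in pairs if p[2] >= min_score]
    if len(kept) < 2:
        return kept  # too few pairs to estimate a spread

    # Whitespace tokenization is a simplification for this sketch.
    diffs = [len(src.split()) - len(tgt.split()) for src, tgt, _ in kept]
    mu, sigma = mean(diffs), stdev(diffs)
    if sigma == 0:
        return kept  # all length differences are identical

    return [p for p, d in zip(kept, diffs) if abs(d - mu) / sigma <= max_z]


def tag_comparable(source_sentence):
    """Prepend the (hypothetical) tag so a model can tell comparable
    data apart from strictly parallel data during training."""
    return f"{COMPARABLE_TAG} {source_sentence}"
```

In practice both thresholds would be tuned empirically, which is the kind of comparison between corpus variants that the case study reports.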
