WMT 2012

Leave-One-Out Phrase Model Training for Large-Scale Deployment

Abstract

Training the phrase table by force-aligning (FA) the training data with the reference translation has been shown to improve phrasal translation quality while significantly reducing the phrase table size on medium-sized tasks. We apply this procedure to several large-scale tasks, with the primary goal of reducing model sizes without sacrificing translation quality. To deal with the noise in the automatically crawled parallel training data, we introduce on-demand word deletions, insertions, and backoffs to achieve a successful alignment rate of over 99%. We also add heuristics to avoid any increase in OOV rates. We are able to reduce already heavily pruned baseline phrase tables by more than 50% with little to no degradation in quality, and occasionally a slight improvement, without any increase in OOVs. We further introduce two global scaling factors for re-estimating the phrase table via posterior phrase alignment probabilities, and a modified absolute discounting method that can be applied to fractional counts.
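The abstract does not spell out the modified discounting formula itself. As a rough illustration of the general idea, absolute discounting applied to fractional (posterior-weighted) phrase counts, the following Python sketch shows one way such a re-estimation step could look. The function name, the `discount` value, and the way a single `posterior_scale` factor enters are assumptions made for this example, not the paper's definitions.

```python
from collections import defaultdict

def discounted_phrase_table(frac_counts, discount=0.4, posterior_scale=1.0):
    """Turn fractional (posterior-weighted) phrase counts into
    translation probabilities with absolute discounting.

    frac_counts:     dict mapping (src_phrase, tgt_phrase) to a
                     fractional count accumulated from posterior
                     phrase alignment probabilities
    discount:        fixed amount subtracted from every count
                     (illustrative value, not from the paper)
    posterior_scale: a single global factor applied to all counts,
                     standing in for the paper's global scaling
                     factors (their exact role is assumed here)
    """
    scaled = {pair: posterior_scale * c for pair, c in frac_counts.items()}

    # Marginal count per source phrase, used for normalization.
    totals = defaultdict(float)
    for (src, _tgt), c in scaled.items():
        totals[src] += c

    table = {}
    for (src, tgt), c in scaled.items():
        # max(c - d, 0) handles fractional counts smaller than the
        # discount; the subtracted mass would normally be reserved for
        # a backoff/smoothing distribution, so rows sum to less than 1.
        table[(src, tgt)] = max(c - discount, 0.0) / totals[src]
    return table

# Toy usage: posterior-weighted counts for one source phrase.
counts = {("das Haus", "the house"): 1.7, ("das Haus", "house"): 0.3}
print(discounted_phrase_table(counts))
```

In this toy run the count 0.3 falls below the discount and is floored to zero, which is exactly the situation a discounting method for fractional counts has to handle gracefully.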
