Conference: 9th Workshop on Statistical Machine Translation

An Empirical Comparison of Features and Tuning for Phrase-based Machine Translation


Abstract

Scalable discriminative training methods are now broadly available for estimating phrase-based, feature-rich translation models. However, the sparse feature sets typically appearing in research evaluations are less attractive than standard dense features such as language and translation model probabilities: they often overfit, do not generalize, or require complex and slow feature extractors. This paper introduces extended features, which are more specific than dense features yet more general than lexicalized sparse features. Large-scale experiments show that extended features yield robust BLEU gains for both Arabic-English (+1.05) and Chinese-English (+0.67) relative to a strong feature-rich baseline. We also specialize the feature set to specific data domains, identify an objective function that is less prone to overfitting, and release fast, scalable, and language-independent tools for implementing the features.
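The abstract places "extended features" between dense features (a few real-valued scores that fire on every rule) and lexicalized sparse features (indicators tied to exact words). The sketch below illustrates that granularity contrast under loudly stated assumptions. It is not the paper's feature set. The word-class map `WORD_CLASS`, the feature-name prefixes `PP:` and `CLS:`, and every helper function are hypothetical. A class-based indicator is just one plausible example of a feature that is more general than a lexicalized one. All features are scored with a generic linear model $w \cdot \phi$.

```python
from typing import Dict, List

FeatureVector = Dict[str, float]

# HYPOTHETICAL word-to-class map (e.g., what a word-clustering step might produce).
# Invented for illustration only; not taken from the paper.
WORD_CLASS = {
    "the": "C1", "a": "C1",
    "house": "C2", "car": "C2",
    "maison": "C3", "voiture": "C3",
    "la": "C4", "une": "C4",
}

def dense_features(tm_logprob: float, lm_logprob: float) -> FeatureVector:
    """Dense: a few real-valued scores that fire on every phrase pair."""
    return {"TM": tm_logprob, "LM": lm_logprob}

def lexicalized_feature(src: List[str], tgt: List[str]) -> FeatureVector:
    """Lexicalized sparse: an indicator that fires only for this exact phrase pair."""
    return {"PP:" + " ".join(src) + "=>" + " ".join(tgt): 1.0}

def class_feature(src: List[str], tgt: List[str]) -> FeatureVector:
    """Hypothetical in-between feature: an indicator over word-class sequences,
    shared by every phrase pair with the same class pattern."""
    s = " ".join(WORD_CLASS.get(w, "UNK") for w in src)
    t = " ".join(WORD_CLASS.get(w, "UNK") for w in tgt)
    return {"CLS:" + s + "=>" + t: 1.0}

def linear_score(features: FeatureVector, weights: Dict[str, float]) -> float:
    """Generic linear model score w . phi; features without a weight contribute 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

pairs = [(["la", "maison"], ["the", "house"]),
         (["une", "voiture"], ["a", "car"])]

# Two lexicalized features, but only one shared class-pattern feature.
lexical_names = {name for s, t in pairs for name in lexicalized_feature(s, t)}
class_names = {name for s, t in pairs for name in class_feature(s, t)}
print(len(lexical_names), len(class_names))  # → 2 1

# Suppose tuning only saw the first pair. Its lexicalized weight does not transfer
# to the second pair, but the class-pattern weight does.
weights = {"TM": 1.0, "LM": 0.5,
           "PP:la maison=>the house": 0.75,
           "CLS:C4 C3=>C1 C2": 0.25}
phi = {**dense_features(-1.5, -2.0),
       **lexicalized_feature(*pairs[1]),
       **class_feature(*pairs[1])}
print(linear_score(phi, weights))  # → -2.25
```

The second pair picks up the class-pattern weight (0.25) even though its own lexicalized indicator was never weighted. This is the kind of generalization the abstract says lexicalized sparse features often lack. The weights here are invented numbers, not tuned values.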
