Published in: Machine Translation (journal)

A kernel regression framework for SMT


Abstract

This paper presents a novel regression framework that models both the translational equivalence problem and the parameter estimation problem in statistical machine translation (SMT). The proposed method kernelizes the training process by formulating translation as a linear mapping between source and target word chunks (word n-grams of various lengths), which yields a regression problem with vector outputs. A kernel ridge regression model and a one-class classifier called maximum margin regression are explored for comparison, and the former is shown to perform better on this task. The experimental results demonstrate, at a conceptual level, the framework's advantage in handling very high-dimensional features implicitly and flexibly. However, it shares the common drawback of kernel methods, namely a lack of scalability. For real-world application, a more practical solution based on locally linear approximation of the regression hyperplane is proposed, in which a subset of relevant training examples is selected online. In addition, we introduce a novel way to integrate language models into this machine translation framework: the language model is used as a penalty term in the objective function of the regression model, since its n-gram representation exactly matches the definition of our feature space.
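To make the regression view in the abstract concrete, the following is a minimal sketch of kernel ridge regression with vector outputs over word n-gram features. It is not the authors' implementation. The n-gram count kernel, the toy English-French pairs, the regularization value `lam`, and all function names (`ngram_counts`, `ngram_kernel`, `fit`, `predict`) are assumptions chosen for illustration. The sketch only predicts a score for each target n-gram; the decoding step that turns those scores into a target sentence, the local training-subset selection, and the language-model penalty are all omitted.

```python
from collections import Counter

import numpy as np


def ngram_counts(sentence, max_n=2):
    """Count all word n-grams of length 1..max_n in a whitespace-tokenized sentence."""
    tokens = sentence.split()
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def ngram_kernel(a, b, max_n=2):
    """Inner product of the n-gram count vectors of a and b (assumed kernel)."""
    ca, cb = ngram_counts(a, max_n), ngram_counts(b, max_n)
    return float(sum(v * cb[g] for g, v in ca.items()))


def fit(sources, targets, lam=1e-8, max_n=2):
    """Solve (K + lam * I) alpha = Y, where Y holds explicit target n-gram counts."""
    target_vocab = sorted({g for t in targets for g in ngram_counts(t, max_n)})
    index = {g: j for j, g in enumerate(target_vocab)}
    Y = np.zeros((len(targets), len(target_vocab)))
    for i, t in enumerate(targets):
        for g, c in ngram_counts(t, max_n).items():
            Y[i, index[g]] = c
    # Gram matrix over source sentences; source features are never built explicitly.
    K = np.array([[ngram_kernel(s, t, max_n) for t in sources] for s in sources])
    alpha = np.linalg.solve(K + lam * np.eye(len(sources)), Y)
    return alpha, target_vocab, Y


def predict(sentence, sources, alpha, max_n=2):
    """Predicted target n-gram score vector for a new source sentence."""
    k = np.array([ngram_kernel(sentence, s, max_n) for s in sources])
    return k @ alpha


# Hypothetical toy parallel data, for illustration only.
SOURCES = ["the cat", "the dog", "a cat"]
TARGETS = ["le chat", "le chien", "un chat"]
ALPHA, VOCAB, Y = fit(SOURCES, TARGETS)

if __name__ == "__main__":
    scores = predict("a dog", SOURCES, ALPHA)
    for gram, score in sorted(zip(VOCAB, scores), key=lambda p: -p[1]):
        print(f"{gram}\t{score:.3f}")
```

The scalability issue mentioned in the abstract is visible here: the Gram matrix `K` grows quadratically with the number of training sentence pairs, and solving the linear system grows faster still, which is what motivates training on a small, relevant subset of examples per input sentence.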
