Word alignment and smoothing methods in statistical machine translation: Noise, prior knowledge and overfitting

Abstract

This thesis discusses how to incorporate linguistic knowledge into a statistical machine translation (SMT) system. Although one important category of linguistic knowledge is that obtained from constituent/dependency parsers, POS/supertaggers, and morphological analysers, linguistic knowledge here covers a broader range: multi-word expressions (MWEs), out-of-vocabulary (OOV) words, paraphrases, lexical semantics (or non-literal translations), named entities, coreference, and transliteration. The first discussion concerns word alignment, for which we propose an MWE-sensitive word aligner. The second concerns smoothing methods for the language model and the translation model, for which we propose a smoothing method based on the hierarchical Pitman-Yor process. Both discussions share a common ground: the examination of three exceptional cases arising in real-world data, namely the presence of noise, the availability of prior knowledge, and the problem of underfitting. A notable characteristic of this design is the careful use of (Bayesian) priors so that the models capture both frequent and linguistically important phenomena. This can be seen as one example of how to address a common weakness of statistical models, which often learn only from frequent examples and overlook less frequent but linguistically important phenomena.
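The abstract names hierarchical Pitman-Yor smoothing without spelling out how it works. As a rough illustration only, the Python sketch below applies the standard hierarchical Pitman-Yor predictive rule to a bigram language model, backing off from a bigram "restaurant" to a unigram restaurant and then to a uniform base distribution. It is not the thesis's implementation: the class name `BigramHPYSketch`, the discount and strength values, and the one-table-per-word-type approximation (used here to avoid sampling seating arrangements) are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


class BigramHPYSketch:
    """Hypothetical bigram LM smoothed with the hierarchical Pitman-Yor
    predictive rule, assuming one table per observed type (t_uw = min(1, c_uw))."""

    def __init__(self, tokens, vocab, d1=0.75, theta1=0.0, d0=0.5, theta0=1.0):
        # Illustrative hyperparameters: discounts need 0 <= d < 1, strengths theta > -d.
        self.vocab = set(vocab) | set(tokens)
        self.d1, self.theta1, self.d0, self.theta0 = d1, theta1, d0, theta0
        # Customer counts c_uw in each bigram restaurant (one per context u).
        self.bigram = defaultdict(Counter)
        for u, w in zip(tokens, tokens[1:]):
            self.bigram[u][w] += 1
        # Each bigram table sends one customer to the unigram (parent) restaurant,
        # so the parent count of w is the number of distinct contexts preceding w.
        self.unigram = Counter()
        for followers in self.bigram.values():
            for w in followers:
                self.unigram[w] += 1

    @staticmethod
    def _py_prob(counts, w, d, theta, parent_prob):
        """Pitman-Yor predictive probability of w in one restaurant."""
        c_total = sum(counts.values())
        t_total = len(counts)  # one table per observed type
        if theta + c_total == 0:  # empty restaurant with theta = 0: back off fully
            return parent_prob
        c_w = counts.get(w, 0)
        t_w = 1 if c_w > 0 else 0
        # Discounted own count plus mass (theta + d * t_total) passed to the parent.
        return (c_w - d * t_w + (theta + d * t_total) * parent_prob) / (theta + c_total)

    def prob_unigram(self, w):
        base = 1.0 / len(self.vocab)  # uniform base distribution over the vocabulary
        return self._py_prob(self.unigram, w, self.d0, self.theta0, base)

    def prob(self, w, u):
        """Smoothed P(w | u) for a single-word context u."""
        counts = self.bigram.get(u, Counter())  # .get does not insert unseen contexts
        return self._py_prob(counts, w, self.d1, self.theta1, self.prob_unigram(w))


tokens = "the cat sat on the mat".split()
lm = BigramHPYSketch(tokens, vocab=tokens + ["dog"])
print(round(lm.prob("cat", "the"), 3), round(lm.prob("dog", "the"), 3))  # 0.26 0.073
```

The unseen word "dog" still receives probability after "the" through the discounted back-off term, which is the smoothing effect a Pitman-Yor prior provides; each conditional distribution sums to one over the vocabulary. Setting both strengths to zero makes this approximation behave like interpolated Kneser-Ney smoothing.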

Bibliographic details

  • Author: Okita Tsuyoshi
  • Year: 2012
  • Original format: PDF
  • Language: English
