Conference: LREC-2012

Same Domain Different Discourse Style: A Case Study on Language Resources for Data-driven Machine Translation



Abstract

Data-driven machine translation (MT) approaches have become very popular in recent years, especially for language pairs for which it is difficult to find specialists to develop transfer rules. Statistical (SMT) or example-based (EBMT) systems can provide reasonable translation quality for assimilation purposes, as long as a large amount of training data is available. SMT systems in particular rely on parallel aligned corpora, which have to be statistically relevant for the given language pair. The construction of large domain-specific parallel corpora is time-consuming and costly; current practice relies on one or two such large corpora per language pair. Recently developed strategies ensure a certain degree of portability to other domains through specialized lexicons or small domain-specific corpora. In this paper we discuss the influence of different discourse styles on statistical machine translation systems. We investigate how a pure SMT system performs when training and test data belong to the same domain but the discourse style varies.
