International Conference on Language Resources and Evaluation

Same Domain Different Discourse Style: A Case Study on Language Resources for Data-driven Machine Translation

Abstract

Data-driven machine translation (MT) approaches have become very popular in recent years, especially for language pairs for which it is difficult to find specialists to develop transfer rules. Statistical (SMT) or example-based (EBMT) systems can provide reasonable translation quality for assimilation purposes, as long as a large amount of training data is available. SMT systems in particular rely on parallel aligned corpora, which have to be statistically relevant for the given language pair. The construction of large domain-specific parallel corpora is time-consuming and costly; current practice relies on one or two such large corpora per language pair. Recently developed strategies ensure a certain portability to other domains through specialized lexicons or small domain-specific corpora. In this paper we discuss the influence of different discourse styles on statistical machine translation systems. We investigate how a pure SMT system performs when training and test data belong to the same domain but the discourse style varies.
