A hybrid approach to statistical machine translation between standard and dialectal varieties

Abstract

Statistical machine translation (SMT) for dialectal varieties usually suffers from data sparsity, but combining word-level and character-level models can yield good results even with small training data by exploiting the relative proximity between the two varieties. In this paper, we describe a specific problem, and its solution, that arises in translation between standard Austrian German and Viennese dialect. A phrase-based approach to SMT cannot deal with complex lexical transformations and syntactic reordering. These are typical cases where rule-based preprocessing of the source data is the preferable option, hence the hybrid character of the resulting system. One such case is the transformation of imperfect verb forms into perfect tense, which involves detection of clause boundaries and identification of clause type. We present an approach that utilizes a full parse of the source sentences and discuss the problems that arise with such an approach. Within the developed SMT system, the models trained on preprocessed data unsurprisingly fare better than those trained on the original data, but unchanged sentences also gain slightly better scores. This shows that including a rule-based layer dealing with systematic non-local transformations increases the overall performance of the system, most probably due to a higher accuracy in the alignment.
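
To make the imperfect-to-perfect preprocessing step more concrete, the following is a minimal toy sketch in Python. It is not the authors' implementation: the abstract states that their approach relies on a full parse of the source sentences, whereas this sketch assumes that clauses can be split on commas and that clause type can be guessed from an initial subordinating conjunction. The lexicon `PRETERITE_TO_PERFECT`, the set `SUBORDINATORS`, and the function names are all hypothetical and exist only for illustration.

```python
# Toy sketch only: rewrites German imperfect (preterite) verb forms to
# perfect tense before SMT training. All names and data are hypothetical.

# Hypothetical mini-lexicon: preterite form -> (auxiliary, past participle)
PRETERITE_TO_PERFECT = {
    "kaufte": ("hat", "gekauft"),
    "ging": ("ist", "gegangen"),
    "sah": ("hat", "gesehen"),
}

# Hypothetical list of subordinating conjunctions marking verb-final clauses
SUBORDINATORS = {"weil", "dass", "ob", "wenn", "als"}


def split_clauses(tokens):
    """Naive clause boundary detection: split the token list on commas."""
    clauses, current = [], []
    for tok in tokens:
        if tok == ",":
            clauses.append(current)
            current = []
        else:
            current.append(tok)
    clauses.append(current)
    return clauses


def to_perfect(clause):
    """Rewrite the first known preterite verb in a clause to perfect tense."""
    for i, tok in enumerate(clause):
        entry = PRETERITE_TO_PERFECT.get(tok.lower())
        if entry is None:
            continue
        aux, participle = entry
        if clause[0].lower() in SUBORDINATORS:
            # Subordinate clause: participle and auxiliary both go to the end.
            return clause[:i] + clause[i + 1:] + [participle, aux]
        # Main clause: auxiliary takes the finite verb slot,
        # participle moves to the end of the clause.
        return clause[:i] + [aux] + clause[i + 1:] + [participle]
    return clause  # no known preterite form: leave the clause unchanged


def preprocess(sentence):
    """Apply the rewrite clause by clause and re-join the sentence."""
    had_period = sentence.endswith(".")
    tokens = sentence.rstrip(".").replace(",", " , ").split()
    clauses = [to_perfect(c) for c in split_clauses(tokens)]
    out = ", ".join(" ".join(c) for c in clauses)
    return out + "." if had_period else out


print(preprocess("Er kaufte ein Buch, weil er es im Laden sah."))
# prints: Er hat ein Buch gekauft, weil er es im Laden gesehen hat.
```

The example shows why the transformation is non-local: the participle moves to the end of the clause, and its position relative to the auxiliary depends on clause type. A comma-based split breaks down on embedded or coordinated clauses, which is consistent with the abstract's reliance on a full parse.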