Journal: Computer Speech and Language

A generalised alignment template formalism and its application to the inference of shallow-transfer machine translation rules from scarce bilingual corpora



Abstract

Statistical and rule-based methods are complementary approaches to machine translation (MT), each with different strengths and weaknesses. Over the last few years, this complementarity has led to growing interest in hybrid systems that combine data-driven and linguistic approaches. In this paper, we address the situation in which the bilingual resources available for a particular language pair are too scarce to train a competitive statistical MT system, yet the cost and slow development cycles of rule-based MT systems cannot be afforded either. In this context, we formalise a new method that uses scarce parallel corpora to automatically infer a set of shallow-transfer rules to be integrated into a rule-based MT system, thus avoiding the need for human experts to handcraft these rules. Our work is based on the alignment template approach to phrase-based statistical MT, but the definition of the alignment template is extended to encompass different generalisation levels. It is also greatly inspired by the work of Sanchez-Martinez and Forcada (2009), in which alignment templates were also used for shallow-transfer rule inference. However, our approach overcomes many relevant limitations of that work, principally its inability to find the correct generalisation level for the alignment templates and to select the subset of alignment templates that ensures an adequate segmentation of the input sentences by the rules eventually obtained. Unlike previous approaches in the literature, our formalism does not require linguistic knowledge about the languages involved in the translation. Moreover, this is the first time that conflicts between rules are resolved by choosing the most appropriate ones according to a global minimisation function rather than in a pairwise greedy fashion.
Experiments conducted on five different language pairs with the free/open-source rule-based MT platform Apertium show that translation quality improves significantly over the method proposed by Sanchez-Martinez and Forcada (2009) and is close to that obtained with handcrafted rules. For some language pairs, our approach even outperforms the handcrafted rules. Moreover, the resulting number of rules is considerably smaller, which eases human revision and maintenance.
