Conference on Computational Linguistics and Speech Processing

Detecting and Correcting Syntactic Errors in Machine Translation Using Feature-Based Lexicalized Tree Adjoining Grammars

Abstract

Statistical machine translation has made tremendous progress over the past ten years. The output of even the best systems, however, is often ungrammatical because the systems lack sufficient linguistic knowledge. Even when systems incorporate syntax in the translation process, syntactic errors still result. To address this issue, we present a novel approach for detecting and correcting ungrammatical translations. In order to simultaneously detect multiple errors and their corresponding words in a formal framework, we use feature-based lexicalized tree adjoining grammars (FB-LTAG). In FB-LTAG, each lexical item is associated with a syntactic elementary tree, in which each node is associated with a set of feature-value pairs called an attribute-value matrix (AVM). AVMs define the lexical item's syntactic usage. Our syntactic error detection works by checking the AVM values of all lexical items within a sentence using a unification framework. Thus, we use the feature structures in the AVMs to detect the error type and the corresponding words. In order to simultaneously detect multiple error types and track their corresponding words, we propose a new unification method which allows the unification procedure to continue when unification fails and also propagates the failure information to the relevant words. We call this modified procedure fail propagation unification. Our approach features: 1) the use of the XTAG grammar, a rule-based English grammar developed by linguists using the FB-LTAG formalism, 2) the ability to simultaneously detect multiple types of ungrammaticality and their corresponding words by using FB-LTAG's feature unifications, and 3) the ability to simultaneously correct multiple types of ungrammaticality based on the detection information. Grammar checking methods are usually divided into three classes: statistics-based checking, rule-based checking, and syntax-based checking.
Our approach is a mix of rule-based checking and syntax-based checking: the XTAG English grammar is designed by linguists, while the detection procedure is based on syntactic operations which dynamically reference the grammar. In our procedure for syntactic error detection, we first decompose the parse tree of each sentence hypothesis into elementary trees, then associate each elementary tree with AVMs through look-up in the XTAG grammar, and finally reconstruct the original parse tree out of the elementary trees using substitution and adjunction operations along with AVM unifications that support fail propagation. Once error types and their corresponding words are detected, errors can be corrected based on a unified consideration of all related words under the same error type. In this paper, we present simple mechanisms that handle a subset of the detected error cases. We use our approach to detect and correct translations from six individual statistical machine translation systems. The results show that most of the corrected translations are improved.
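
The idea of a unification that continues past a failure and records which words were involved can be illustrated with a minimal sketch. It rests on strong simplifying assumptions: AVMs are flat feature-value maps rather than XTAG's nested structures, and on a clash the left-hand value is kept so that unification can proceed. The names `make_avm`, `unify`, and `Failure` are hypothetical and are not taken from the paper or from the XTAG grammar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Failure:
    """One recorded clash: the feature, both values, and the words involved."""
    feature: str
    values: tuple
    words: frozenset


def make_avm(word, **features):
    """Build a flat AVM: feature name -> (value, set of contributing words)."""
    return {name: (value, frozenset([word])) for name, value in features.items()}


def unify(left, right, failures):
    """Unify two flat AVMs without aborting on a clash.

    A clash is appended to `failures` together with every word that
    contributed to either side, and unification then continues.
    Assumption: the left value is kept after a clash.
    """
    merged = dict(left)
    for name, (value, words) in right.items():
        if name not in merged:
            merged[name] = (value, words)
            continue
        seen_value, seen_words = merged[name]
        all_words = seen_words | words
        if seen_value != value:
            failures.append(Failure(name, (seen_value, value), all_words))
        # Carry the combined word set forward so that later clashes on
        # this feature can be traced back to every related word.
        merged[name] = (seen_value, all_words)
    return merged


# Toy input "him walk": the subject is accusative singular, while the
# verb form expects a nominative plural subject.
failures = []
subject = make_avm("him", num="sg", case="acc")
verb = make_avm("walk", num="pl", case="nom")
merged = unify(subject, verb, failures)

for failure in failures:
    print(failure.feature, failure.values, sorted(failure.words))
```

A single pass over this toy input reports both the number clash and the case clash, each tagged with the two words involved, which is the kind of information a correction step could then act on.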