Detecting and Correcting Syntactic Errors in Machine Translation Using Feature-Based Lexicalized Tree Adjoining Grammars

Abstract

Statistical machine translation has made tremendous progress over the past ten years. The output of even the best systems, however, is often ungrammatical because the systems lack sufficient linguistic knowledge. Even when systems incorporate syntax in the translation process, syntactic errors still occur. To address this issue, we present a novel approach for detecting and correcting ungrammatical translations. In order to simultaneously detect multiple errors and their corresponding words in a formal framework, we use feature-based lexicalized tree adjoining grammars (FB-LTAG). In FB-LTAG, each lexical item is associated with a syntactic elementary tree, in which each node is associated with a set of feature-value pairs, called Attribute Value Matrices (AVMs). AVMs define the lexical item's syntactic usage. Our syntactic error detection works by checking the AVM values of all lexical items within a sentence using a unification framework. Thus, we use the feature structures in the AVMs to detect the error type and the corresponding words. In order to simultaneously detect multiple error types and track their corresponding words, we propose a new unification method which allows the unification procedure to continue when unification fails and also propagates the failure information to the relevant words. We call the modified unification a fail propagation unification. Our approach features: 1) the use of the XTAG grammar, a rule-based English grammar developed by linguists using the FB-LTAG formalism, 2) the ability to simultaneously detect multiple error types and their corresponding words by using FB-LTAG's feature unifications, and 3) the ability to simultaneously correct multiple error types based on the detection information. Grammar checking methods are usually divided into three classes: statistics-based checking, rule-based checking and syntax-based checking.
Our approach is a mix of rule-based checking and syntax-based checking: the XTAG English grammar is designed by linguists, while the detection procedure is based on syntactic operations which dynamically reference the grammar. In our procedure for syntactic error detection, we first decompose the parse tree of each sentence hypothesis into elementary trees, then associate each elementary tree with AVMs through look-up in the XTAG grammar, and finally reconstruct the original parse tree from the elementary trees using substitution and adjunction operations together with AVM unification with fail propagation. Once error types and their corresponding words are detected, errors can be corrected by considering together all the words related to the same error type. In this paper, we present a simple mechanism that handles some of the detected cases. We use our approach to detect and correct the translations of six individual statistical machine translation systems. The results show that most of the corrected translations are improved.
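
The idea of a unification that keeps going after a clash and carries the failure to the words involved can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes flat AVMs (no separate top and bottom feature structures, no XTAG look-up), and the names `lexical_avm` and `unify_with_fail_propagation`, the feature names `num` and `pers`, and the toy lexical entries are all hypothetical.

```python
# Illustrative sketch only: flat AVMs, hypothetical names and toy lexical entries.
# Each feature maps to (value, words), where `words` is the set of lexical items
# that have contributed to that feature so far.

def lexical_avm(word, features):
    """Build a flat AVM in which every feature is attributed to `word`."""
    return {feat: (value, frozenset([word])) for feat, value in features.items()}

def unify_with_fail_propagation(a, b, failures):
    """Unify two flat AVMs.

    A standard unifier would stop at the first clash. Here a clash is recorded
    in `failures` together with every word tied to the feature, and unification
    continues, keeping the value from `a`.
    """
    result = dict(a)
    for feat, (value_b, words_b) in b.items():
        if feat not in result:
            result[feat] = (value_b, words_b)
            continue
        value_a, words_a = result[feat]
        words = words_a | words_b
        if value_a != value_b:
            failures.append((feat, value_a, value_b, words))
        # Keep the accumulated word set so later clashes reach all related words.
        result[feat] = (value_a, words)
    return result

# Toy hypothesis: "this books is ..." (feature values are assumptions, not XTAG entries).
det = lexical_avm("this", {"num": "sg"})
noun = lexical_avm("books", {"num": "pl", "pers": "3"})
verb = lexical_avm("is", {"num": "sg", "pers": "3"})

failures = []
noun_phrase = unify_with_fail_propagation(noun, det, failures)
sentence = unify_with_fail_propagation(noun_phrase, verb, failures)

for feat, kept, clashing, words in failures:
    print(feat, kept, clashing, sorted(words))
```

In this toy run both number clashes are reported in a single pass, and the second one lists all three words, which is the kind of information a correction step could use to treat the related words together.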

Bibliographic information

Source: Conference on Computational Linguistics and Speech Processing, 2012, pp. 142-143 (2 pages)
Authors: Wei-Yun Ma; Kathleen McKeown
Original format: PDF
