
Detecting grammatical errors with treebank-induced, probabilistic parsers


Abstract

Today's grammar checkers often use hand-crafted rule systems that define acceptable language. The development of such rule systems is labour-intensive and has to be repeated for each language. At the same time, grammars automatically induced from syntactically annotated corpora (treebanks) are successfully employed in other applications, for example text understanding and machine translation. At first glance, treebank-induced grammars seem to be unsuitable for grammar checking as they massively over-generate and fail to reject ungrammatical input due to their high robustness. We present three new methods for judging the grammaticality of a sentence with probabilistic, treebank-induced grammars, demonstrating that such grammars can be successfully applied to automatically judge the grammaticality of an input string. Our best-performing method exploits the differences between parse results for grammars trained on grammatical and ungrammatical treebanks. The second approach builds an estimator of the probability of the most likely parse using grammatical training data that has previously been parsed and annotated with parse probabilities. If the estimated probability of an input sentence (whose grammaticality is to be judged by the system) is higher by a certain amount than the actual parse probability, the sentence is flagged as ungrammatical. The third approach extracts discriminative parse tree fragments in the form of CFG rules from parsed grammatical and ungrammatical corpora and trains a binary classifier to distinguish grammatical from ungrammatical sentences. The three approaches are evaluated on a large test set of grammatical and ungrammatical sentences. The ungrammatical test set is generated automatically by inserting common grammatical errors into the British National Corpus. The results are compared to two traditional approaches, one that uses a hand-crafted, discriminative grammar, the XLE ParGram English LFG, and one based on part-of-speech n-grams. 
In addition, the baseline methods and the new methods are combined in a machine learning-based framework, yielding further improvements.
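The decision rule of the second approach can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the abstract does not say which features the estimator uses or how the threshold is chosen, so the sentence-length-only linear estimator, the function names `fit_length_estimator` and `is_flagged`, the `margin` parameter, and the toy numbers are all assumptions made for the example.

```python
from statistics import mean


def fit_length_estimator(lengths, log_probs):
    """Fit a least-squares line: log_prob ~ slope * length + intercept.

    Assumption: sentence length is the only feature. The training pairs
    stand in for grammatical sentences that were parsed beforehand and
    annotated with the log probability of their most likely parse.
    """
    mx, my = mean(lengths), mean(log_probs)
    sxx = sum((x - mx) ** 2 for x in lengths)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lengths, log_probs))
    slope = sxy / sxx
    return slope, my - slope * mx


def is_flagged(length, actual_log_prob, estimator, margin):
    """Flag a sentence when the estimated log probability exceeds the
    actual parse log probability by more than `margin` (hypothetical)."""
    slope, intercept = estimator
    estimated = slope * length + intercept
    return estimated - actual_log_prob > margin


# Toy training data (invented for illustration only).
train_lengths = [5, 10, 15, 20]
train_log_probs = [-50.0, -100.0, -150.0, -200.0]
estimator = fit_length_estimator(train_lengths, train_log_probs)

# A 10-word sentence whose parse is about as probable as expected.
close_to_estimate = is_flagged(10, -101.0, estimator, margin=5.0)
# A 10-word sentence whose parse is far less probable than expected.
far_below_estimate = is_flagged(10, -120.0, estimator, margin=5.0)
```

Working in log space turns "higher by a certain amount" into a simple difference against a margin. In practice the margin would presumably be tuned on held-out data to trade precision against recall.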


Bibliographic details

Author: Wagner Joachim
Year: 2012
Original format: PDF
Language: en

