Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation

Abstract

Resolving coordination ambiguity is a classic hard problem. This paper looks at coordination disambiguation in complex noun phrases (NPs). Parsers trained on the Penn Treebank are reporting impressive numbers these days, but they don't do very well on this problem (79%). We explore systems trained using three types of corpora: (1) annotated (e.g. the Penn Treebank), (2) bitexts (e.g. Europarl), and (3) unannotated monolingual (e.g. Google N-grams). Size matters: (1) is a million words, (2) is potentially billions of words and (3) is potentially trillions of words. The unannotated monolingual data is helpful when the ambiguity can be resolved through associations among the lexical items. The bilingual data is helpful when the ambiguity can be resolved by the order of words in the translation. We train separate classifiers with monolingual and bilingual features and iteratively improve them via co-training. The co-trained classifier achieves close to 96% accuracy on Treebank data and makes 20% fewer errors than a supervised system trained with Treebank annotations.
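The co-training loop described in the abstract can be pictured with a small sketch: two classifiers, one over monolingual association features and one over bilingual word-order features, are trained on a labeled seed set and then iteratively augmented with the instances they label most confidently. The sketch below is a generic co-training illustration in Python with scikit-learn, not the paper's implementation; the feature matrices, pool sizes, confidence selection, and random toy data are all placeholder assumptions.

    # A generic co-training sketch (illustration only, not the paper's code).
    # Two "views" of each NP-coordination instance are assumed:
    #   X_mono - monolingual association features (e.g. n-gram statistics)
    #   X_bi   - bilingual features (e.g. word order in the translation)
    # Toy random data stands in for real feature extraction.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X_mono_l, X_bi_l = rng.normal(size=(100, 20)), rng.normal(size=(100, 20))
    y_l = rng.integers(0, 2, size=100)        # labeled seed set (two possible bracketings)
    X_mono_u, X_bi_u = rng.normal(size=(1000, 20)), rng.normal(size=(1000, 20))

    clf_mono = LogisticRegression(max_iter=1000)
    clf_bi = LogisticRegression(max_iter=1000)

    for _ in range(5):                        # a few co-training rounds
        clf_mono.fit(X_mono_l, y_l)
        clf_bi.fit(X_bi_l, y_l)
        if len(X_mono_u) == 0:
            break
        # Each view picks the unlabeled instances it is most confident about;
        # those instances, with predicted labels, join the shared training set.
        conf_mono = clf_mono.predict_proba(X_mono_u).max(axis=1)
        conf_bi = clf_bi.predict_proba(X_bi_u).max(axis=1)
        picked = np.union1d(np.argsort(-conf_mono)[:50], np.argsort(-conf_bi)[:50])
        votes = clf_mono.predict(X_mono_u[picked]) + clf_bi.predict(X_bi_u[picked])
        y_new = (votes >= 1).astype(int)      # simple tie-break: positive if either view says so
        X_mono_l = np.vstack([X_mono_l, X_mono_u[picked]])
        X_bi_l = np.vstack([X_bi_l, X_bi_u[picked]])
        y_l = np.concatenate([y_l, y_new])
        keep = np.setdiff1d(np.arange(len(X_mono_u)), picked)
        X_mono_u, X_bi_u = X_mono_u[keep], X_bi_u[keep]

In the paper's setting, the two views correspond to features drawn from unannotated monolingual data (lexical associations) and from bitexts (word order in the translation), which is what lets each classifier supply training signal the other lacks.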
