Venue: Annual Meeting of the Association for Computational Linguistics

Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation



Abstract

Resolving coordination ambiguity is a classic hard problem. This paper looks at coordination disambiguation in complex noun phrases (NPs). Parsers trained on the Penn Treebank are reporting impressive numbers these days, but they don't do very well on this problem (79%). We explore systems trained using three types of corpora: (1) annotated (e.g. the Penn Treebank), (2) bitexts (e.g. Europarl), and (3) unannotated monolingual (e.g. Google N-grams). Size matters: (1) is a million words, (2) is potentially billions of words and (3) is potentially trillions of words. The unannotated monolingual data is helpful when the ambiguity can be resolved through associations among the lexical items. The bilingual data is helpful when the ambiguity can be resolved by the order of words in the translation. We train separate classifiers with monolingual and bilingual features and iteratively improve them via co-training. The co-trained classifier achieves close to 96% accuracy on Treebank data and makes 20% fewer errors than a supervised system trained with Treebank annotations.
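The abstract describes co-training: two classifiers, one over monolingual association features and one over bilingual word-order features, each confidently label unlabeled instances for the other. The following is a minimal sketch of that loop under simplifying assumptions — the count-based classifiers, feature names, and thresholds here are illustrative placeholders, not the paper's actual models or features.

```python
# Hypothetical co-training sketch: two feature "views" per example
# (e.g. monolingual vs. bilingual cues for a coordination decision).
# Confident predictions from one view become training data for the other.
from collections import Counter

def train(examples):
    """Count-based classifier: per feature value, tally labels seen."""
    counts = {}
    for feats, label in examples:
        for f in feats:
            counts.setdefault(f, Counter())[label] += 1
    return counts

def predict(model, feats):
    """Return (label, confidence) by summing per-feature label counts."""
    votes = Counter()
    for f in feats:
        votes.update(model.get(f, Counter()))
    if not votes:
        return None, 0.0
    label, n = votes.most_common(1)[0]
    return label, n / sum(votes.values())

def co_train(view1, view2, labels, unlabeled, rounds=3, threshold=0.8):
    """view1/view2: feature lists for the labeled seed examples;
    unlabeled: list of (feats1, feats2) pairs. Each round, each view's
    classifier labels high-confidence unlabeled items for the other."""
    train1 = list(zip(view1, labels))
    train2 = list(zip(view2, labels))
    pool = list(unlabeled)
    for _ in range(rounds):
        m1, m2 = train(train1), train(train2)
        remaining = []
        for f1, f2 in pool:
            y1, c1 = predict(m1, f1)
            y2, c2 = predict(m2, f2)
            if c1 >= threshold:          # view 1 teaches view 2
                train2.append((f2, y1))
            elif c2 >= threshold:        # view 2 teaches view 1
                train1.append((f1, y2))
            else:
                remaining.append((f1, f2))
        pool = remaining
    return train(train1), train(train2)
```

Each round shrinks the unlabeled pool as confident examples migrate into the other view's training set, which is how co-training lets large unannotated corpora compensate for a small annotated seed.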
