Word Clustering and Disambiguation Based on Co-occurrence Data

Abstract

We address the problem of clustering words (or constructing a thesaurus) based on co-occurrence data, and using the acquired word classes to improve the accuracy of syntactic disambiguation. We view this problem as that of estimating a joint probability distribution specifying the joint probabilities of word pairs, such as noun-verb pairs. We propose an efficient algorithm based on the Minimum Description Length (MDL) principle for estimating such a probability distribution. Our method is a natural extension of those proposed in (Brown et al., 1992) and (Li and Abe, 1996), and overcomes their drawbacks while retaining their advantages. We then combined this clustering method with the disambiguation method of (Li and Abe, 1995) to derive a disambiguation method that makes use of both automatically constructed thesauruses and a hand-made thesaurus. The overall disambiguation accuracy achieved by our method is 85.2 percent, which compares favorably against the accuracy (82.4 percent) obtained by the state-of-the-art disambiguation method of (Brill and Resnik, 1994).
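
The abstract does not spell out the model, so the following is only a rough illustration of what MDL-based clustering over noun-verb co-occurrence counts can look like. It assumes a hard-clustering model in which P(n, v) = P(Cn, Cv) · P(n | Cn) · P(v | Cv), with maximum-likelihood parameter estimates and a (k/2) · log2 N model cost. The function name `description_length`, the class-assignment dictionaries, and the toy counts are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def description_length(counts, noun_class, verb_class):
    """Return data bits + model bits for a given partition (illustrative only)."""
    total = sum(counts.values())
    pair, noun, verb = Counter(), Counter(), Counter()
    ncls, vcls = Counter(), Counter()
    for (n, v), c in counts.items():
        pair[(noun_class[n], verb_class[v])] += c
        noun[n] += c
        verb[v] += c
        ncls[noun_class[n]] += c
        vcls[verb_class[v]] += c

    # Data description length: negative log-likelihood of the observed pairs.
    data_bits = 0.0
    for (n, v), c in counts.items():
        cn, cv = noun_class[n], verb_class[v]
        p = (pair[(cn, cv)] / total) * (noun[n] / ncls[cn]) * (verb[v] / vcls[cv])
        data_bits -= c * math.log2(p)

    # Model description length: half a log2(N) per free parameter (assumed form).
    k_n = len(set(noun_class.values()))
    k_v = len(set(verb_class.values()))
    free_params = (k_n * k_v - 1) + (len(noun_class) - k_n) + (len(verb_class) - k_v)
    model_bits = 0.5 * free_params * math.log2(total)
    return data_bits + model_bits


# Toy data (assumed): two nouns that behave identically with two verbs.
counts = {("wine", "drink"): 1, ("wine", "pour"): 1,
          ("beer", "drink"): 1, ("beer", "pour"): 1}
merged = description_length(counts, {"wine": 0, "beer": 0}, {"drink": 0, "pour": 0})
separate = description_length(counts, {"wine": 0, "beer": 1}, {"drink": 0, "pour": 1})
print(merged, separate)  # 10.0 11.0
```

On this toy input both partitions fit the data equally well, so the merged partition wins only because it needs fewer parameters. That trade-off between fit and model size is the general idea behind an MDL criterion; the paper's actual algorithm and cost terms may differ.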

Bibliographic details

Source
Annual Meeting of the Association for Computational Linguistics; International Conference on Computational Linguistics; ICCL | 1998 | pp. 749-755 | 7 pages
Conference location: Montreal (CA)
Authors
Hang Li; Naoki Abe
Original format: PDF

Similar literature

1. Co-occurrence graphs for word sense disambiguation in the biomedical domain [J]. Duque Andres, Stevenson Mark, Martinez-Romo Juan. Artificial Intelligence in Medicine. 2018, May issue
2. Latent Semantic Word Sense Disambiguation Using Global Co-Occurrence Information [J]. Minoru Sasaki. Computer Science & Information Technology. 2014, No. 2
3. Privacy Preserving Multiview Point Based BAT Clustering Algorithm and Graph Kernel Method for Data Disambiguation on Horizontally Partitioned Data [J]. J. Anitha, R. Rangarajan. Research Journal of Applied Science, Engineering and Technology. 2015, No. 6
4. Word Clustering and Disambiguation Based on Co-occurrence Data [C]. Hang Li, Naoki Abe. Annual Meeting of the Association for Computational Linguistics. 1998
5. Things and Strings and More: Improving Place Name Disambiguation from Short Texts by Combining Entity Co-Occurrence, Topic Modeling, and Word Embedding [D]. Ju, Yiting. 2017
6. Fast max-margin clustering for unsupervised word sense disambiguation in biomedical texts [O]. Weisi Duan, Min Song, Alexander Yates. 2009
7. Word Clustering and Disambiguation Based on Co-occurrence Data [O]. Hang Li, Naoki Abe. 1998
