Co-occurrence retrieval: A flexible framework for lexical distributional similarity

Weeds J; Weir D


Abstract

Techniques that exploit knowledge of distributional similarity between words have been proposed in many areas of Natural Language Processing. For example, in language modeling, the sparse data problem can be alleviated by estimating the probabilities of unseen co-occurrences of events from the probabilities of seen co-occurrences of similar events. In other applications, distributional similarity is taken to be an approximation to semantic similarity. However, due to the wide range of potential applications and the lack of a strict definition of the concept of distributional similarity, many methods of calculating distributional similarity have been proposed or adopted. In this work, a flexible, parameterized framework for calculating distributional similarity is proposed. Within this framework, the problem of finding distributionally similar words is cast as one of co-occurrence retrieval (CR) for which precision and recall can be measured by analogy with the way they are measured in document retrieval. As will be shown, a number of popular existing measures of distributional similarity are simulated with parameter settings within the CR framework. In this article, the CR framework is then used to systematically investigate three fundamental questions concerning distributional similarity. First, is the relationship of lexical similarity necessarily symmetric, or are there advantages to be gained from considering it as an asymmetric relationship? Second, are some co-occurrences inherently more salient than others in the calculation of distributional similarity? Third, is it necessary to consider the difference in the extent to which each word occurs in each co-occurrence type? Two application-based tasks are used for evaluation: automatic thesaurus generation and pseudo-disambiguation. 
It is possible to achieve significantly better results on both these tasks by varying the parameters within the CR framework rather than using other existing distributional similarity measures; it will also be shown that any single unparameterized measure is unlikely to be able to do better on both tasks. This is due to an inherent asymmetry in lexical substitutability and therefore also in lexical distributional similarity.
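The retrieval analogy in the abstract can be made concrete with a small sketch. The version below is a deliberately simplified, unweighted reading of the idea and not the article's parameterized definitions: each word is represented by the set of co-occurrence types it has been seen with, one word's set is treated as "retrieved" and the other's as "relevant". The function name `cr_precision_recall` and the toy feature sets are hypothetical, chosen only for illustration.

```python
def cr_precision_recall(retrieved, relevant):
    """Set-based precision and recall of one word's co-occurrence types
    against another's (illustrative assumption: no feature weighting)."""
    shared = retrieved & relevant
    precision = len(shared) / len(retrieved) if retrieved else 0.0
    recall = len(shared) / len(relevant) if relevant else 0.0
    return precision, recall


# Toy co-occurrence types (hypothetical data, not taken from the article).
animal = {"eat", "sleep", "run", "live"}
dog = {"eat", "sleep", "run", "live", "bark", "fetch", "wag", "bite"}

print(cr_precision_recall(animal, dog))  # (1.0, 0.5)
print(cr_precision_recall(dog, animal))  # (0.5, 1.0)
```

Swapping the two words swaps precision and recall, so any score that weights the two unequally is asymmetric. This is one simple way to see how the question of symmetry raised in the abstract can be posed within a retrieval-style framework.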


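Bibliographic details

Source
Computational Linguistics | 2005, Issue 4 | 37 pages
Authors
Weeds J; Weir D
Indexing information
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Date added to database: 2022-08-18 09:40:05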

Similar literature

1. Co-occurrence retrieval: A flexible framework for lexical distributional similarity [J]. Weeds J, Weir D. Computational Linguistics. 2005, Issue 4
2. Lexical Co-Occurrence and Contextual Window-Based Approach with Semantic Similarity for Query Expansion [J]. Jagendra Singh, Rakesh Kumar. International Journal of Intelligent Information Technologies. 2017, Issue 3
3. Lexicality and phonological similarity: A challenge for the retrieval-based account of serial recall? [J]. Anthony B. Fallon, Eva Mak, Gerald Tehan. Memory. 2005, Issue 3a4
4. Robust Co-occurrence Quantification for Lexical Distributional Semantics [C]. Dmitrijs Milajevs, Mehrnoosh Sadrzadeh, Matthew Purver. Annual Meeting of the Association for Computational Linguistics. 2016
5. Multilingual distributional lexical similarity [D]. Baker, Kirk. 2008
6. Correction: Lexical category acquisition is facilitated by uncertainty in distributional co-occurrences [O]. 2012
7. Co-occurrence retrieval: A flexible framework for lexical distributional similarity [O]. Julie Weeds, David Weir. 2005
