International Conference on Computational Linguistics and Intelligent Text Processing

A Comparison of Co-occurrence and Similarity Measures as Simulations of Context

Abstract

Observations of word co-occurrences and similarity computations are often used as a straightforward way to represent the global contexts of words and thereby simulate semantic word similarity for applications such as word or document clustering and collocation extraction. Despite the simplicity of the underlying model, it is necessary to select a proper significance measure, a similarity measure, and a similarity computation algorithm. However, it is often unclear how the measures are related, and dimensionality reduction is often applied to make the computation of word similarity efficient. This work presents an approximate algorithm with linear time complexity for computing word similarity without any dimensionality reduction. It then introduces a large-scale evaluation based on two languages and two knowledge sources and discusses the underlying reasons for the relative performance of each measure.
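
The three choices named in the abstract (a significance measure, a similarity measure, and a computation algorithm) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: sentence-level co-occurrence, positive pointwise mutual information (PMI) as the significance measure, cosine as the similarity measure, and the function names `cooccurrence_counts`, `pmi_vectors`, and `cosine`. The comparison is exact and pairwise; it is not the linear-time approximate algorithm the paper presents.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count how many sentences each word and each word pair occurs in."""
    word_freq = Counter()
    pair_freq = Counter()
    for sentence in sentences:
        types = sorted(set(sentence))  # each word counted once per sentence
        word_freq.update(types)
        pair_freq.update(combinations(types, 2))
    return word_freq, pair_freq


def pmi_vectors(word_freq, pair_freq, n_sentences):
    """Build sparse context vectors weighted by positive PMI.

    PMI is an illustrative choice of significance measure, not
    necessarily one the paper evaluates.
    """
    vectors = defaultdict(dict)
    for (a, b), n_ab in pair_freq.items():
        pmi = math.log2(n_ab * n_sentences / (word_freq[a] * word_freq[b]))
        if pmi > 0:  # keep only positively associated co-occurrences
            vectors[a][b] = pmi
            vectors[b][a] = pmi
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus (hypothetical data, for illustration only).
sentences = [
    ["coffee", "cup", "hot"],
    ["tea", "cup", "hot"],
    ["car", "road", "fast"],
    ["bike", "road", "fast"],
]

word_freq, pair_freq = cooccurrence_counts(sentences)
vectors = pmi_vectors(word_freq, pair_freq, len(sentences))

print(cosine(vectors["coffee"], vectors["tea"]))  # words sharing all contexts
print(cosine(vectors["coffee"], vectors["car"]))  # words sharing no contexts
```

In this toy corpus "coffee" and "tea" never co-occur with each other, yet they receive a high similarity because they share the same significant co-occurrences. That second-order effect is what using global context as a stand-in for semantic similarity relies on. Comparing every word pair this way scales quadratically with vocabulary size, which is the cost that dimensionality reduction or an approximate algorithm is meant to avoid.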