Conference: Computational Linguistics and Intelligent Text Processing

A Comparison of Co-occurrence and Similarity Measures as Simulations of Context

Abstract

Observations of word co-occurrences and similarity computations are often used as a straightforward way to represent the global contexts of words and to simulate semantic word similarity for applications such as word or document clustering and collocation extraction. Despite the simplicity of the underlying model, it is necessary to select a proper significance measure, a similarity measure, and a similarity computation algorithm. However, it is often unclear how these measures are related, and dimensionality reduction is often applied in addition to make computing word similarity efficient. This work presents an approximate algorithm with linear time complexity for computing word similarity without any dimensionality reduction. It then introduces a large-scale evaluation based on two languages and two knowledge sources and discusses the underlying reasons for the relative performance of each measure.
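To make the three choices named in the abstract concrete, the following is a minimal sketch of the general pipeline: count co-occurrences, weight them with a significance measure, and compare the resulting context vectors with a similarity measure. The abstract does not say which measures the paper evaluates, so positive pointwise mutual information (PMI) and cosine similarity are assumptions chosen here only because they are common. The function names and the toy corpus are hypothetical. This is the naive exact computation, not the linear-time approximate algorithm the paper presents.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count, per word and per word pair, the number of sentences containing them."""
    word_freq = Counter()
    pair_freq = Counter()
    for sentence in sentences:
        words = set(sentence)  # count each word at most once per sentence
        word_freq.update(words)
        for a, b in combinations(sorted(words), 2):
            pair_freq[(a, b)] += 1
    return word_freq, pair_freq


def pmi_vectors(sentences):
    """Build sparse context vectors weighted by positive PMI (assumed measure)."""
    word_freq, pair_freq = cooccurrence_counts(sentences)
    n = len(sentences)
    vectors = {word: {} for word in word_freq}
    for (a, b), n_ab in pair_freq.items():
        pmi = math.log2(n_ab * n / (word_freq[a] * word_freq[b]))
        if pmi > 0:  # keep only positively associated pairs
            vectors[a][b] = pmi
            vectors[b][a] = pmi
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts (assumed measure)."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus: "cat" and "dog" never co-occur but share contexts.
corpus = [
    ["cat", "drinks", "milk"],
    ["dog", "drinks", "milk"],
    ["car", "needs", "fuel"],
    ["truck", "needs", "fuel"],
]
vectors = pmi_vectors(corpus)
print(cosine(vectors["cat"], vectors["dog"]))
print(cosine(vectors["cat"], vectors["car"]))
```

In this toy corpus, "cat" and "dog" have identical context vectors and so score close to 1, while "cat" and "car" share no contexts and score 0. Comparing every word with every other word this way is quadratic in vocabulary size, which is the cost the approximate algorithm in the paper is meant to avoid.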
