Conference proceedings: Textual inference and structures in corpora

Alternative measures of word relatedness in distributional semantics

Abstract

This paper presents an alternative method for measuring word-word semantic relatedness in the distributional semantics framework. The main idea is to represent each target word as a ranking of all the words that co-occur with it in a text corpus, ordered by their tf-idf weight, and to use a metric between rankings (such as Jaro distance or Rank distance) to compute semantic relatedness. This method has several advantages over the standard approach that uses the cosine measure in a vector space, mainly in that it is computationally less expensive (i.e., it does not require working in a high-dimensional space, employing only rankings and a distance that is linear in the ranking's length) and presumably more robust. We tested this method on the standard WS-353 test, obtaining the co-occurrence frequencies from the WaCky corpus. The results are comparable to those of methods that use vector space models and, most importantly, the method can be extended to the very challenging task of measuring phrase semantic relatedness.
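
The ranking idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the context window size, the way tf-idf is computed (each target word's co-occurrence profile is treated as a "document"), and the convention that a word missing from a ranking scores 0 are all assumptions made for illustration. The distance shown is a rank-distance-style metric that runs in time linear in the length of the rankings.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_counts(sentences, window=2):
    """Count, for every word, the words seen within `window` tokens of it."""
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[target][tokens[j]] += 1
    return counts


def tfidf_ranking(target, counts):
    """Rank the context words of `target` by tf-idf weight, highest first.

    Assumption: each target word's co-occurrence profile plays the role
    of a document, so df is the number of targets a context word appears with.
    """
    n_targets = len(counts)
    weights = {}
    for context, tf in counts[target].items():
        df = sum(1 for profile in counts.values() if context in profile)
        weights[context] = tf * math.log(n_targets / df)
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(weights, key=lambda w: (-weights[w], w))


def rank_distance(r1, r2):
    """Sum of absolute differences of position scores over both rankings.

    Assumption: the top item of a ranking of length n scores n, the last
    item scores 1, and a word absent from a ranking scores 0.
    """
    s1 = {w: len(r1) - i for i, w in enumerate(r1)}
    s2 = {w: len(r2) - i for i, w in enumerate(r2)}
    return sum(abs(s1.get(w, 0) - s2.get(w, 0)) for w in set(s1) | set(s2))


# Hypothetical toy corpus, used only to exercise the functions.
sentences = [
    ["cat", "drinks", "milk"],
    ["dog", "drinks", "water"],
    ["cat", "chases", "dog"],
]
counts = cooccurrence_counts(sentences)
cat_ranking = tfidf_ranking("cat", counts)
dog_ranking = tfidf_ranking("dog", counts)
distance = rank_distance(cat_ranking, dog_ranking)
```

A smaller distance means the two words have more similar ranked contexts. To compare against human judgments such as WS-353, the distance would still need to be turned into a relatedness score (for example by normalizing and inverting it), a step this sketch leaves out.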
