Conference: International Conference on Intelligent Text Processing and Computational Linguistics

Lemon and Tea Are Not Similar: Measuring Word-to-Word Similarity by Combining Different Methods


Abstract

A substantial amount of work has been done on measuring word-to-word relatedness, which is also commonly referred to as similarity. Though relatedness and similarity are closely related, they are not the same, as illustrated by the words lemon and tea, which are related but not similar. Relatedness takes into account a broader range of relations, while similarity considers only subsumption relations when assessing how alike two objects are. We present in this paper a method for measuring the semantic similarity of words as a combination of various techniques, including knowledge-based and corpus-based methods that capture different aspects of similarity. Our corpus-based method exploits state-of-the-art word representations. We performed experiments with a recently published large dataset called SimLex-999 and achieved a significantly better correlation with human judgment (ρ = 0.642, P < 0.001) than the individual methods achieved on their own.
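The evaluation idea in the abstract (combine scores from several methods, then correlate the result with human ratings) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the methods are combined or which correlation coefficient ρ denotes, so an unweighted average and Spearman rank correlation are used as stand-ins, and the word pairs and all scores below are made-up placeholder values, not data from SimLex-999 or from the paper.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


def combine(knowledge_score, corpus_score):
    """Hypothetical combiner: unweighted average of the two scores."""
    return (knowledge_score + corpus_score) / 2


# Placeholder data: (word1, word2, human rating, knowledge-based, corpus-based).
# All numbers are invented for illustration only.
pairs = [
    ("lemon", "tea", 2.0, 0.10, 0.60),
    ("car", "automobile", 9.5, 0.95, 0.80),
    ("cup", "mug", 8.0, 0.60, 0.75),
    ("coffee", "tea", 6.0, 0.70, 0.65),
]

human = [p[2] for p in pairs]
knowledge = [p[3] for p in pairs]
corpus = [p[4] for p in pairs]
combined = [combine(k, c) for k, c in zip(knowledge, corpus)]

print("knowledge-based:", spearman(human, knowledge))
print("corpus-based:   ", spearman(human, corpus))
print("combined:       ", spearman(human, combined))
```

In this toy setup the "lemon"/"tea" pair is given a high corpus-based score and a low knowledge-based score, reflecting the abstract's point that the two words are related but not similar; averaging the two sources is one simple way to temper either signal.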
