Information Systems Frontiers

Semantic similarity measurement using historical google search patterns

Abstract

Computing the semantic similarity between terms (or short text expressions) that have the same meaning but are not lexically similar is an important challenge in the field of information integration. The problem is that techniques for measuring textual semantic similarity often fail to handle words that are not covered by synonym dictionaries. In this paper, we address this problem by determining the semantic similarity between terms using the knowledge inherent in the search history logs of the Google search engine. To do this, we have designed and evaluated four algorithmic methods that measure the semantic similarity between terms from their associated historical search patterns: a) frequent co-occurrence of terms in search patterns, b) computation of the relationship between search patterns, c) outlier coincidence in search patterns, and d) forecasting comparisons. We show experimentally that some of these methods correlate well with human judgment on general-purpose benchmark datasets, and significantly outperform existing methods on datasets containing terms that do not usually appear in dictionaries.
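The abstract does not give formulas for the four methods. Purely as an illustration, the sketch below shows one way methods (b) and (c) might be approximated, assuming each term's search history is available as an equal-length list of weekly query volumes. It uses Pearson correlation for the relationship between two search patterns and the Jaccard overlap of z-score outliers for outlier coincidence. These measures, the function names, the z-score threshold of 2.0, and the sample data are all hypothetical choices, not the authors' implementation.

```python
import math
from statistics import mean, pstdev


def pearson(x, y):
    """Pearson correlation between two equal-length search-volume series.

    Hypothetical stand-in for method (b), "relationship between search patterns".
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("series must have equal length >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat series carries no pattern to compare
    return cov / (sx * sy)


def outliers(series, z=2.0):
    """Return the indices whose absolute z-score exceeds the threshold z."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return set()
    return {i for i, v in enumerate(series) if abs(v - mu) / sigma > z}


def outlier_coincidence(x, y, z=2.0):
    """Jaccard overlap of outlier positions in two series.

    Hypothetical stand-in for method (c), "outlier coincidence".
    """
    ox, oy = outliers(x, z), outliers(y, z)
    if not ox and not oy:
        return 0.0
    return len(ox & oy) / len(ox | oy)


# Invented weekly search volumes for two synonymous terms (illustrative only).
car = [10, 12, 11, 13, 40, 12, 11, 10]
automobile = [5, 6, 5, 7, 22, 6, 5, 5]

print("pattern correlation:", round(pearson(car, automobile), 3))
print("outlier coincidence:", outlier_coincidence(car, automobile))
```

Both measures return a score that could be compared against human similarity ratings, which is how the abstract describes evaluating the methods. The z-score threshold is arbitrary here; with very short series the largest attainable z-score is bounded, so a real system would need to tune it.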