Journal: IEEE Transactions on Knowledge and Data Engineering

A Web Search Engine-Based Approach to Measure Semantic Similarity between Words

Abstract

Measuring the semantic similarity between words is an important component in various tasks on the web such as relation extraction, community mining, document clustering, and automatic metadata extraction. Despite the usefulness of semantic similarity measures in these applications, accurately measuring semantic similarity between two words (or entities) remains a challenging task. We propose an empirical method to estimate semantic similarity using page counts and text snippets retrieved from a web search engine for two words. Specifically, we define various word co-occurrence measures using page counts and integrate those with lexical patterns extracted from text snippets. To identify the numerous semantic relations that exist between two given words, we propose a novel pattern extraction algorithm and a pattern clustering algorithm. The optimal combination of page counts-based co-occurrence measures and lexical pattern clusters is learned using support vector machines. The proposed method outperforms various baselines and previously proposed web-based semantic similarity measures on three benchmark data sets showing a high correlation with human ratings. Moreover, the proposed method significantly improves the accuracy in a community mining task.
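
The abstract does not name the page-count co-occurrence measures it defines. As a rough illustration only, the sketch below assumes four common association scores (Jaccard, Dice, overlap, and pointwise mutual information) computed from the page counts for each word alone and for the two words together. The function names, the example counts, and the index size `n_pages` are hypothetical and are not taken from the paper.

```python
import math


def jaccard(h_p, h_q, h_pq):
    """Jaccard coefficient from page counts: |P and Q| / |P or Q|."""
    denom = h_p + h_q - h_pq
    return h_pq / denom if denom > 0 else 0.0


def dice(h_p, h_q, h_pq):
    """Dice coefficient from page counts: 2|P and Q| / (|P| + |Q|)."""
    denom = h_p + h_q
    return 2 * h_pq / denom if denom > 0 else 0.0


def overlap(h_p, h_q, h_pq):
    """Overlap coefficient from page counts: |P and Q| / min(|P|, |Q|)."""
    denom = min(h_p, h_q)
    return h_pq / denom if denom > 0 else 0.0


def pmi(h_p, h_q, h_pq, n_pages):
    """Pointwise mutual information (base 2), treating counts / n_pages as probabilities.

    n_pages is an assumed estimate of the number of pages indexed by the search engine.
    Returns 0.0 when any count is zero, since the logarithm is undefined there.
    """
    if h_p == 0 or h_q == 0 or h_pq == 0:
        return 0.0
    joint = h_pq / n_pages
    independent = (h_p / n_pages) * (h_q / n_pages)
    return math.log2(joint / independent)


def cooccurrence_features(h_p, h_q, h_pq, n_pages):
    """Collect the four scores into one feature vector for a word pair."""
    return [
        jaccard(h_p, h_q, h_pq),
        dice(h_p, h_q, h_pq),
        overlap(h_p, h_q, h_pq),
        pmi(h_p, h_q, h_pq, n_pages),
    ]


if __name__ == "__main__":
    # Made-up page counts for two words P and Q, and for the query "P AND Q".
    print(cooccurrence_features(h_p=100, h_q=50, h_pq=25, n_pages=1000))
```

In the approach the abstract describes, scores of this kind would be only part of the input: they are combined with lexical pattern clusters extracted from snippets, and a support vector machine learns how to weight the combination.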