Measuring semantic similarity between words by removing noise and redundancy in web snippets

Abstract

Semantic similarity measures play important roles in many Web-related tasks such as Web browsing and query suggestion. Because taxonomy-based methods cannot deal with continually emerging words, Web-based methods have recently been proposed to solve this problem. Because of the noise and redundancy hidden in Web data, robustness and accuracy remain challenges. In this paper, we propose a method integrating page counts and snippets returned by Web search engines. The semantic snippets and the number of search results are then used to remove noise and redundancy in the Web snippets (a 'Web snippet' includes the title, summary, and URL of a Web page returned by a search engine). After that, a method integrating page counts, semantic snippets, and the number of already displayed search results is proposed. The proposed method does not need any human-annotated knowledge (e.g., ontologies) and can easily be applied to Web-related tasks (e.g., query suggestion). A correlation coefficient of 0.851 against the Rubenstein-Goodenough benchmark dataset shows that the proposed method outperforms existing Web-based methods by a wide margin. Moreover, the proposed semantic similarity measure significantly improves the quality of query suggestion compared with some page-count-based methods.
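
The abstract does not give the paper's formulas, so the following is only a minimal sketch of the kind of page-count-based baseline it compares against, plus the benchmark correlation step. It assumes a Jaccard-style co-occurrence score and a Pearson correlation; the function names, the `min_cooccurrence` threshold, and all numbers are hypothetical illustrations, not the authors' method or data.

```python
from math import sqrt


def web_jaccard(count_p, count_q, count_pq, min_cooccurrence=0):
    """Jaccard-style similarity from search-engine page counts.

    count_p, count_q: page counts for each word queried alone.
    count_pq: page count for the conjunctive query "P AND Q".
    min_cooccurrence: hypothetical noise threshold; co-occurrence counts
    at or below it are treated as accidental and scored 0.
    """
    if count_pq <= min_cooccurrence:
        return 0.0
    denominator = count_p + count_q - count_pq
    return count_pq / denominator if denominator > 0 else 0.0


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up page counts for three word pairs (illustration only).
    page_counts = [(1000, 500, 250), (800, 900, 100), (400, 300, 10)]
    predicted = [web_jaccard(p, q, pq) for p, q, pq in page_counts]
    # Made-up human ratings standing in for a benchmark such as
    # Rubenstein-Goodenough; these are not real benchmark values.
    human = [3.5, 1.2, 0.4]
    print(predicted, pearson(predicted, human))
```

A measure of this kind uses page counts alone; the abstract's contribution is to combine such counts with snippet content and the number of displayed results, which this sketch does not attempt.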

Bibliographic record

  • Source
    Concurrency, practice and experience | 2011, Issue 18 | pp. 2496-2510 | 15 pages
  • Author affiliations

    School of Computer Engineering and Science, High Performance Computing Center, Shanghai University, Shanghai 200072, China;

    School of Computer Engineering and Science, High Performance Computing Center, Shanghai University, Shanghai 200072, China;

    School of Computer Engineering and Science, High Performance Computing Center, Shanghai University, Shanghai 200072, China;

    School of Computer Engineering and Science, High Performance Computing Center, Shanghai University, Shanghai 200072, China;

  • Indexing information
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    semantic similarity; information retrieval; query suggestion; web search;

