首页> 外文期刊>Expert Systems with Application >Efficient link-based similarity search in web networks
【24h】

Efficient link-based similarity search in web networks

机译:网络中基于链接的有效相似度搜索

获取原文
获取原文并翻译 | 示例

摘要

Similarity search in web networks, aiming to find entities similar to the given entity, is one of the core tasks in network analysis. With the proliferation of web applications, including web search and recommendation system, SimRank has been a well-known measure for evaluating entity similarity in a network. However, the existing work computes SimRank iteratively over a huge similarity matrix, which is expensive in terms of time and space cost and cannot efficiently support similarity search over large networks. In this paper, we propose a link-based similarity search method, WebSim, towards efficiently finding similar entities in web networks. WebSim defines the similarity between entities as the 2-hop similarity of SimRank. To reduce computation cost, we divide the similarity search process into two stages: off-line stage and on-line stage. In the off-line stage, the 1-hop similarities are computed, and an optimized algorithm is designed to reduce the unnecessary accumulation operations on zero similarities. In the on-line stage, the 2-hop similarities are computed, and a pruning algorithm is developed to support fast query processing through searching similar entries from a partial sums index derived from the 1-hop similarities. The index items that are lower than a given threshold are skipped to reduce the searching space. Compared to the iterative SimRank computation, the time and space cost of similarity computation is significantly reduced, since WebSim maintains only the similarity matrix of 1-hop that is much smaller than that of multi-hop. Experiments through comparison with SimRank and its optimized algorithms demonstrate that WebSim has on average a 99.83% reduction in the time cost and a 92.12% reduction in the space cost of similarity computation, and achieves on average 99.98% NDCG. (C) 2015 Elsevier Ltd. All rights reserved.
机译:Web网络中的相似性搜索旨在查找与给定实体相似的实体,这是网络分析中的核心任务之一。随着Web应用程序(包括Web搜索和推荐系统)的激增,SimRank已成为评估网络中实体相似性的众所周知的方法。但是,现有工作在巨大的相似度矩阵上迭代计算SimRank,这在时间和空间成本方面是昂贵的,并且不能有效地支持大型网络上的相似度搜索。在本文中,我们提出了一种基于链接的相似性搜索方法WebSim,以有效地查找Web网络中的相似实体。 WebSim将实体之间的相似性定义为SimRank的2跳相似性。为了降低计算成本,我们将相似性搜索过程分为两个阶段:离线阶段和在线阶段。在离线阶段,计算1跳相似度,并设计一种优化算法来减少零相似度上不必要的累加操作。在在线阶段,计算2跳相似度,并开发了修剪算法以通过从1跳相似度派生的部分和索引中搜索相似条目来支持快速查询处理。低于给定阈值的索引项将被跳过,以减少搜索空间。与迭代SimRank计算相比,相似度计算的时间和空间成本显着降低,因为WebSim仅保留了比多跳数小的1跳相似度矩阵。通过与SimRank及其优化算法进行比较的实验表明,WebSim的时间成本平均减少了99.83%,相似性计算的空间成本减少了92.12%,平均NDCG达到了99.98%。 (C)2015 Elsevier Ltd.保留所有权利。

著录项

  • 来源
    《Expert Systems with Application》 |2015年第22下期|8868-8880|共13页
  • 作者单位

    Univ Shanghai Sci & Technol, Shanghai 200093, Peoples R China|Fudan Univ, Shanghai 201203, Peoples R China;

    Fudan Univ, Shanghai 201203, Peoples R China;

    Fudan Univ, Shanghai 201203, Peoples R China;

    Univ Shanghai Sci & Technol, Shanghai 200093, Peoples R China|Fudan Univ, Shanghai 201203, Peoples R China;

    Univ Shanghai Sci & Technol, Shanghai 200093, Peoples R China;

  • 收录信息
  • 原文格式 PDF
  • 正文语种 eng
  • 中图分类
  • 关键词

    Similarity search; Web network; WebSim;

    机译:相似度搜索;Web网络;WebSim;

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号