Journal: Information Sciences: An International Journal

HeteRank: A general similarity measure in heterogeneous information networks by integrating multi-type relationships



Abstract

As heterogeneous information networks become ubiquitous and complex, many data mining tasks on them have been explored, including clustering, collaborative filtering, and link prediction. Similarity computation is a fundamental task required by many data mining problems. Although a large number of similarity measures have been developed for heterogeneous networks, they usually depend on the network schema and lack a general way of integrating the different kinds of relationships between objects. In this paper, we propose a similarity measure, HeteRank, for computing similarities in heterogeneous information networks in a general manner. The relationships between objects of different types are represented by a general relationship matrix (GRM), which is built from the scales of the different object types. Based on the GRM, HeteRank fully integrates multi-type relationships into similarity computation by using all the meetings between objects. The HeteRank equation is further transformed into a simple binomial expression form that takes a restart probability into account. To compute HeteRank similarities efficiently, we divide the computation into two steps: the first computes intermediate values, and the second computes the similarities from those intermediate values. We then approximate the HeteRank equation by setting thresholds that skip low intermediate values and low similarity scores. A pruning algorithm is developed to reduce unnecessary visits, multiplications, and additions that contribute little to the similarity computation. Extensive experiments on real datasets demonstrate the effectiveness and efficiency of HeteRank in comparison with state-of-the-art similarity measures. (C) 2018 Elsevier Inc. All rights reserved.
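The two-step, threshold-pruned computation described in the abstract can be illustrated with a small sketch. The abstract gives neither the HeteRank equation nor the construction of the GRM, so the code below is not HeteRank: it substitutes a plain SimRank-style update, S = c * W^T S W with the diagonal fixed at 1, over a degree-normalized matrix W. The function names, the parameters `decay`, `eps_t`, and `eps_s`, and the toy author/paper/venue network are all assumptions made for illustration.

```python
from collections import defaultdict


def normalize(neighbors):
    """Build W[a][i] = 1/deg(i) for every neighbor a of i.

    Assumption: a plain degree normalization standing in for the GRM,
    whose actual construction is not given in the abstract.
    """
    w = defaultdict(dict)
    for i, nbrs in neighbors.items():
        for a in nbrs:
            w[a][i] = 1.0 / len(nbrs)
    return w


def similarity(neighbors, decay=0.8, iters=10, eps_t=1e-4, eps_s=1e-4):
    """SimRank-style similarity computed in two pruned steps per iteration."""
    w = normalize(neighbors)
    nodes = list(neighbors)
    sim = {v: {v: 1.0} for v in nodes}
    for _ in range(iters):
        # Step 1: intermediate values T[a][j] = sum_b S[a][b] * W[b][j].
        inter = {}
        for a in nodes:
            row = defaultdict(float)
            for b, s_ab in sim[a].items():
                for j, w_bj in w[b].items():
                    row[j] += s_ab * w_bj
            # Skip intermediate values below the threshold eps_t.
            inter[a] = {j: t for j, t in row.items() if t >= eps_t}
        # Step 2: S'[i][j] = decay * sum_a W[a][i] * T[a][j].
        new = {v: defaultdict(float) for v in nodes}
        for a in nodes:
            for i, w_ai in w[a].items():
                for j, t_aj in inter[a].items():
                    new[i][j] += decay * w_ai * t_aj
        sim = {}
        for i in nodes:
            # Skip similarity scores below eps_s; self-similarity is fixed at 1.
            row = {j: s for j, s in new[i].items() if j != i and s >= eps_s}
            row[i] = 1.0
            sim[i] = row
    return sim


# Hypothetical toy network: two authors, three papers, one venue.
toy = {
    "a1": ["p1", "p2"],
    "a2": ["p2", "p3"],
    "p1": ["a1", "v1"],
    "p2": ["a1", "a2", "v1"],
    "p3": ["a2", "v1"],
    "v1": ["p1", "p2", "p3"],
}
scores = similarity(toy)
print(scores["a1"])
```

Because all object types share one matrix, the sketch yields scores between objects of different types (for example an author and a venue) without choosing a schema-specific path. Storing only entries above the thresholds keeps both the intermediate values and the similarity rows sparse, which is the role the abstract assigns to pruning.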
