...
首页> 外文期刊>Knowledge and Data Engineering, IEEE Transactions on >All-Distances Sketches, Revisited: HIP Estimators for Massive Graphs Analysis
【24h】

All-Distances Sketches, Revisited: HIP Estimators for Massive Graphs Analysis

机译:再谈全距离草图:用于大规模图形分析的HIP估计器

获取原文
获取原文并翻译 | 示例
   

获取外文期刊封面封底 >>

       

摘要

Graph datasets with billions of edges, such as social and web graphs, are prevalent, and scalability is critical. All-distances sketches (ADS) [Cohen 1997], are a powerful tool for scalable approximation of statistics. The sketch is a small size sample of the distance relation of a node which emphasizes closer nodes. Sketches for all nodes are computed using a nearly linear computation and estimators are applied to sketches of nodes to estimate their properties. We provide, for the first time, a unified exposition of ADS algorithms and applications. We present the historic inverse probability (HIP) estimators which are applied to the ADS of a node to estimate a large natural class of statistics. For the important special cases of neighborhood cardinalities (the number of nodes within some query distance) and closeness centralities, HIP estimators have at most half the variance of previous estimators and we show that this is essentially optimal. Moreover, HIP obtains a polynomial improvement for more general statistics and the estimators are simple, flexible, unbiased, and elegant. For approximate distinct counting on data streams, HIP outperforms the original estimators for the HyperLogLog MinHash sketches (Flajolet et al. 2007), obtaining significantly improved estimation quality for this state-of-the-art practical algorithm.
机译:具有数十亿条边的图形数据集(例如社交图和Web图)很普遍,并且可伸缩性至关重要。全距离草图(ADS)[Cohen 1997]是用于统计的可伸缩近似的强大工具。草图是强调距离较近的节点的距离关系的小样本。使用几乎线性的计算来计算所有节点的草图,并将估计器应用于节点的草图以估计其属性。我们首次提供ADS算法和应用程序的统一展示。我们介绍了历史逆概率(HIP)估计量,该估计量适用于节点的ADS来估计大量的自然统计量。对于邻域基数(在某个查询距离内的节点数)和紧密度中心的重要特殊情况,HIP估计量最多具有先前估计量方差的一半,我们证明这基本上是最优的。此外,HIP获得了多项式改进,可用于更通用的统计数据,并且估计量简单,灵活,无偏且优雅。为了对数据流进行近似的不同计数,HIP优于HyperLogLog MinHash草图的原始估计量(Flajolet et al.2007),从而大大提高了这种最新实用算法的估计质量。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号