首页> 外文OA文献 >Leveraging History for Faster Sampling of Online Social Networks
【2h】

Leveraging History for Faster Sampling of Online Social Networks

机译:利用历史来加快在线社交网络的抽样

代理获取
本网站仅为用户提供外文OA文献查询和代理获取服务,本网站没有原文。下单后我们将采用程序或人工为您竭诚获取高质量的原文,但由于OA文献来源多样且变更频繁,仍可能出现获取不到、文献不完整或与标题不符等情况,如果获取不到我们将提供退款服务。请知悉。

摘要

How to enable efficient analytics over such data has been an increasinglyimportant research problem. Given the sheer size of such social networks, manyexisting studies resort to sampling techniques that draw random nodes from anonline social network through its restrictive web/API interface. Almost all ofthem use the exact same underlying technique of random walk - a Markov ChainMonte Carlo based method which iteratively transits from one node to its randomneighbor. Random walk fits naturally with this problem because, for most online socialnetworks, the only query we can issue through the interface is to retrieve theneighbors of a given node (i.e., no access to the full graph topology). Aproblem with random walks, however, is the "burn-in" period which requires alarge number of transitions/queries before the sampling distribution convergesto a stationary value that enables the drawing of samples in a statisticallyvalid manner. In this paper, we consider a novel problem of speeding up the fundamentaldesign of random walks (i.e., reducing the number of queries it requires)without changing the stationary distribution it achieves - thereby enabling amore efficient "drop-in" replacement for existing sampling-based analyticstechniques over online social networks. Our main idea is to leverage thehistory of random walks to construct a higher-ordered Markov chain. We developtwo algorithms, Circulated Neighbors and Groupby Neighbors Random Walk (CNRWand GNRW) and prove that, no matter what the social network topology is, CNRWand GNRW offer better efficiency than baseline random walks while achieving thesame stationary distribution. We demonstrate through extensive experiments onreal-world social networks and synthetic graphs the superiority of ourtechniques over the existing ones.
机译:如何对此类数据进行有效分析已成为越来越重要的研究问题。考虑到此类社交网络的庞大规模,许多现有研究采用了采样技术,该技术通过其限制性Web / API界面从在线社交网络中抽取随机节点。几乎所有它们都使用完全相同的随机游走底层技术-一种基于Markov ChainMonte Carlo的方法,该方法从一个节点迭代过渡到其随机邻居。随机游走很自然地适合了这个问题,因为对于大多数在线社交网络而言,我们可以通过界面发出的唯一查询是检索给定节点的邻居(即,无法访问完整的图拓扑)。但是,随机游走的障碍是“老化”时期,需要大量的转换/查询才能使采样分布收敛到一个固定值,该固定值能够以统计学上有效的方式绘制样本。在本文中,我们考虑了一个新的问题,即在不更改实现的固定分布的情况下加快随机游走的基本设计(即,减少所需的查询数量)的问题-从而可以更有效地“替换”现有采样,基于在线社交网络的分析技术。我们的主要思想是利用随机游走的历史来构建高阶马尔可夫链。我们开发了两种算法:循环邻居和分组邻居随机游走(CNRW和GNRW),并证明,无论社交网络拓扑如何,CNRW和GNRW都比基线随机游走提供了更好的效率,同时实现了相同的平稳分布。我们通过在现实世界中的社交网络上进行广泛的实验和合成图来证明我们的技术相对于现有技术的优越性。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
代理获取

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号