Characterizing Twitter with Respondent-Driven Sampling


Abstract

Twitter, one of the most important microblogging online social networks, has attracted more than 200 million users in recent years. Although there have been several attempts to characterize Twitter using incomplete sampled data, they have not been very successful at estimating the characteristics of the whole network. In this paper, we characterize Twitter by sampling its social graph and user behaviors with a random-walk-based sampling technique called Respondent-Driven Sampling (RDS). To the best of our knowledge, this is the first time the RDS method and its estimator have been used to obtain unbiased estimates of several key structural and behavioral properties of Twitter. We compare the performance of the proposed method with other sampling methods, namely Metropolis-Hastings Random Walk (MHRW) and sampling from active users (Timeline), using uniform sampling (UNI) as the reference. To gather the required data, we implemented four independent crawlers. Our experimental results indicate that the RDS method yields lower estimation errors for the in- and out-degree distributions than MHRW and Timeline. We also show that RDS is better suited to estimating the followers-to-followings ratio and the correlation between followers/followings and tweets.
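The difference between the two random-walk samplers named in the abstract can be illustrated with a minimal sketch. It is not the authors' crawler: the four-node graph, the function names (`random_walk`, `rds_mean`, `mhrw`), and the simplification to an undirected, connected graph are assumptions made for illustration. A plain random walk visits nodes in proportion to their degree, so the RDS-style estimator re-weights each sampled value by the inverse of the node's degree, whereas MHRW corrects the walk itself by rejecting some moves toward higher-degree nodes.

```python
import random

def random_walk(graph, start, steps, rng):
    """Plain random walk; returns the visited nodes (with repeats)."""
    node, visited = start, []
    for _ in range(steps):
        node = rng.choice(graph[node])  # move to a uniformly chosen neighbor
        visited.append(node)
    return visited

def rds_mean(samples, graph, value):
    """Inverse-degree re-weighted estimate of the population mean of value(v).

    Assumes the samples come from a random walk whose stationary
    distribution is proportional to node degree.
    """
    num = sum(value(v) / len(graph[v]) for v in samples)
    den = sum(1.0 / len(graph[v]) for v in samples)
    return num / den

def mhrw(graph, start, steps, rng):
    """Metropolis-Hastings random walk targeting the uniform distribution."""
    node, visited = start, []
    for _ in range(steps):
        cand = rng.choice(graph[node])
        # Accept with probability min(1, deg(node) / deg(cand)); else stay put.
        if rng.random() < len(graph[node]) / len(graph[cand]):
            node = cand
        visited.append(node)
    return visited

# Hypothetical toy graph (undirected adjacency lists); true mean degree is 2.
toy_graph = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
degree = lambda v: len(toy_graph[v])

rng = random.Random(0)
walk = random_walk(toy_graph, 0, 20000, rng)
naive_estimate = sum(degree(v) for v in walk) / len(walk)  # biased toward hubs
rds_estimate = rds_mean(walk, toy_graph, degree)           # re-weighted

mh_walk = mhrw(toy_graph, 0, 20000, rng)
mhrw_estimate = sum(degree(v) for v in mh_walk) / len(mh_walk)
```

On this toy graph the unweighted average over the plain walk overestimates the mean degree, while the re-weighted estimate and the MHRW average both approach the true value of 2. The real Twitter graph is directed, and how the paper handles direction is not stated in the abstract.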
