ACM Transactions on the Web

Sampling Content from Online Social Networks: Comparing Random vs. Expert Sampling of the Twitter Stream

Abstract

Analysis of content streams gathered from social networking sites such as Twitter has several applications, ranging from content search and recommendation to news detection and business analytics. However, processing the large amounts of data generated on these sites in real time poses a difficult challenge. To cope with the data deluge, analytics companies and researchers are increasingly resorting to sampling. In this article, we investigate the crucial question of how to sample the content streams generated by users in online social networks. The traditional approach is to sample uniformly at random over all the data. For example, most studies using Twitter data today rely on the 1% and 10% randomly sampled streams of tweets that Twitter provides. In this article, we analyze a different sampling methodology, one where content is gathered only from a relatively small sample (<1%) of the user population, namely, the expert users. Over the duration of a month, we gathered tweets from over 500,000 Twitter users identified as experts on a diverse set of topics, and compared the resulting expert-sampled tweets with the 1% randomly sampled tweets provided publicly by Twitter. We compared the sampled datasets along several dimensions, including the popularity, topical diversity, trustworthiness, and timeliness of the information contained within them, as well as the sentiment/opinion expressed on specific topics. Our analysis reveals several important differences between the data obtained through the two sampling methodologies, with serious implications for applications such as topical search, trustworthy content recommendation, breaking news detection, and opinion mining.
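To make the contrast between the two methodologies concrete, here is a minimal Python sketch (not the authors' pipeline): random sampling keeps a fixed fraction of all tweets, while expert sampling keeps every tweet from a fixed, small set of accounts. The tweet records, stream, and expert ID set are hypothetical placeholders.

```python
import random

def random_sample(tweet_stream, rate=0.01):
    """Random sampling: keep roughly `rate` of all tweets,
    mimicking Twitter's publicly provided 1% stream."""
    for tweet in tweet_stream:
        if random.random() < rate:
            yield tweet

def expert_sample(tweet_stream, expert_ids):
    """Expert sampling: keep every tweet authored by a fixed,
    small (<1% of users) set of expert accounts."""
    for tweet in tweet_stream:
        if tweet["user_id"] in expert_ids:
            yield tweet

# Hypothetical stream: 10,000 tweets from 1,000 users.
tweets = [{"user_id": i % 1000, "text": f"tweet {i}"} for i in range(10_000)]
experts = {7, 42, 99}  # placeholder IDs standing in for the ~500K expert accounts

print(sum(1 for _ in random_sample(tweets)))           # ~100 tweets (about 1%)
print(sum(1 for _ in expert_sample(tweets, experts)))  # 30 tweets, all from experts
```

Note the trade-off the sketch makes visible: the random sample covers the whole user population thinly, while the expert sample covers a tiny, curated slice of users completely — which is exactly why the two datasets can differ in popularity, diversity, trustworthiness, and timeliness.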
