Finding User Clusters in Sina Microblog

Abstract

Sina Microblog has been a very popular social microblogging service in recent years. However, its network structure is difficult to analyze because of the huge number of users. The emergence of cloud computing offers a new approach to analyzing large-scale social networks. Hadoop is a widely used cloud computing platform, and several clustering algorithms, such as K-means and Canopy, have already been implemented on it. However, the initial cluster centers of K-means are hard to select. Canopy provides a way to choose initial centers, but it is not suitable for very large data sets, and both traditional K-means and Canopy K-means converge very slowly. This paper proposes an improved method for clustering microblog users based on their relationships. We name our method "Weight Partitioned Canopy K-means" (WPCK), implement it on a Hadoop cluster, and test it alongside existing methods. Experimental results show that WPCK reduces the number of iterations by about 1/3 compared with traditional K-means and Canopy K-means, while the performance of the three methods is almost the same.
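
The abstract does not describe how WPCK works, so the following is only a minimal single-machine sketch of the Canopy K-means baseline it mentions: Canopy picks the initial centers, then standard K-means iterates from them. The function names, the thresholds `t1` and `t2`, the Euclidean distance, and the toy 2-D points are all assumptions made for illustration; they are not taken from the paper, and no Hadoop or weight partitioning is involved.

```python
import math


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def canopy_centers(points, t1, t2):
    """Canopy pass: return one center per canopy (assumes t1 > t2 > 0)."""
    assert t1 > t2 > 0
    remaining = list(points)
    centers = []
    while remaining:
        seed = remaining[0]
        # Loose threshold t1: every remaining point this close joins the canopy.
        members = [p for p in remaining if dist(seed, p) < t1]
        centers.append(mean(members))
        # Tight threshold t2: points this close can no longer seed a canopy.
        remaining = [p for p in remaining if dist(seed, p) >= t2]
    return centers


def kmeans(points, centers, max_iter=100, tol=1e-9):
    """Standard K-means from the given initial centers.

    Returns (centers, labels, iterations_used).
    """
    centers = list(centers)
    labels = []
    iterations = 0
    for iterations in range(1, max_iter + 1):
        # Assignment step: index of the nearest center for each point.
        labels = [min(range(len(centers)), key=lambda k: dist(p, centers[k]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            new_centers.append(mean(members) if members else old)
        shift = max(dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, labels, iterations


# Toy data: two well-separated groups of hypothetical user feature vectors.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
init = canopy_centers(points, t1=3.0, t2=2.0)
final_centers, labels, iterations = kmeans(points, init)
print(len(init), labels, iterations)
```

Because the canopy centers already sit near the true groups, K-means needs few iterations on this toy input. The iteration count returned by `kmeans` is the quantity the abstract reports WPCK as reducing.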