Conference: Chinese Control and Decision Conference

A DISTRIBUTED CLUSTERING METHOD TO SEGMENT MICRO-BLOG USERS ON CLOUD ENVIRONMENTS


Abstract

With the rapid development of social network analysis (SNA), increasing attention is being paid to segmenting micro-blog users in social networks. This is a new trend in segmentation, a classic marketing technique. For micro-blogs, it is useful to identify a group of users who share a common set of characteristics and to learn what is on their minds. As is usually the case, the standard for categorizing micro-blog users is multi-objective, i.e., the data is high-dimensional. If you have a personal micro-blog account, it is easy enough to create the lists that are most meaningful to you using generic clustering algorithms. But if your business has tens of millions of users, the near-real-time requirement and the lack of efficient clustering algorithms for identifying and distinguishing those users limit the power and scalability of this approach. To overcome these limitations, this paper introduces CDGM-Clu, a novel distributed clustering algorithm for high-dimensional data, based on the Map-Reduce framework, that distinguishes the different communities within an entire social network. Extensive experiments on real and synthetic datasets show that CDGM-Clu is efficient and scalable, and useful for analyzing large social network data.
