Discovering Communities with Self-Adaptive k Clustering in Microblog Data


Abstract

Microblogging has become a popular social network service whose user base has grown rapidly in the past few years. Many companies regard microblogging services as an indispensable medium for obtaining timely opinions directly from customers and potential customers. A community in a social network is a group of people who share similar interests or pay attention to the same things. Recognizing user communities in a microblogging service is important for identifying hot topics and users' interests, which in turn helps companies improve their marketing strategies. However, the massive volume of unstructured tweet data makes it very challenging to efficiently mine the valuable communities hidden in it. Tweet data is voluminous, spans many fields, and consists of short, unstructured messages, which makes tweets quite different from conventional text documents. To analyze the data more effectively, in this paper we propose a set of techniques for preprocessing tweets, including word identification, category matching, and data standardization. We present an unsupervised learning method that automatically clusters microblog users into communities. The method uses a CLARANS algorithm optimized for the characteristics of microblog data. During clustering, the interactive relationships between tweets are also exploited to improve clustering quality. In addition, a self-adaptive k strategy is employed to make the approach more widely applicable. To investigate the performance of our approach from different aspects, we conducted a series of experiments on microblog data collected from SINA Weibo.
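
The abstract does not say how CLARANS was optimized or how k is adapted, so the following is only an illustrative sketch and not the authors' method. It assumes plain CLARANS (a randomized medoid-swap search) over a precomputed distance matrix, and it assumes the mean silhouette score as the criterion for choosing k. All function names, parameters, and the toy user vectors are hypothetical.

```python
import math
import random


def total_cost(dist, medoids):
    """Sum of every point's distance to its nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def clarans(dist, k, num_local=4, max_neighbor=50, seed=0):
    """Plain CLARANS: randomized medoid-swap search, restarted num_local times."""
    rng = random.Random(seed)
    n = len(dist)
    best, best_cost = None, math.inf
    for _ in range(num_local):
        current = rng.sample(range(n), k)
        cost = total_cost(dist, current)
        failures = 0
        while failures < max_neighbor:
            # A neighbor differs from the current solution in exactly one medoid.
            i = rng.randrange(k)
            candidate = rng.choice([p for p in range(n) if p not in current])
            neighbor = current[:i] + [candidate] + current[i + 1:]
            new_cost = total_cost(dist, neighbor)
            if new_cost < cost:
                current, cost, failures = neighbor, new_cost, 0
            else:
                failures += 1
        if cost < best_cost:
            best, best_cost = current, cost
    return best


def assign(dist, medoids):
    """Label each point with the index of its nearest medoid."""
    return [min(range(len(medoids)), key=lambda j: row[medoids[j]]) for row in dist]


def silhouette(dist, labels, k):
    """Mean silhouette score; singleton clusters contribute 0."""
    n = len(dist)
    clusters = [[p for p in range(n) if labels[p] == c] for c in range(k)]
    scores = []
    for p in range(n):
        own = clusters[labels[p]]
        if len(own) == 1:
            scores.append(0.0)
            continue
        a = sum(dist[p][q] for q in own) / (len(own) - 1)
        b = min(sum(dist[p][q] for q in other) / len(other)
                for c, other in enumerate(clusters) if c != labels[p] and other)
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / n


def adaptive_k_clustering(dist, k_min=2, k_max=6):
    """Assumed adaptive-k rule: keep the k with the highest mean silhouette."""
    best = None
    for k in range(k_min, k_max + 1):
        medoids = clarans(dist, k)
        labels = assign(dist, medoids)
        score = silhouette(dist, labels, k)
        if best is None or score > best[0]:
            best = (score, k, medoids, labels)
    return best


# Toy stand-in for standardized user feature vectors: three tight groups.
offsets = [(0.0, 0.0), (0.4, 0.1), (0.1, 0.5), (0.5, 0.4), (0.2, 0.2)]
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]
dist = [[math.dist(p, q) for q in points] for p in points]

score, k, medoids, labels = adaptive_k_clustering(dist, k_min=2, k_max=6)
print(k, labels)
```

Working from a distance matrix keeps the sketch independent of how users are represented, so a distance that also reflects interactions between tweets could be substituted without changing the search itself.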