Discovering Communities with Self-Adaptive k Clustering in Microblog Data



Abstract

Microblogging has become a popular social network service whose user population has grown rapidly in the past few years. Many companies regard microblogging services as an indispensable medium for obtaining timely opinions directly from customers and potential customers. A community in a social network is a group of people who share similar interests or pay attention to the same things. Recognizing user communities in microblogging services is important for identifying hot topics and users' interests, which helps companies improve their marketing strategies. However, the massive volume of unstructured tweet data makes it very challenging to mine the valuable communities hidden in it efficiently. Tweet data is massive, spans many fields, and is short and unstructured, which makes tweets quite different from conventional text documents. To analyze the data more effectively, in this paper we propose a set of techniques for preprocessing tweets, including word identification, category matching, and data standardization. We present an unsupervised learning method that automatically clusters microblog users into communities. The method uses a CLARANS algorithm optimized for the characteristics of microblog data. During clustering, the interactive relationships between tweets are also exploited to improve clustering quality. In addition, a self-adaptive k strategy is employed to make the approach more widely applicable. To investigate the performance of our approach from different aspects, we conducted a series of experiments on microblog data collected from SINA Weibo.
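The abstract does not say how the authors optimize CLARANS or how their self-adaptive k strategy works, so the sketch below should be read only as an illustration of the general idea, not as the paper's method. It implements textbook CLARANS (randomized search over sets of k medoids) and, as a stand-in for the adaptive-k step, picks k by the mean silhouette score. All function names, the parameters `numlocal` and `maxneighbor`, the toy 2-D points, and the Euclidean distance are assumptions made for this example; real tweet data would need the preprocessing and a text-appropriate distance.

```python
import math
import random

def total_cost(points, medoids, dist):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)

def clarans(points, k, dist, numlocal=3, maxneighbor=40, seed=0):
    """Textbook CLARANS: randomized local search over sets of k medoid indices."""
    rng = random.Random(seed)
    n = len(points)
    best, best_cost = None, float("inf")
    for _ in range(numlocal):  # independent random restarts
        current = rng.sample(range(n), k)
        cost = total_cost(points, current, dist)
        failures = 0
        while failures < maxneighbor:
            # A neighbor swaps exactly one medoid for a non-medoid point.
            slot = rng.randrange(k)
            candidate = rng.choice([i for i in range(n) if i not in current])
            neighbor = current[:slot] + [candidate] + current[slot + 1:]
            new_cost = total_cost(points, neighbor, dist)
            if new_cost < cost:
                current, cost, failures = neighbor, new_cost, 0
            else:
                failures += 1
        if cost < best_cost:
            best, best_cost = current, cost
    return sorted(best), best_cost

def assign(points, medoids, dist):
    """Label each point with the index of its nearest medoid."""
    return [min(medoids, key=lambda m: dist(p, points[m])) for p in points]

def silhouette(points, labels, dist):
    """Mean silhouette score; returns 0.0 if fewer than two clusters exist."""
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # convention: singleton clusters score 0
            continue
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(sum(dist(points[i], points[j]) for j in members) / len(members)
                for other, members in clusters.items() if other != lab)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def choose_k(points, dist, k_values):
    """Assumed stand-in for adaptive k: keep the k with the best silhouette."""
    best = None
    for k in k_values:
        medoids, _ = clarans(points, k, dist)
        score = silhouette(points, assign(points, medoids, dist), dist)
        if best is None or score > best[0]:
            best = (score, k, medoids)
    return best[1], best[2]

# Toy data: three well-separated groups of 2-D points (illustrative only).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
k, medoids = choose_k(points, math.dist, range(2, 6))
print(k, [points[m] for m in medoids])
```

Each candidate k must be smaller than the number of points, because a swap needs at least one non-medoid to choose from. Recomputing the full cost for every neighbor keeps the sketch short but is quadratic in practice; an implementation meant for large tweet collections would update the cost incrementally.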


Bibliographic record

Source
International Conference on Social Computing and Its Applications; International Symposium on Big Data and MapReduce; International Symposium on Privacy and Security in Cloud and Social; International Workshop on Web Wisdom; International Workshop on Society Network Analysis and Information Diffusion Modeling; International Workshop on Social Network Service on Databases | 2012 | 8 pages
Conference location
Authors
Ting Huang; Peng Dunlu; Cao Lidong
Author affiliation
Conference organizer
Original format: PDF
Text language
Chinese Library Classification: TP301-53
Keywords
adaptive k; clustering; community recognition; microblogging; social network


Similar literature


1. High-performance social networking: microblog community detection based on efficient interactive characteristic clustering [J]. Wang Ru, Rho Seungmin, Cai Wandong. Cluster Computing. 2017, No. 2

2. Diversity based self-adaptive clusters using PSO clustering for crime data [J]. Seema Patil, R. J. Anandhi. International Journal of Information Technology. 2020, No. 2

3. Study on microblog public opinion data mining algorithm based on multi-visual clustering model [J]. Lin-lin Li, Wei-zhen Hou, Jing Liu. International Journal of Autonomous and Adaptive Communications Systems. 2020, No. 2

4. Discovering Communities with Self-Adaptive k Clustering in Microblog Data [C]. Ting Huang, Peng Dunlu, Cao Lidong. The Second International Conference on Cloud and Green Computing. 2012

5. Discovering spatial co-clustering patterns in collision data [D]. Li, Dapeng. 2013

6. Blind method for discovering number of clusters in multidimensional datasets by regression on linkage hierarchies generated from random data [O]. Osbert C. Zalay. 2020

7. Discovering Research Communities by Clustering Bibliographical Data [O]. Muhlenbach, Fabrice; Lallich, Stéphane. 2010

