Data stream clustering by divide and conquer approach based on vector model

Madjid Khalilian; Norwati Mustapha; Nasir Sulaiman

Abstract

Recently, many researchers have focused on data stream processing as an efficient method for extracting knowledge from big data. Data stream clustering is an unsupervised approach applied to very large volumes of data. Ongoing work on data stream clustering methods shares one common goal: achieving an accurate clustering algorithm. However, previous work on data stream clustering has overlooked several issues: (1) clustering datasets that contain large segments of repetitive data, (2) monitoring the clustering structure of ordinal data streams, and (3) determining important parameters such as k, the exact number of clusters in a data stream. In this paper, the DCSTREAM method is proposed to address these issues, clustering big datasets using the vector model and a k-Means divide-and-conquer approach. Experimental results show that DCSTREAM can achieve superior quality and performance compared to the STREAM and ConStream methods on real-world datasets with abrupt and gradual changes. The results also show that batch processing in DCSTREAM and ConStream is more time-consuming than in STREAM, but it avoids further analysis for detecting outliers and novel micro-clusters.
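
The abstract names two ingredients, a vector model and a k-Means divide-and-conquer approach, without detailing either. The sketch below illustrates the generic two-level divide-and-conquer pattern: each chunk of the stream is clustered with weighted k-Means, only the weighted centers are kept, and those centers are clustered again into k final clusters. As an assumption for illustration only, the hypothetical collapse_repeats function stands in for the vector model by turning identical records into one weighted vector. This is one plausible way to handle large segments of repetitive data. All function names, the chunk size and the fixed k are illustrative. This is not the authors' DCSTREAM implementation, and it does not attempt to determine k or to monitor ordinal streams.

```python
import random
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(points: List[Point], weights: List[float], k: int,
                    iters: int = 20, seed: int = 0) -> Tuple[List[Point], List[float]]:
    """Lloyd's k-Means on weighted points.

    Returns the centers and the total weight assigned to each in the last pass.
    """
    rng = random.Random(seed)
    distinct = list(dict.fromkeys(points))  # seed centers from distinct points only
    k = min(k, len(distinct))
    centers = rng.sample(distinct, k)
    totals = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * len(points[0]) for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            j = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            totals[j] += w
            for d, x in enumerate(p):
                sums[j][d] += w * x
        # Move each center to the weighted mean of its members; keep it if empty.
        centers = [tuple(s / totals[j] for s in sums[j]) if totals[j] > 0 else centers[j]
                   for j in range(k)]
    return centers, totals


def collapse_repeats(chunk: List[Point]) -> Tuple[List[Point], List[float]]:
    """HYPOTHETICAL stand-in for the 'vector model': identical records become
    one weighted vector, so a long run of repeats costs a single point."""
    counts = Counter(chunk)
    pts = list(counts)
    return pts, [float(counts[p]) for p in pts]


def divide_and_conquer_kmeans(stream: Iterable[Point], chunk_size: int, k: int,
                              seed: int = 0) -> Tuple[List[Point], List[float]]:
    """Level 1: cluster each chunk and keep only its weighted centers.
    Level 2: cluster all retained centers into the final k clusters."""
    kept_pts: List[Point] = []
    kept_w: List[float] = []
    chunk: List[Point] = []

    def flush() -> None:
        pts, w = collapse_repeats(chunk)
        centers, totals = weighted_kmeans(pts, w, k, seed=seed)
        for c, t in zip(centers, totals):
            if t > 0:  # drop centers that attracted no weight
                kept_pts.append(c)
                kept_w.append(t)
        chunk.clear()  # the raw records of this chunk are discarded

    for x in stream:
        chunk.append(x)
        if len(chunk) == chunk_size:
            flush()
    if chunk:
        flush()
    return weighted_kmeans(kept_pts, kept_w, k, seed=seed)


if __name__ == "__main__":
    # Two groups of heavily repeated records arriving in order.
    stream = ([(0.0, 0.0)] * 50 + [(1.0, 0.0)] * 50
              + [(10.0, 10.0)] * 50 + [(11.0, 10.0)] * 50)
    centers, weights = divide_and_conquer_kmeans(stream, chunk_size=40, k=2)
    print(sorted(centers), sorted(weights))
    # prints [(0.5, 0.0), (10.5, 10.0)] [100.0, 100.0]
```

In this toy run, each 40-record chunk collapses to at most two weighted vectors. The second level then recovers one center per group. Memory holds at most k weighted centers per chunk instead of the raw records. That property is what makes the divide-and-conquer pattern attractive for streams.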

Bibliographic Record

Source: Journal of Big Data, 2016, no. 1
Authors: Madjid Khalilian; Norwati Mustapha; Nasir Sulaiman
Original format: PDF
Chinese Library Classification: computing technology; computer technology

Similar Literature

1. Clustering right-skewed data stream via Birnbaum-Saunders mixture models: A flexible approach based on fuzzy clustering algorithm [J]. Hashemi Farzane, Naderi Mehrdad, Mashinchi Mashallah. Applied Soft Computing, 2019.

2. Graph-Based Divide and Conquer Method for Parallelizing Spatial Operations on Vector Data [J]. Xiangguo Lin, Xiaochen Kang. Remote Sensing, 2014, no. 10.

3. DC-NMF: nonnegative matrix factorization based on divide-and-conquer for fast clustering and topic modeling [J]. Du Rundong, Kuang Da, Drake Barry. Journal of Global Optimization, 2017, no. 4.

4. Conquering the Divide: Continuous Clustering of Distributed Data Streams [C]. Cormode, G., Muthukrishnan. 2007.

5. The Collection and Storage Function Transition Point from Cluster-Based to Big Data Streaming Data [D]. Rubey, Sidney I. 2018.

6. A divide-and-conquer strategy in tumor sampling enhances detection of intratumor heterogeneity in routine pathology: A modeling approach in clear cell renal cell carcinoma [O]. José I. Lopez, Jesús M. Cortes, Mattia Barbareschi.

7. Data stream clustering by divide and conquer approach based on vector model [O]. 2016.

