Journal of Big Data

Data stream clustering by divide and conquer approach based on vector model

Abstract

Recently, many researchers have focused on data stream processing as an efficient method for extracting knowledge from big data. Data stream clustering is an unsupervised approach employed for very large data. Continuing work on data stream clustering methods shares one common goal: achieving an accurate clustering algorithm. However, previous work proposing data stream clustering solutions has overlooked several issues: (1) clustering datasets that include big segments of repetitive data, (2) monitoring the clustering structure of ordinal data streams, and (3) determining important parameters such as k, the exact number of clusters in a data stream. In this paper, the DCSTREAM method is proposed to address these issues and to cluster big datasets using the vector model and a k-Means divide and conquer approach. Experimental results show that DCSTREAM achieves superior quality and performance compared to the STREAM and ConStream methods on abrupt and gradual real-world datasets. The results also show that batch processing in DCSTREAM and ConStream is time consuming compared to STREAM, but it avoids further analysis for detecting outliers and novel micro-clusters.
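The abstract does not spell out how DCSTREAM works, so the following is only a minimal sketch of the general idea of divide and conquer k-Means over a stream, not the authors' method. It assumes the stream is processed in fixed-size chunks, that repeated records are collapsed into (vector, count) pairs, and that chunk-level centers are re-clustered as weighted points. The names `weighted_kmeans`, `divide_and_conquer_kmeans`, `chunk_size`, and the fixed `k` are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def weighted_kmeans(points, weights, k, iters=20, seed=0):
    """Plain Lloyd's k-means on weighted points (illustrative only).

    Returns (centers, totals), where totals[j] is the weight assigned to
    center j during the last iteration.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    dim = len(points[0])
    totals = [0.0] * k
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(k)]
        totals = [0.0] * k
        for p, w in zip(points, weights):
            # Index of the nearest center by squared Euclidean distance.
            j = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
            totals[j] += w
            for d, a in enumerate(p):
                sums[j][d] += w * a
        # Move each center to its weighted mean; keep it if it got no points.
        centers = [
            tuple(s / totals[j] for s in sums[j]) if totals[j] > 0 else centers[j]
            for j in range(k)
        ]
    return centers, totals


def divide_and_conquer_kmeans(stream, k, chunk_size):
    """Cluster each chunk, then cluster the weighted chunk centers."""
    summary_pts, summary_wts = [], []
    for start in range(0, len(stream), chunk_size):
        chunk = stream[start:start + chunk_size]
        # Assumption: repeated records are collapsed into (vector, count).
        counts = Counter(chunk)
        pts = list(counts)
        wts = [float(c) for c in counts.values()]
        if len(pts) <= k:
            centers, totals = pts, wts
        else:
            centers, totals = weighted_kmeans(pts, wts, k)
        for c, t in zip(centers, totals):
            if t > 0:  # drop empty clusters from the summary
                summary_pts.append(c)
                summary_wts.append(t)
    if len(summary_pts) <= k:
        return summary_pts, summary_wts
    return weighted_kmeans(summary_pts, summary_wts, k)


# Usage: records are tuples so that duplicates can be counted.
stream = [(0.0, 0.0), (0.0, 0.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 10.0), (11.0, 10.0)] * 4
centers, weights = divide_and_conquer_kmeans(stream, k=2, chunk_size=6)
```

Carrying weights through both levels keeps the final centers equivalent to weighted means of the original records, which is why the chunk summaries can stand in for the raw data. Choosing k automatically, which the abstract lists as one of the open issues, is not attempted in this sketch.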
