Journal: SIGMOD Record

Statistical Grid-based Clustering over Data Streams

Abstract

A data stream is a massive, unbounded sequence of data elements generated continuously at a rapid rate. For this reason, most data stream algorithms trade the accuracy of their results for faster processing. Processing time depends heavily on the amount of information that must be maintained. This paper proposes a statistical grid-based approach to clustering the data elements of a data stream. Initially, the multidimensional data space of the stream is partitioned into a set of mutually exclusive, equal-size initial cells. When the support of a cell becomes high enough, the cell is dynamically divided into two mutually exclusive intermediate cells based on its distribution statistics. Three different ways of partitioning a dense cell are introduced. Eventually, a dense region of each initial cell is recursively partitioned until it reaches the smallest cell size, called a unit cell. A cluster of a data stream is a group of adjacent dense unit cells. To minimize the number of cells, a sparse intermediate or unit cell is pruned if its support falls well below a minimum support. Furthermore, to confine memory usage, the size of a unit cell is dynamically minimized so that the clustering result is as accurate as possible. The proposed algorithm is analyzed through a series of experiments to identify its various characteristics.
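
The support-triggered splitting described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the abstract does not specify the three partitioning methods, so the sketch assumes a single rule (cut at the mean of the dimension with the largest variance). The names `Cell`, `initial_cells`, `split`, and `process_stream`, and the `warmup` parameter, are hypothetical. Child cells restart with empty statistics, and pruning and cluster formation are omitted for brevity.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class Cell:
    lo: tuple  # lower bound per dimension (inclusive)
    hi: tuple  # upper bound per dimension (exclusive)
    count: int = 0
    sums: list = field(default_factory=list)     # per-dimension sum of values
    sq_sums: list = field(default_factory=list)  # per-dimension sum of squares

    def __post_init__(self):
        d = len(self.lo)
        self.sums = self.sums or [0.0] * d
        self.sq_sums = self.sq_sums or [0.0] * d

    def contains(self, x):
        return all(l <= v < h for l, v, h in zip(self.lo, x, self.hi))

    def add(self, x):
        # Only running statistics are kept; the element itself is discarded.
        self.count += 1
        for i, v in enumerate(x):
            self.sums[i] += v
            self.sq_sums[i] += v * v

    def mean(self, i):
        return self.sums[i] / self.count

    def variance(self, i):
        m = self.mean(i)
        return max(self.sq_sums[i] / self.count - m * m, 0.0)


def initial_cells(lo, hi, parts):
    """Partition the box [lo, hi) into parts**d equal-size initial cells."""
    d = len(lo)
    widths = [(hi[i] - lo[i]) / parts for i in range(d)]
    cells = []
    for idx in itertools.product(range(parts), repeat=d):
        c_lo = tuple(lo[i] + idx[i] * widths[i] for i in range(d))
        c_hi = tuple(lo[i] + (idx[i] + 1) * widths[i] for i in range(d))
        cells.append(Cell(c_lo, c_hi))
    return cells


def split(cell, unit_size):
    """Assumed rule: cut at the mean of the highest-variance dimension
    whose extent is still larger than unit_size. Returns None if no cut."""
    dims = [i for i in range(len(cell.lo)) if cell.hi[i] - cell.lo[i] > unit_size]
    if not dims or cell.count == 0:
        return None
    i = max(dims, key=cell.variance)
    cut = cell.mean(i)
    if not (cell.lo[i] < cut < cell.hi[i]):
        return None
    left_hi = cell.hi[:i] + (cut,) + cell.hi[i + 1:]
    right_lo = cell.lo[:i] + (cut,) + cell.lo[i + 1:]
    # Simplification: children start with empty statistics.
    return [Cell(cell.lo, left_hi), Cell(right_lo, cell.hi)]


def process_stream(stream, cells, split_support, unit_size, warmup=1):
    """Single pass: update cell statistics, split a cell once it is dense."""
    total = 0
    for x in stream:
        total += 1
        for k, cell in enumerate(cells):  # linear scan, for clarity only
            if cell.contains(x):
                cell.add(x)
                if total >= warmup and cell.count / total >= split_support:
                    children = split(cell, unit_size)
                    if children:
                        cells[k:k + 1] = children
                break
    return cells
```

Calling `process_stream` on the output of `initial_cells` returns the refined list of cells. A fuller version would also prune sparse cells and group adjacent dense unit cells into clusters, as the abstract describes.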
