
Clustering algorithm based on optimal intervals division for high-dimension data streams


Abstract

Clustering high-dimensional data streams is a major focus of clustering research. To optimize the clustering process, and in particular to cope with the large number of candidate subspaces it generates, this paper introduces an optimal interval division technique and an FP-tree structure, and on that basis proposes the DOIC (Dynamic Optimal Intervals-based Cluster) algorithm. Memory-based data partitioning and optimal interval division are defined to generate high-density grids for each dimension, which are stored in a High-Density Unit tree (HDU-tree). The HDU-tree is built on the principle that high-density grids for the same interval in every dimension are stored in the same branch. Clustering a high-dimensional data stream is thereby transformed into searching for dense grids in the HDU-tree. The stream is updated by merging HDU-trees: new data are inserted and historical data are decayed. Clustering results are returned promptly on request in the form of DNF expressions. The experimental results indicate that DOIC has better space scalability and higher clustering quality than traditional clustering algorithms.
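The abstract gives no pseudocode, so the following is only a rough sketch of the general idea (grid cells, decay of historical data, DNF output), not the DOIC algorithm itself. It assumes equal-width intervals in place of the paper's optimal interval division and a flat dictionary in place of the HDU-tree; the class name `DecayedGrid` and all of its parameters are hypothetical.

```python
from collections import defaultdict


class DecayedGrid:
    """Toy grid summary of a stream (illustrative; not the paper's HDU-tree)."""

    def __init__(self, low, high, bins, decay, threshold):
        # Assumption: every dimension shares the range [low, high) and is
        # split into `bins` equal-width intervals.
        self.low, self.high, self.bins = low, high, bins
        self.decay, self.threshold = decay, threshold
        self.cells = defaultdict(float)  # tuple of interval indices -> decayed weight

    def _index(self, value):
        # Map a value to its interval index, clamping to the valid range.
        width = (self.high - self.low) / self.bins
        return min(max(int((value - self.low) / width), 0), self.bins - 1)

    def insert_batch(self, points):
        # Fade the weight of historical data first, then count new arrivals.
        for cell in list(self.cells):
            self.cells[cell] *= self.decay
        for point in points:
            self.cells[tuple(self._index(v) for v in point)] += 1.0

    def dense_cells(self):
        # A cell is dense when its decayed weight reaches the threshold.
        return sorted(c for c, w in self.cells.items() if w >= self.threshold)

    def as_dnf(self):
        # One conjunction of interval constraints per dense cell, joined by OR.
        width = (self.high - self.low) / self.bins
        terms = []
        for cell in self.dense_cells():
            parts = [
                f"{self.low + i * width:g} <= x{d} < {self.low + (i + 1) * width:g}"
                for d, i in enumerate(cell)
            ]
            terms.append("(" + " AND ".join(parts) + ")")
        return " OR ".join(terms)
```

Calling `insert_batch` once per arriving chunk and `as_dnf` on request mirrors the update-then-query cycle the abstract describes. A cell that stops receiving points loses weight with every batch and eventually drops out of the result.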
