Conference: ACM SIGMOD International Conference on Management of Data

Incremental and effective data summarization for dynamic hierarchical clustering

Abstract

Mining informative patterns from very large, dynamically changing databases poses numerous interesting challenges. Data summarizations (e.g., data bubbles) have been proposed to compress very large static databases into representative points suitable for subsequent effective hierarchical cluster analysis. In many real-world applications, however, databases change dynamically due to frequent insertions and deletions, which may change the data distribution and clustering structure over time. Completely reapplying both the data summarization and the clustering algorithm after such insertions and deletions, in order to detect changes in the clustering structure and update the uncovered data patterns, is prohibitively expensive for large, fast-changing databases. In this paper, we propose a new scheme to maintain data bubbles incrementally. With incremental data bubbles, a high-quality hierarchical clustering is quickly available at any point in time. The scheme uses a quality measure for incremental data bubbles to identify those that no longer compress their underlying data points well after certain insertions and deletions. Only these data bubbles are rebuilt, using efficient split and merge operations. An extensive experimental evaluation shows that incremental data bubbles provide significantly faster data summarization than completely rebuilding the data bubbles after a certain number of insertions and deletions, and that they are effective in preserving (and in some cases even improving) the quality of the data summarization.
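The idea of maintaining a summary under insertions and deletions can be illustrated with a minimal Python sketch. It is not the paper's algorithm. It assumes each bubble keeps the usual sufficient statistics (point count, per-dimension linear sum, sum of squared norms), and the class `Bubble`, the function `flag_for_rebuild`, and the threshold `k` are hypothetical names introduced here. The size-based check is only a stand-in for the quality measure, which the abstract does not define. Splitting is omitted because it needs access to the underlying points.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Bubble:
    """Hypothetical bubble summarized by sufficient statistics (n, LS, SS)."""
    dim: int
    n: int = 0                                      # number of summarized points
    ls: List[float] = field(default_factory=list)   # per-dimension linear sum
    ss: float = 0.0                                 # sum of squared norms

    def __post_init__(self) -> None:
        if not self.ls:
            self.ls = [0.0] * self.dim

    def insert(self, p: Sequence[float]) -> None:
        # Constant-time update per dimension; no rescan of old points.
        self.n += 1
        for i, x in enumerate(p):
            self.ls[i] += x
        self.ss += sum(x * x for x in p)

    def delete(self, p: Sequence[float]) -> None:
        # Assumes p was previously inserted into this bubble.
        if self.n == 0:
            raise ValueError("cannot delete from an empty bubble")
        self.n -= 1
        for i, x in enumerate(p):
            self.ls[i] -= x
        self.ss -= sum(x * x for x in p)

    def merge(self, other: "Bubble") -> "Bubble":
        # Sufficient statistics are additive, so merging needs no raw points.
        return Bubble(self.dim, self.n + other.n,
                      [a + b for a, b in zip(self.ls, other.ls)],
                      self.ss + other.ss)

    def representative(self) -> List[float]:
        # Mean of the summarized points.
        return [s / self.n for s in self.ls]

    def radius(self) -> float:
        # Root-mean-square distance to the mean; clamped against rounding error.
        mean_sq = sum((s / self.n) ** 2 for s in self.ls)
        return math.sqrt(max(0.0, self.ss / self.n - mean_sq))


def flag_for_rebuild(bubbles: Sequence[Bubble], k: float = 1.5) -> List[int]:
    """Assumed stand-in quality check: flag bubbles whose point count
    deviates from the mean count by more than k standard deviations."""
    sizes = [b.n for b in bubbles]
    mean = sum(sizes) / len(sizes)
    std = math.sqrt(sum((s - mean) ** 2 for s in sizes) / len(sizes))
    return [i for i, s in enumerate(sizes) if abs(s - mean) > k * std]
```

In this sketch, only the bubbles returned by `flag_for_rebuild` would be candidates for merging or splitting, while all other bubbles keep their incrementally updated statistics.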
