
An incremental clustering method based on the boundary profile



Abstract

Many important applications generate data continuously, such as financial transaction administration, satellite monitoring, network flow monitoring, and web information processing. As new data arrive, the data mining results evolve with them. For the clustering task, it is better to update the clustering results incrementally, building on what was obtained from the old data, than to recluster all of the data from scratch. Incremental clustering is therefore an essential way to cluster growing Big Data. This paper proposes a boundary-profile-based incremental clustering (BPIC) method to find arbitrarily shaped clusters in dynamically growing datasets. The method represents the existing clustering results with a collection of boundary profiles and discards the inner points of clusters rather than keeping all of the data, which greatly reduces both time and storage costs. To identify the boundary profile, this paper presents a boundary-vector-based boundary point detection (BV-BPD) algorithm that summarizes the structure of the existing clusters. The BPIC method processes each new point in an online fashion and updates the clustering results in batch mode. When a new point arrives, the BPIC method either labels it immediately or temporarily puts it into a bucket, according to the relationship between the new point and the boundary profiles. The bucket is used to distinguish noise from the potential seeds of new clusters and to alleviate the effects of data order. When the bucket is full, the BPIC method clusters the data within it and updates the clustering results. Thus, the BPIC method is insensitive to noise and to the order of new data, which is critical for the robustness of the incremental clustering process. In the experiments, the boundary point detection algorithm BV-BPD is compared with the state-of-the-art method, and the results show that BV-BPD performs better. Additionally, the performance of BPIC and two other incremental clustering methods is investigated in terms of clustering quality and time and space efficiency. The experimental results indicate that the BPIC method obtains a clustering result of acceptable quality on a large dataset with higher time and space efficiency.
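
The control flow described in the abstract (label a new point immediately or defer it to a bucket, then cluster the bucket in batch once it is full) can be illustrated with a rough Python sketch. This is not the authors' implementation. The abstract does not give the actual membership rule or the BV-BPD boundary detection, so the sketch substitutes simple stand-ins that are assumptions: a point joins a cluster if it lies within a hypothetical distance `eps` of one of that cluster's stored boundary points, bucket points are grouped by `eps`-connectivity, groups with at least `min_pts` points become new clusters, and smaller groups are treated as noise. The class name `BPICSketch` and the parameters `eps`, `capacity`, and `min_pts` are all hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class BPICSketch:
    eps: float       # hypothetical distance threshold (assumption)
    capacity: int    # hypothetical bucket size (assumption)
    min_pts: int     # hypothetical minimum size of a new cluster (assumption)
    # cluster label -> stored boundary points; inner points are not kept
    profiles: Dict[int, List[Point]] = field(default_factory=dict)
    bucket: List[Point] = field(default_factory=list)
    noise: List[Point] = field(default_factory=list)

    def add(self, p: Point) -> Optional[int]:
        """Online step: return a label, or None if p was deferred to the bucket."""
        for label, boundary in self.profiles.items():
            # Stand-in membership rule: close to any stored boundary point.
            if any(math.dist(p, b) <= self.eps for b in boundary):
                return label
        self.bucket.append(p)
        if len(self.bucket) >= self.capacity:
            self._flush()
        return None

    def _flush(self) -> None:
        """Batch step: group bucket points and promote large groups to clusters."""
        unvisited = set(range(len(self.bucket)))
        while unvisited:
            start = unvisited.pop()
            group, stack = [start], [start]
            while stack:
                i = stack.pop()
                near = [j for j in unvisited
                        if math.dist(self.bucket[i], self.bucket[j]) <= self.eps]
                for j in near:
                    unvisited.remove(j)
                group.extend(near)
                stack.extend(near)
            pts = [self.bucket[i] for i in group]
            if len(pts) >= self.min_pts:
                # Simplification: keep the whole group as the new profile
                # instead of extracting only its boundary points.
                self.profiles[max(self.profiles, default=-1) + 1] = pts
            else:
                self.noise.extend(pts)
        self.bucket.clear()
```

Deferring unmatched points to the bucket is what lets a small group of nearby arrivals be recognized as a new cluster regardless of the order in which they arrived, while isolated points end up as noise. A faithful version would also need a test for points that fall inside a cluster but far from its boundary, which this sketch omits.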
