Source: Journal of computer sciences

A Density Based Dynamic Data Clustering Algorithm based on Incremental Dataset

Abstract

Problem statement: Clustering and visualizing high-dimensional dynamic data is a challenging problem. Most existing clustering algorithms are based on static statistical relationships among the data. Dynamic clustering is a mechanism for adapting to and discovering clusters in real-time environments. Many applications, such as incremental data mining in data warehouses and sensor networks, rely on dynamic data clustering algorithms. Approach: In this work, we present a density-based dynamic data clustering algorithm for incremental datasets and compare its performance with full runs of standard DBSCAN and Chameleon on the dynamic dataset. Most clustering algorithms perform well when judged by clustering accuracy, which is calculated by comparing the original class labels with the computed class labels. Measuring performance with a cluster validation metric, however, can give a different result. Results: This study addresses the problem of clustering a dynamic dataset that grows over time as more data is added. To evaluate the algorithms, we used the Generalized Dunn Index (GDI) and the Davies-Bouldin index (DB) as cluster validation metrics, together with the time taken for clustering. Conclusion: In this study, we implemented and evaluated the proposed density-based dynamic clustering algorithm and compared its performance with the Chameleon and DBSCAN clustering algorithms. The proposed algorithm performed significantly well in terms of both clustering accuracy and speed.
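For readers unfamiliar with the two validation indices named in the abstract, the following is a minimal sketch of how they can be computed. It is an illustration under stated assumptions, not the paper's implementation: it uses Euclidean distance, the classic Dunn index rather than any of the generalized variants, and the function names `centroid`, `davies_bouldin`, and `dunn` are hypothetical. Points that a density-based algorithm labels as noise are assumed to have been removed beforehand.

```python
import math


def centroid(points):
    """Mean of a list of equal-length coordinate tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (lower is better).

    Each cluster is a non-empty list of coordinate tuples; at least two
    clusters with distinct centroids are assumed.
    """
    cents = [centroid(c) for c in clusters]
    # Scatter: average distance from each point to its cluster centroid.
    scatter = [
        sum(math.dist(p, m) for p in c) / len(c)
        for c, m in zip(clusters, cents)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity of cluster i to any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in range(k)
            if j != i
        )
    return total / k


def dunn(clusters):
    """Classic Dunn index (higher is better).

    Smallest distance between points of different clusters divided by the
    largest cluster diameter. Assumes at least one cluster contains two
    distinct points, so the diameter is non-zero.
    """
    min_separation = min(
        math.dist(p, q)
        for i, a in enumerate(clusters)
        for b in clusters[i + 1:]
        for p in a
        for q in b
    )
    max_diameter = max(
        math.dist(p, q) for c in clusters for p in c for q in c
    )
    return min_separation / max_diameter


# Two compact, well-separated unit squares (made-up illustrative data).
sample = [
    [(0, 0), (0, 1), (1, 0), (1, 1)],
    [(5, 5), (5, 6), (6, 5), (6, 6)],
]
db_value = davies_bouldin(sample)
dunn_value = dunn(sample)
```

Both indices are computed from the geometry of the clusters alone, without the original class labels, which is why they can rank algorithms differently from label-based clustering accuracy. The pairwise loops are quadratic in the number of points, so this sketch is only suitable for small datasets.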
