Published in: IEEE International Conference on Big Data and Smart Computing

Accuracy Evaluation of Overlapping and Multi-Resolution Clustering Algorithms on Large Datasets


Abstract

The performance of clustering algorithms is evaluated using accuracy metrics. There is a great diversity of clustering algorithms, which are key components of many data analysis and exploration systems. However, only a few metrics exist for measuring the accuracy of overlapping and multi-resolution clustering algorithms on large datasets. In this paper, we first discuss existing metrics, how they satisfy a set of formal constraints, and how they can be applied to specific cases. Then, we propose several optimizations and extensions of these metrics. More specifically, we introduce a new indexing technique to reduce both the runtime and the memory complexity of the Mean F1 score evaluation. Our technique can be applied to large datasets, and it is faster on a single CPU than state-of-the-art implementations running on high-performance servers. In addition, we propose several extensions of the discussed metrics to improve their effectiveness and their satisfaction of the formal constraints without affecting their efficiency. All the metrics discussed in this paper are implemented in C++ and are available for free as open-source packages that can be used either as stand-alone tools or as part of a benchmarking system to compare various clustering algorithms.