A Survey of Clustering Algorithms for Big Data: Taxonomy and Empirical Analysis

Fahad A.; Alshatri N.; Tari Z.; Alamri A.; Khalil I.; Zomaya A.Y.; Foufou S.; Bouras A.

Abstract

Clustering algorithms have emerged as a powerful alternative meta-learning tool for accurately analyzing the massive volume of data generated by modern applications. Their main goal is to categorize data into clusters such that objects are grouped in the same cluster when they are similar according to specific metrics. There is a vast body of knowledge in the area of clustering, and there have been attempts to analyze and categorize clustering algorithms for a larger number of applications. However, one of the major issues in using clustering algorithms for big data, and a source of confusion among practitioners, is the lack of consensus on the definition of their properties as well as the lack of a formal categorization. To alleviate these problems, this paper introduces concepts and algorithms related to clustering, gives a concise survey of existing clustering algorithms, and provides a comparison from both a theoretical and an empirical perspective. From a theoretical perspective, we developed a categorizing framework based on the main properties pointed out in previous studies. Empirically, we conducted extensive experiments in which we compared the most representative algorithm from each category using a large number of real (big) data sets. The effectiveness of the candidate clustering algorithms is measured through a number of internal and external validity metrics as well as stability, runtime, and scalability tests. In addition, we highlighted the set of clustering algorithms that perform best for big data.
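The abstract speaks of grouping objects by similarity and of scoring the result with internal and external validity metrics, but it does not name a specific algorithm or metric. As a rough illustration only, the sketch below assumes k-means as the clustering algorithm, the silhouette coefficient as an internal metric, and purity as an external metric. These choices, the toy data, and the function names `kmeans`, `silhouette`, and `purity` are assumptions made for this example, not taken from the paper.

```python
import math
from collections import Counter


def kmeans(points, k, max_iter=100):
    """Plain Lloyd's k-means. Assumption: the first k points seed the centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def silhouette(points, labels):
    """Internal metric: mean silhouette coefficient; needs no ground truth."""
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], None)
        if not own or not dists:
            scores.append(0.0)  # singleton cluster or only one cluster: score 0
            continue
        a = sum(own) / len(own)                             # mean distance inside own cluster
        b = min(sum(d) / len(d) for d in dists.values())    # mean distance to nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def purity(labels, truth):
    """External metric: share of points that match their cluster's majority class."""
    matched = 0
    for c in set(labels):
        classes = [t for lab, t in zip(labels, truth) if lab == c]
        matched += Counter(classes).most_common(1)[0][1]
    return matched / len(labels)


# Toy data (made up for illustration): two well-separated groups in 2-D.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (2.0, 1.0), (8.5, 9.5)]
truth = [0, 1, 0, 1, 0, 1]

labels, centroids = kmeans(points, k=2)
print("labels:", labels)
print("silhouette (internal):", round(silhouette(points, labels), 3))
print("purity (external):", purity(labels, truth))
```

The internal metric uses only the data and the cluster assignments, while the external metric also needs reference labels, which is why the two kinds of metric are reported separately.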


Bibliographic record

Source
Emerging Topics in Computing, IEEE Transactions on | 2014, Issue 3 | pp. 267-279 | 13 pages
Authors
Fahad A.; Alshatri N.; Tari Z.; Alamri A.; Khalil I.; Zomaya A.Y.; Foufou S.; Bouras A.
Author affiliation

School of Computer Science and Information Technology, Royal Melbourne Institute of Technology, Melbourne, VIC, Australia;

Indexing information
Original format: PDF
Language: English
Chinese Library Classification
Keywords
Algorithm design and analysis; Big data; Clustering algorithms; Clustering methods; Neural networks; Partitioning algorithms; Taxonomies; Unsupervised learning


