Density-based Algorithms for Big Data Clustering Using MapReduce Framework: A Comprehensive Study

Abstract

Clustering is used to extract hidden patterns and similar groups from data. Therefore, clustering as a method of unsupervised learning is a crucial technique for big data analysis owing to the massive number of unlabeled objects involved. Density-based algorithms have attracted research interest, because they help to better understand complex patterns in spatial datasets that contain information about data related to co-located objects. Big data clustering is a challenging task, because the volume of data increases exponentially. However, clustering using MapReduce can help answer this challenge. In this context, density-based algorithms in MapReduce have been largely investigated in the past decade to eliminate the problem of big data clustering. Despite the diversity of the algorithms proposed, the field lacks a structured review of the available algorithms and techniques for desirable partitioning, local clustering, and merging. This study formalizes the problem of density-based clustering using MapReduce, proposes a taxonomy to categorize the proposed algorithms, and provides a systematic and comprehensive comparison of these algorithms according to the partitioning technique, type of local clustering, merging technique, and exactness of their implementations. Finally, the study highlights outstanding challenges and opportunities to contribute to the field of density-based clustering using MapReduce.

Bibliographic record

  • Source
    ACM Computing Surveys | 2021, Issue 5 | 93.1-93.38 | 38 pages
  • Authors

    Khader Mariam; Al-Naymat Ghazi;

  • Author affiliations

    Princess Sumaya Univ Technol Khalil Saket St POB 1438 Amman 11941 Jordan;

    Princess Sumaya Univ Technol Khalil Saket St POB 1438 Amman 11941 Jordan|Ajman Univ POB 346 Ajman U Arab Emirates;

  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA;
  • Original format: PDF
  • Text language: eng
  • Chinese Library Classification
  • Keywords

    Big data; clustering; density clustering; MapReduce framework;
