K-Means Clustering with Bagging and MapReduce

Abstract

Clustering is one of the most widely used techniques for exploratory data analysis: across disciplines ranging from the social sciences through biology to computer science, people try to gain a first intuition about their data by identifying meaningful groups among the data objects. K-means is one of the best-known clustering algorithms, and its simplicity and speed allow it to run on large data sets. However, it also has several drawbacks. First, the algorithm is unstable and sensitive to outliers. Second, it becomes inefficient when the data sets grow very large. This paper proposes a method that addresses both problems: it uses the ensemble learning method bagging to overcome the instability and the sensitivity to outliers, and the distributed computing framework MapReduce to overcome the inefficiency of clustering large data sets. Extensive experiments were performed, and they show that our approach is efficient.
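The abstract describes the approach only at a high level. The following is a minimal single-machine sketch under stated assumptions, not the authors' implementation: each k-means iteration is written as a map step (assign the points of one data split to their nearest centroid and emit per-cluster partial sums) and a reduce step (merge the partial sums and average them), and bagging runs this on bootstrap samples of the data. The abstract does not say how the bagged clusterings are combined; re-clustering the pooled bootstrap centroids, as done here, is an assumption. All function names and parameters (`map_phase`, `reduce_phase`, `kmeans_mapreduce`, `bagged_kmeans`, `n_splits`, `n_bags`) are hypothetical.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two points (tuples of floats)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p, centroids):
    """Index of the centroid closest to p (ties go to the lower index)."""
    return min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))


def map_phase(split, centroids):
    """Hypothetical mapper for one data split: emit (cluster_id, (partial_sum, count)).

    Summing inside the mapper plays the role of a MapReduce combiner and
    keeps the data sent to the reducer small.
    """
    partial = {}
    for p in split:
        j = nearest(p, centroids)
        s, n = partial.get(j, ([0.0] * len(p), 0))
        partial[j] = ([a + b for a, b in zip(s, p)], n + 1)
    return list(partial.items())


def reduce_phase(pairs, old_centroids):
    """Hypothetical reducer: merge partial sums per cluster and divide by the count.

    A cluster that received no points keeps its previous centroid.
    """
    grouped = defaultdict(list)
    for j, value in pairs:
        grouped[j].append(value)
    new_centroids = []
    for j, old in enumerate(old_centroids):
        values = grouped.get(j)
        if not values:
            new_centroids.append(old)
            continue
        count = sum(n for _, n in values)
        total = [sum(s[d] for s, _ in values) for d in range(len(old))]
        new_centroids.append(tuple(t / count for t in total))
    return new_centroids


def kmeans_mapreduce(data, k, n_splits=4, max_iter=20, rng=None):
    """Lloyd's k-means where each iteration is one simulated MapReduce job."""
    rng = random.Random(0) if rng is None else rng
    # Seed with k distinct points so no two initial centroids coincide.
    centroids = rng.sample(sorted(set(data)), k)
    splits = [data[i::n_splits] for i in range(n_splits)]
    for _ in range(max_iter):
        pairs = [kv for split in splits for kv in map_phase(split, centroids)]
        new_centroids = reduce_phase(pairs, centroids)
        if new_centroids == centroids:  # centroids stopped moving
            break
        centroids = new_centroids
    return centroids


def bagged_kmeans(data, k, n_bags=10, seed=0):
    """Bagging: cluster bootstrap samples, then re-cluster the pooled centroids.

    The aggregation step (k-means over all bootstrap centroids) is an
    assumption for illustration, not necessarily the paper's combiner.
    """
    rng = random.Random(seed)
    pooled = []
    for _ in range(n_bags):
        sample = [rng.choice(data) for _ in data]  # n draws with replacement
        pooled.extend(kmeans_mapreduce(sample, k, rng=rng))
    return kmeans_mapreduce(pooled, k, rng=rng)
```

For two well-separated groups of points, `bagged_kmeans(data, 2)` should return one centroid inside each group. In this sketch, the number of splits changes only how the work is partitioned, not which points each cluster receives; the sums can differ only by floating-point rounding.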

Bibliographic Record

Source
Hawaii International Conference on System Sciences | 2011 | 8 pages
Authors
Hai-Guang Li; Gong-Qing Wu; Xue-Gang Hu; Jing Zhang; Lian Li; Xindong Wu
Original format: PDF
Chinese Library Classification: N94-53
Keywords
Bagging; MapReduce; performance
Date added: 2022-08-20 19:55:14

Similar Literature

1. An Analysis of Distributed Document Clustering Using MapReduce Based K-Means Algorithm [J]. Tanvir Habib Sardar, Zahid Ansari. Journal of The Institution of Engineers (India): Series B, 2020, No. 6.
2. Data Categorization Using Hadoop MapReduce-Based Parallel K-Means Clustering [J]. Zahid Ansari, Asif Afzal, Tanvir Habib Sardar. Journal of The Institution of Engineers (India): Series B, 2019, No. 2.
3. A hybrid MapReduce-based k-means clustering using genetic algorithm for distributed datasets [J]. Sinha Ankita, Jana Prasanta K. Journal of Supercomputing, 2018, No. 4.
4. K-Means Clustering with Bagging and MapReduce [C]. Hai-Guang Li, Gong-Qing Wu, Xue-Gang Hu, et al. Proceedings of the 44th Hawaii International Conference on System Sciences, 2011.
5. MapReduce and Heterogeneity: Power-Aware Bag-of-Tasks, Framework Parameter Sensitivity, and Dynamic Cluster Aware Framework Configuration [D]. Hartog, Jessica. 2018.
6. Does Determination of Initial Cluster Centroids Improve the Performance of K-Means Clustering Algorithm? Comparison of Three Hybrid Methods by Genetic Algorithm Minimum Spanning Tree and Hierarchical Clustering in an Applied Study [O]. Saeedeh Pourahmad, Atefeh Basirat, Amir Rahimi. 2020.
7. Implementasi K-Means Clustering pada lingkungan big data menggunakan model pemrograman MapReduce [Implementation of K-Means Clustering in a Big Data Environment Using the MapReduce Programming Model] [O]. Vione Engelbertus. 2017.
8. Large Scale Hierarchical K-Means Based Image Retrieval With MapReduce [R]. Murphy, W. E. 2014.
