Fast K-Means Clustering for Very Large Datasets Based on MapReduce Combined with a New Cutting Method



Abstract

Clustering very large datasets is a challenging problem in data mining and processing. MapReduce is regarded as a powerful programming framework that significantly reduces execution time by dividing a job into several tasks and executing them in a distributed environment. K-Means is one of the most widely used clustering methods, and K-Means based on MapReduce is regarded as an advanced solution for clustering very large datasets. However, execution time remains an obstacle because the number of iterations grows as the dataset size and the number of clusters increase. This paper presents a new approach for reducing the number of iterations of the K-Means algorithm that can be applied to clustering very large datasets. In tests on several very large datasets with real-valued attributes, the new method reduces the number of iterations by up to 30 percent while maintaining up to 98 percent accuracy. Based on these experimental results, the paper proposes a new fast K-Means clustering method for very large datasets based on MapReduce combined with a new cutting method (abbreviated to FMR.K-Means).
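The abstract does not describe the cutting method itself, so it cannot be illustrated here. As a point of reference for the baseline it builds on, the following is a minimal single-process sketch of how one K-Means iteration is commonly split into a map step (assign each point to its nearest centroid) and a reduce step (average the points per centroid). All function names, the stopping tolerance, and the in-memory grouping are assumptions for illustration; this is not the authors' FMR.K-Means implementation and it does not run on an actual MapReduce cluster.

```python
from collections import defaultdict
from math import dist


def map_assign(point, centroids):
    """Map step (hypothetical name): emit (nearest centroid index, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def reduce_mean(pairs):
    """Reduce step (hypothetical name): average the points emitted under one key."""
    total = None
    count = 0
    for point, n in pairs:
        total = list(point) if total is None else [a + b for a, b in zip(total, point)]
        count += n
    return tuple(x / count for x in total)


def kmeans_iteration(points, centroids):
    """Simulate one map/shuffle/reduce round in memory."""
    groups = defaultdict(list)  # stands in for the shuffle phase
    for p in points:
        key, value = map_assign(p, centroids)
        groups[key].append(value)
    # Assumption: a centroid that receives no points keeps its old position.
    return [
        reduce_mean(groups[i]) if i in groups else centroids[i]
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iter=100, tol=1e-9):
    """Repeat rounds until the largest centroid shift is at most tol."""
    iteration = 0
    for iteration in range(1, max_iter + 1):
        updated = kmeans_iteration(points, centroids)
        shift = max(dist(a, b) for a, b in zip(centroids, updated))
        centroids = updated
        if shift <= tol:
            break
    return centroids, iteration
```

Each pass through the loop in `kmeans` corresponds to one full MapReduce job in a distributed setting, which is why the iteration count returned here is the quantity the paper aims to reduce.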


Bibliographic details

Source: Conference on Knowledge and Systems Engineering, 2015, 12 pages
Authors: Duong Van Hieu; Phayung Meesad
Original format: PDF
Chinese Library Classification: TP18-532
Keywords: MapReduce; K-Means; Fast K-Means for very large datasets

