Conference on Knowledge and Systems Engineering

Fast K-Means Clustering for Very Large Datasets Based on MapReduce Combined with a New Cutting Method

Abstract

Clustering very large datasets is a challenging problem in data mining and data processing. MapReduce is considered a powerful programming framework that significantly reduces execution time by dividing a job into several tasks and executing them in a distributed environment. K-Means is one of the most widely used clustering methods, and K-Means based on MapReduce is considered an advanced solution for clustering very large datasets. However, execution time remains an obstacle, because the number of iterations grows as the dataset size and the number of clusters increase. This paper presents a new approach for reducing the number of iterations of the K-Means algorithm that can be applied to clustering very large datasets. When tested on several very large datasets with real-valued attributes, the new method reduced the number of iterations by up to 30 percent while maintaining up to 98 percent accuracy. Based on these experimental results, this paper proposes a new fast K-Means clustering method for very large datasets based on MapReduce combined with a new cutting method (abbreviated FMR.K-Means).
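The abstract does not describe the cutting method itself, so the sketch below only illustrates the conventional MapReduce formulation of K-Means that the abstract refers to, simulated in a single Python process. Each iteration is one MapReduce job: mappers assign the points of their input split to the nearest centroid, a combiner pre-aggregates per-cluster sums and counts, and reducers compute the new centroids. The function names (`map_phase`, `combine`, `reduce_phase`, `mapreduce_kmeans`) and the stopping rule (`tol`, `max_iter`) are hypothetical choices for illustration, not the paper's code or parameters.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(split, centroids):
    """Mapper: for each point in one input split, emit (nearest centroid index, (point, 1))."""
    for point in split:
        nearest = min(range(len(centroids)), key=lambda k: dist(point, centroids[k]))
        yield nearest, (point, 1)


def combine(pairs):
    """Combiner: pre-aggregate per-cluster coordinate sums and counts within one split."""
    partial = {}
    for k, (point, count) in pairs:
        if k in partial:
            s, c = partial[k]
            partial[k] = ([a + b for a, b in zip(s, point)], c + count)
        else:
            partial[k] = (list(point), count)
    return partial.items()


def reduce_phase(grouped, old_centroids):
    """Reducer: merge the partial sums of each cluster and emit its new centroid."""
    new = list(old_centroids)  # a cluster that received no points keeps its old centroid
    for k, partials in grouped.items():
        total = [0.0] * len(old_centroids[k])
        n = 0
        for s, c in partials:
            total = [a + b for a, b in zip(total, s)]
            n += c
        new[k] = tuple(x / n for x in total)
    return new


def mapreduce_kmeans(splits, centroids, tol=1e-4, max_iter=100):
    """Driver: run one simulated MapReduce job per iteration until centroids move less than tol."""
    for iteration in range(1, max_iter + 1):
        shuffle = defaultdict(list)  # simulated shuffle: group combiner output by cluster key
        for split in splits:
            for k, partial in combine(map_phase(split, centroids)):
                shuffle[k].append(partial)
        new_centroids = reduce_phase(shuffle, centroids)
        shift = max(dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift < tol:
            break
    return centroids, iteration
```

For example, `mapreduce_kmeans([[(0.0, 0.0), (0.0, 1.0), (10.0, 10.0)], [(1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]], [(0.0, 0.0), (10.0, 10.0)])` treats the two lists as two input splits and converges after two iterations. Because every iteration is a full pass over the data (map, shuffle, reduce), the total running time scales with the iteration count, which is why reducing the number of iterations, as the paper proposes, directly reduces execution time.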