Hawaii International Conference on System Sciences

K-Means Clustering with Bagging and MapReduce


Abstract

Clustering is one of the most widely used techniques for exploratory data analysis: across all disciplines, from the social sciences through biology to computer science, people try to gain a first intuition about their data by identifying meaningful groups among the data objects. K-means is one of the best-known clustering algorithms. Its simplicity and speed allow it to run on large data sets. However, it also has several drawbacks. First, the algorithm is unstable and sensitive to outliers. Second, it becomes inefficient when dealing with large data sets. In this paper, a method is proposed to solve these problems: it uses bagging, an ensemble learning method, to overcome the instability and the sensitivity to outliers, and it uses MapReduce, a distributed computing framework, to address the inefficiency of clustering large data sets. Extensive experiments have been performed to show that our approach is efficient.
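The abstract does not specify how the bagged K-means runs are combined or how the work is split into map and reduce tasks. The sketch below illustrates one common scheme under stated assumptions, not necessarily the authors' method: each Lloyd iteration is written as a map phase (emit the nearest-centroid id for each point) and a reduce phase (average the points per id). Bagging runs K-means on several bootstrap samples and then clusters the pooled centroids into `k` final centers. Everything runs in a single Python process; no Hadoop or other MapReduce API is used, and the names `kmeans`, `bagged_kmeans`, and `n_bags` are hypothetical.

```python
from __future__ import annotations

import random
from collections import defaultdict

Point = tuple  # a point is a tuple of floats


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(p: Point, centroids: list[Point]) -> int:
    """Index of the centroid closest to p (ties go to the lowest index)."""
    return min(range(len(centroids)), key=lambda i: sq_dist(p, centroids[i]))


def mean(points: list[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def kmeans(points: list[Point], k: int, iters: int = 20,
           rng: random.Random | None = None) -> list[Point]:
    """Lloyd's K-means, with each iteration phrased as map + reduce."""
    if rng is None:
        rng = random.Random(0)
    centroids = rng.sample(points, k)  # initial centers drawn from the data
    for _ in range(iters):
        # "Map": key each point by the id of its nearest centroid.
        groups: dict[int, list[Point]] = defaultdict(list)
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # "Reduce": new centroid = mean of the points with that key;
        # a centroid with no points keeps its previous position.
        new = [mean(groups[i]) if groups[i] else centroids[i] for i in range(k)]
        if new == centroids:  # assignments stopped changing
            break
        centroids = new
    return centroids


def bagged_kmeans(points: list[Point], k: int, n_bags: int = 10,
                  seed: int = 0) -> list[Point]:
    """Hypothetical bagging wrapper: cluster bootstrap samples, then
    cluster the pooled centroids into k final centers."""
    rng = random.Random(seed)
    pooled: list[Point] = []
    for _ in range(n_bags):
        # Bootstrap sample: len(points) draws with replacement.
        bag = [rng.choice(points) for _ in points]
        pooled.extend(kmeans(bag, k, rng=rng))
    # Aggregate the ensemble by clustering the pooled centroids.
    return kmeans(pooled, k, rng=rng)


if __name__ == "__main__":
    gen = random.Random(1)
    data = [(gen.gauss(0, 0.5), gen.gauss(0, 0.5)) for _ in range(50)]
    data += [(gen.gauss(10, 0.5), gen.gauss(10, 0.5)) for _ in range(50)]
    print(sorted(bagged_kmeans(data, k=2, n_bags=5)))
```

In a real MapReduce deployment the map step would run over data partitions on separate workers, and the reduce step would receive partial sums and counts per centroid id rather than full point lists. Keeping the lists here only makes the sketch shorter.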

