Conference: International Conference on Intelligent Networking and Collaborative Systems

Apache Mahout's k-Means vs Fuzzy k-Means Performance Evaluation


Abstract

The emergence of Big Data as a disruptive technology for the next generation of intelligent systems has raised many questions about how to extract and use knowledge from data within short time frames, on limited budgets, and under high rates of data generation. The foremost challenge identified here is data processing, especially mining and analysis for knowledge extraction. Because the 'old' data mining frameworks were designed without Big Data requirements in mind, a new generation of such frameworks is being developed and fully implemented on Cloud platforms. One such framework is Apache Mahout, which aims to support fast processing and analysis of Big Data. The performance of these new data mining frameworks is yet to be evaluated, and their potential limitations are yet to be revealed. In this paper we analyse the performance of Apache Mahout using large real data sets from the Twitter stream. We illustrate the analysis with two clustering algorithms, namely k-Means and Fuzzy k-Means, using a Hadoop cluster infrastructure for the experimental study.
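
The two algorithms named in the abstract differ in how a point is assigned to clusters: k-Means gives each point to exactly one nearest centroid, while Fuzzy k-Means gives each point a degree of membership in every cluster. The following is a minimal sketch of that difference in plain Python. It is not Apache Mahout's implementation and does not use its API; the function names, the Euclidean distance, and the fuzziness parameter `m = 2.0` are assumptions chosen for illustration, and the membership formula is the standard fuzzy c-means one.

```python
import math


def distances(point, centroids):
    """Euclidean distance from one point to each centroid (assumed metric)."""
    return [math.dist(point, c) for c in centroids]


def hard_assignment(point, centroids):
    """k-Means style: index of the single nearest centroid."""
    d = distances(point, centroids)
    return d.index(min(d))


def fuzzy_memberships(point, centroids, m=2.0):
    """Fuzzy k-Means style: one membership degree per centroid, summing to 1.

    Uses the standard fuzzy c-means formula
        u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1))
    where m > 1 is the fuzziness parameter (hypothetical default of 2.0).
    """
    if m <= 1.0:
        raise ValueError("fuzziness m must be greater than 1")
    d = distances(point, centroids)
    # If the point sits exactly on one or more centroids, split the
    # membership among those centroids to avoid dividing by zero.
    zeros = [i for i, x in enumerate(d) if x == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(d))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_k) ** exponent for d_k in d) for d_i in d]


if __name__ == "__main__":
    centroids = [(0.0, 0.0), (4.0, 0.0)]  # illustrative values only
    point = (1.0, 0.0)
    print("hard assignment:", hard_assignment(point, centroids))
    print("fuzzy memberships:", fuzzy_memberships(point, centroids))
```

For the illustrative point above, the hard assignment picks the first centroid, while the fuzzy memberships come out at roughly 0.9 and 0.1. Computing a full membership vector for every point is one reason Fuzzy k-Means does more work per iteration than k-Means.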