Published in: International Conference on Intelligent Networking and Collaborative Systems

Apache Mahout's k-Means vs Fuzzy k-Means Performance Evaluation


Abstract

The emergence of Big Data as a disruptive technology for the next generation of intelligent systems has raised many questions about how to extract and use the knowledge obtained from data within short time frames, on a limited budget, and under high rates of data generation. The foremost challenge identified here is data processing, especially mining and analysis for knowledge extraction. Because the 'old' data mining frameworks were designed without Big Data requirements in mind, a new generation of such frameworks is being developed and fully implemented on Cloud platforms. One such framework is Apache Mahout, which aims to leverage fast processing and analysis of Big Data. The performance of these new data mining frameworks has yet to be evaluated, and their potential limitations have yet to be revealed. In this paper we analyse the performance of Apache Mahout using large real data sets from the Twitter stream. We exemplify the analysis with two clustering algorithms, namely k-Means and Fuzzy k-Means, using a Hadoop cluster infrastructure for the experimental study.
