Conference: International Workshop on Big Data and Information Security

Design of intelligent k-means based on Spark for big data clustering


Abstract

The growth of data has brought us into the big data era, in which the volume of data can no longer be processed in a conventional computing environment. Many computational environments have been developed to process big data; one of them is Hadoop, which provides a distributed file system and the MapReduce framework. Spark is a newer framework that can be combined with Hadoop and run on top of it. In this paper, we design an intelligent k-means algorithm based on Spark for big data clustering. Our design operates on batches of data instead of the original Resilient Distributed Dataset (RDD). We compare our design with an implementation that uses the original RDD. Experimental results show that the batch-based implementation is faster than the implementation using the original RDD.
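The abstract does not spell out the algorithm, so the following is only a minimal sketch of intelligent k-means as it is commonly described: the number of clusters and the initial centroids are found by repeatedly extracting an "anomalous pattern" (the group of points around the point farthest from the grand mean), and small groups are discarded. It is written in plain Python rather than Spark, it does not reproduce the batching scheme of the paper, and the names `mean`, `anomalous_pattern`, `ik_means_init`, `min_size`, and `max_iter` are hypothetical, chosen here for illustration.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def mean(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def anomalous_pattern(points, center, max_iter=100):
    """Extract one cluster from `points`.

    The seed is the point farthest from `center`. A point joins the
    cluster if it is strictly closer to the cluster centroid than to
    `center`; the centroid is then recomputed until membership is stable.
    """
    centroid = max(points, key=lambda p: dist(p, center))
    if dist(centroid, center) == 0:
        # Every remaining point coincides with the reference point.
        return centroid, list(points)
    members = []
    for _ in range(max_iter):
        new_members = [p for p in points
                       if dist(p, centroid) < dist(p, center)]
        if new_members == members:
            break
        members = new_members
        centroid = mean(members)
    return centroid, members


def ik_means_init(points, min_size=2):
    """Return initial centroids for k-means (assumed formulation).

    `points` is a list of tuples. Clusters with fewer than `min_size`
    members are treated as noise and contribute no centroid.
    """
    center = mean(points)  # grand mean, kept fixed for every extraction
    remaining = list(points)
    centroids = []
    while remaining:
        centroid, members = anomalous_pattern(remaining, center)
        if len(members) >= min_size:
            centroids.append(centroid)
        extracted = set(members)
        remaining = [p for p in remaining if p not in extracted]
    return centroids
```

The centroids returned by `ik_means_init` would then seed an ordinary k-means run, with the number of clusters equal to the number of centroids. In a Spark setting, the distance comparisons and mean computations are the parts one would expect to distribute, but how the paper organizes them into batches is not stated in the abstract.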
