Published in: Journal of Digital Information Management

A Clustering with Slope Algorithm based on MapReduce

Abstract

The clustering with slope (CLOPE) algorithm is widely used to analyze transactional data because of its excellent performance, lower memory cost, and better quality of results compared with other clustering algorithms. However, on large datasets the CLOPE algorithm may run for several days or more, which is unacceptable. Because this time issue stems from the algorithm's serial running mode, a parallel running mode needs to be introduced to the CLOPE algorithm to improve its efficiency. This paper presents a CLOPE algorithm based on MapReduce. The new algorithm was run in parallel on a Hadoop cluster with multiple nodes. The Hadoop platform split the large dataset into multiple small data blocks, and the CLOPE algorithm was run on each block to obtain small clusters. A modified, cluster-oriented CLOPE algorithm then merged these small clusters into the expected number of clusters. Experiments show that the MapReduce-based CLOPE algorithm runs faster and more efficiently than the serial CLOPE algorithm while delivering the same clustering quality. Running time remained constant with respect to data volume, and time complexity was affected only by the size of the Hadoop cluster. Thus, the proposed algorithm solves the time issue in clustering large datasets and can be used to cluster transactional trade data, website logs, and DNS query logs within a limited time, and even high-dimensional transactional data.
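
The split, cluster, and merge flow described in the abstract can be illustrated with a small single-process Python sketch. It is not the paper's implementation: the abstract gives no formulas or code, so the sketch assumes the standard CLOPE profit criterion (cluster size S, width W, transaction count N, repulsion parameter r), runs only one allocation pass per block, and stands in for the "modified cluster-oriented CLOPE" step with a simple greedy pairwise merge. All names (`Cluster`, `delta_add`, `clope_block`, `merge_clusters`, `parallel_clope`) are hypothetical, and the blocks are processed in a plain loop rather than on Hadoop.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Summary of one cluster: item histogram and transaction count."""
    occ: Counter = field(default_factory=Counter)  # item -> occurrences
    n: int = 0                                     # N: transactions held

    @property
    def size(self):   # S: total item occurrences
        return sum(self.occ.values())

    @property
    def width(self):  # W: number of distinct items
        return len(self.occ)


def score(cluster, r):
    """Per-cluster profit term S * N / W**r (0 for an empty cluster)."""
    if cluster.n == 0:
        return 0.0
    return cluster.size * cluster.n / cluster.width ** r


def delta_add(cluster, txn, r):
    """Change in the profit term if the non-empty item set txn joins cluster."""
    s_new = cluster.size + len(txn)
    w_new = cluster.width + sum(1 for item in txn if item not in cluster.occ)
    return s_new * (cluster.n + 1) / w_new ** r - score(cluster, r)


def clope_block(transactions, r=2.0):
    """One CLOPE allocation pass over a block (no refinement passes)."""
    clusters = []
    for txn in transactions:
        txn = set(txn)
        empty = Cluster()
        best, best_gain = empty, delta_add(empty, txn, r)
        for c in clusters:
            gain = delta_add(c, txn, r)
            if gain > best_gain:
                best, best_gain = c, gain
        if best is empty:          # opening a new cluster was the best move
            clusters.append(empty)
        best.occ.update(txn)
        best.n += 1
    return clusters


def merge_clusters(clusters, k, r=2.0):
    """ASSUMPTION: greedy pairwise merge until at most k clusters remain.
    This is a stand-in for the paper's merge step, which is not specified."""
    clusters = list(clusters)
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                merged = Cluster(a.occ + b.occ, a.n + b.n)
                change = score(merged, r) - score(a, r) - score(b, r)
                if best is None or change > best[0]:
                    best = (change, i, j, merged)
        _, i, j, merged = best
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


def parallel_clope(transactions, n_blocks, k, r=2.0):
    """Split into blocks, cluster each block ("map"), then merge ("reduce")."""
    blocks = [transactions[i::n_blocks] for i in range(n_blocks)]
    small = [c for block in blocks for c in clope_block(block, r)]
    return merge_clusters(small, k, r)


if __name__ == "__main__":
    data = [{"a", "b"}, {"a", "b", "c"}, {"x", "y"},
            {"x", "y", "z"}, {"a", "b"}, {"x", "y"}]
    for c in parallel_clope(data, n_blocks=2, k=2):
        print(sorted(c.occ), c.n)
```

Because each block is clustered independently, the per-block calls to `clope_block` are the part that a MapReduce job could distribute across nodes; only the compact cluster summaries (histogram and count) need to reach the merge step.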