Conference: International Workshop on Cloud Computing and Information Security

A Parallel Clustering Method Study Based on MapReduce


Abstract

Clustering is considered one of the most important tasks in data mining. Its goal is to determine the intrinsic grouping in a set of unlabeled data, and it has been widely applied in many areas. Many clustering methods have been studied, such as k-means, the Fisher clustering method, and the Kohonen neural network. In many areas, data sets keep growing larger, and classical clustering methods become impractical when faced with big data, so the study of clustering methods for large-scale data is considered an important task. MapReduce is regarded as the most efficient model for data-intensive problems. In this paper, a parallel clustering method based on MapReduce is studied. The main contributions are twofold. First, the initial cluster centers are determined objectively. Second, information loss is used as the distance metric between two samples. The efficiency of the method is illustrated with a practical DNA clustering problem.
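The abstract does not spell out the algorithm, so the following is only a minimal sketch, in plain Python with no Hadoop dependency, of how one k-means iteration is commonly split into MapReduce phases. The mapper assigns each sample to its nearest center, the framework groups the output by center, and the reducer averages each group. The squared Euclidean distance, the farthest-first initialization (a stand-in for the paper's "objective" initial centers, whose rule is not given), and all function names are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Point = Tuple[float, ...]
# Mapper/reducer value: (partial coordinate sum, number of points summed).
Partial = Tuple[Point, int]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance (an assumed stand-in for the paper's metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def initial_centers(points: List[Point], k: int) -> List[Point]:
    """Deterministic farthest-first initialization (an assumption, not the
    paper's rule): start at the point nearest the data mean, then repeatedly
    add the point farthest from every center chosen so far."""
    dim = len(points[0])
    mean = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    centers = [min(points, key=lambda p: squared_distance(p, mean))]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(squared_distance(p, c) for c in centers)))
    return centers


def map_phase(points: Iterable[Point], centers: List[Point]) -> Iterator[Tuple[int, Partial]]:
    """Mapper: emit (index of nearest center, (point, 1)) for every point."""
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: squared_distance(p, centers[i]))
        yield nearest, (p, 1)


def shuffle(pairs: Iterable[Tuple[int, Partial]]) -> Dict[int, List[Partial]]:
    """Group mapper output by key, as the MapReduce framework would."""
    groups: Dict[int, List[Partial]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(values: List[Partial]) -> Point:
    """Reducer: add the partial sums and counts, then divide to get the new center."""
    dim = len(values[0][0])
    sums = [0.0] * dim
    total = 0
    for partial_sum, count in values:
        for d in range(dim):
            sums[d] += partial_sum[d]
        total += count
    return tuple(s / total for s in sums)


def kmeans_mapreduce(points: List[Point], k: int, max_iter: int = 20) -> List[Point]:
    """Driver: run map -> shuffle -> reduce until the centers stop moving."""
    centers = initial_centers(points, k)
    for _ in range(max_iter):
        groups = shuffle(map_phase(points, centers))
        new_centers = list(centers)  # an empty cluster keeps its old center
        for key, values in groups.items():
            new_centers[key] = reduce_phase(values)
        if new_centers == centers:
            break
        centers = new_centers
    return centers
```

Because each mapper value is a (partial sum, count) pair, a combiner could pre-aggregate these pairs on each node before the shuffle, since adding sums and counts is associative; only the final division has to wait for the reducer.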
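The abstract names information loss as the distance between two samples but does not define it. One well-known formulation, assumed here and not necessarily the paper's, is the merge cost from the agglomerative Information Bottleneck method: if samples a and b have prior weights w_a and w_b (for example 1/N each for N samples) and distributions p(y|a), p(y|b) over features y, merging them loses (w_a + w_b) times the Jensen-Shannon divergence of the two distributions, weighted by w_a/(w_a + w_b) and w_b/(w_a + w_b). Representing a DNA sequence by its k-mer frequencies, and all function names below, are hypothetical choices for illustration.

```python
import math
from collections import Counter
from typing import Dict

Distribution = Dict[str, float]


def kmer_distribution(seq: str, k: int = 2) -> Distribution:
    """Hypothetical DNA feature: relative frequencies of overlapping k-mers."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()}


def kl_divergence(p: Distribution, q: Distribution) -> float:
    """KL(p || q) in bits; q must be positive wherever p is (true for the mixture below)."""
    return sum(pv * math.log2(pv / q[key]) for key, pv in p.items() if pv > 0)


def information_loss(p_a: Distribution, p_b: Distribution, w_a: float, w_b: float) -> float:
    """Information lost by merging samples a and b (assumed agglomerative IB cost):
    (w_a + w_b) * weighted Jensen-Shannon divergence of their distributions."""
    total = w_a + w_b
    pi_a, pi_b = w_a / total, w_b / total
    keys = set(p_a) | set(p_b)
    # Weighted mixture distribution that the merged sample would have.
    mixture = {key: pi_a * p_a.get(key, 0.0) + pi_b * p_b.get(key, 0.0) for key in keys}
    js = pi_a * kl_divergence(p_a, mixture) + pi_b * kl_divergence(p_b, mixture)
    return total * js
```

Unlike a Euclidean distance on raw sequences, this cost compares feature distributions, so sequences of different lengths become comparable once each is normalized to k-mer frequencies. Identical distributions lose nothing, and two equally weighted samples with disjoint features lose exactly 1 bit times their combined weight.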
