A Parallel Clustering Method Study Based on MapReduce

Abstract

Clustering is considered one of the most important tasks in data mining. Its goal is to determine the intrinsic grouping in a set of unlabeled data, and it has been widely applied in many areas. Many clustering methods have been studied, such as k-means, the Fisher clustering method, and the Kohonen neural network. In many areas, data sets keep growing in scale, and classical clustering methods become impractical when faced with big data. Developing clustering methods for large-scale data is therefore an important task. MapReduce is regarded as the most efficient model for data-intensive problems. This paper studies a parallel clustering method based on MapReduce. Its main contributions are as follows. First, it determines the initial cluster centers objectively. Second, it uses information loss as the distance metric between two samples. The efficiency of the method is demonstrated on a practical DNA clustering problem.
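The abstract combines two ingredients: MapReduce parallelism and information loss as the distance between samples. The sketch below shows one way they could fit together in a single k-means-style iteration; it is not the paper's implementation. Assumptions, all made for illustration: each sample is represented by a weight p(x) and a discrete distribution p(y|x) (for DNA, these might be normalized k-mer frequencies); "information loss" is taken to be the agglomerative information bottleneck merge cost, (w1 + w2) times the weighted Jensen-Shannon divergence; the names `map_phase`, `reduce_phase`, and `run_iteration` are hypothetical; and the map, shuffle, and reduce steps are simulated in one process rather than run on Twister or Hadoop. The paper's objective initial-center selection is not reproduced here; the starting centers are simply given.

```python
import math
from collections import defaultdict


def kl_bits(p, q):
    """KL divergence KL(p || q) in bits; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def information_loss(w1, p1, w2, p2):
    """Mutual-information loss (bits) from merging two weighted distributions,
    as in the agglomerative information bottleneck (assumed distance):
    (w1 + w2) * JS_pi(p1, p2) with pi = (w1, w2) / (w1 + w2)."""
    w = w1 + w2
    a, b = w1 / w, w2 / w
    merged = [a * x + b * y for x, y in zip(p1, p2)]
    return w * (a * kl_bits(p1, merged) + b * kl_bits(p2, merged))


# Hypothetical mapper: not the paper's code and not the Twister API.
def map_phase(sample_id, weight, dist, centers):
    """Emit (index of the center with the smallest information loss, payload)."""
    best = min(range(len(centers)),
               key=lambda k: information_loss(weight, dist, *centers[k]))
    return best, (sample_id, weight, dist)


# Hypothetical reducer: merges all samples assigned to one center.
def reduce_phase(key, values):
    """New center: total weight p(c) and weight-averaged distribution p(y|c)."""
    total = sum(w for _, w, _ in values)
    dim = len(values[0][2])
    avg = [sum(w * d[i] for _, w, d in values) / total for i in range(dim)]
    return key, (total, avg)


def run_iteration(samples, centers):
    """One MapReduce round simulated in a single process: map, shuffle, reduce."""
    groups = defaultdict(list)          # shuffle: group mapper output by key
    assignment = {}
    for sample_id, weight, dist in samples:
        key, value = map_phase(sample_id, weight, dist, centers)
        groups[key].append(value)
        assignment[sample_id] = key
    new_centers = list(centers)         # an empty cluster keeps its old center
    for key, values in groups.items():
        _, new_centers[key] = reduce_phase(key, values)
    return new_centers, assignment


# Toy data (made up for illustration): weight p(x) and distribution p(y|x).
samples = [
    ("s1", 0.25, [0.9, 0.1]),
    ("s2", 0.25, [0.8, 0.2]),
    ("s3", 0.25, [0.1, 0.9]),
    ("s4", 0.25, [0.2, 0.8]),
]
centers = [(0.5, [0.9, 0.1]), (0.5, [0.1, 0.9])]
new_centers, assignment = run_iteration(samples, centers)
print(assignment)   # {'s1': 0, 's2': 0, 's3': 1, 's4': 1}
print(new_centers)
```

In a real deployment, each mapper would hold a copy of the current centers and a driver would repeat the round until assignments stop changing. Iterative MapReduce runtimes such as Twister, listed among the keywords, target exactly this kind of repeated round.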

Bibliographic Details

Source: International Workshop on Cloud Computing and Information Security, 2013 (4 pages)
Author: Sun Zhanquan
Original format: PDF
Chinese Library Classification: TP393-53
Keywords: Clustering; Information bottleneck theory; MapReduce; Multidimensional Scaling; Twister

Similar Literature

1. A technique for parallel query optimization using MapReduce framework and a semantic-based clustering method [J]. Elham Azhir, Nima Jafari Navimipour, Mehdi Hosseinzadeh. PeerJ Computer Science, 2021, issue a
2. Parallel Semi-Supervised Multi-Ant Colonies Clustering Ensemble Based on MapReduce Methodology [J]. Yan Yang, Fei Teng, Tianrui Li. IEEE Transactions on Cloud Computing, 2018, issue 3
3. Research on K-medoids clustering algorithm based on data density and its parallel processing based on MapReduce [J]. Aiguo Liu, Shuli Zou, Taorong Qiu. Journal of Residuals Science & Technology, 2016, issue 7
4. A Parallel Clustering Method Study Based on MapReduce [C]. Sun Zhanquan. International Workshop on Cloud Computing and Information Security, 2013
5. APOP: An automatic pattern- and object-based code parallelization framework for clusters [D]. Liu, Xuli. 2007
6. K-mer clustering algorithm using a MapReduce framework: application to the parallelization of the Inchworm module of Trinity [O]. Chang Sik Kim, Martyn D. Winn, Vipin Sachdeva. 2017
7. A Parallel Clustering Method Study Based on MapReduce [O]. Zhanquan Sun. 2013