STiMR k-Means: An Efficient Clustering Method for Big Data

Ben HajKacem Mohamed Aymen; Ben Ncir Chiheb-Eddine; Essoussi Nadia

Abstract

Big Data clustering has become an important challenge in data analysis, since several applications require scalable clustering methods to organize such data into groups of similar objects. Given the high computational cost of most existing clustering methods, we propose in this paper a new clustering method, referred to as STiMR k-means, that provides a good tradeoff between scalability and clustering quality. The proposed method combines three acceleration techniques: sampling, triangle inequality and MapReduce. Sampling reduces the number of data points used when building cluster prototypes, the triangle inequality reduces the number of comparisons needed when looking for the nearest cluster, and MapReduce provides a parallel framework for running the proposed method. Experiments performed on simulated and real datasets show the effectiveness of the proposed method, compared with existing ones, in terms of running time, scalability and internal validity measures.
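
The abstract names the three techniques but gives no implementation details, so the following is only a minimal sketch of how they might fit together, not the authors' algorithm. It assumes a standard triangle-inequality pruning rule (if the distance between two centers is at least twice the distance from a point to the first center, the second center cannot be closer), uniform random sampling, and a MapReduce job simulated in-process with plain functions. All names (`nearest_center`, `map_phase`, `reduce_phase`, `sampled_kmeans`) and parameters are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_center(point, centers, center_dists):
    """Return (index of nearest center, number of distance computations skipped).

    Pruning rule (triangle inequality): if d(c_best, c_j) >= 2 * d(point, c_best),
    then d(point, c_j) >= d(point, c_best), so c_j is skipped without
    computing its distance to the point.
    """
    best, best_d, skipped = 0, dist(point, centers[0]), 0
    for j in range(1, len(centers)):
        if center_dists[best][j] >= 2 * best_d:
            skipped += 1
            continue
        d = dist(point, centers[j])
        if d < best_d:
            best, best_d = j, d
    return best, skipped


def map_phase(chunk, centers, center_dists):
    """Map: assign each point of one chunk and emit per-cluster (sums, count)."""
    partial = {}
    for p in chunk:
        j, _ = nearest_center(p, centers, center_dists)
        sums, n = partial.get(j, ([0.0] * len(p), 0))
        partial[j] = ([s + x for s, x in zip(sums, p)], n + 1)
    return list(partial.items())


def reduce_phase(pairs, centers):
    """Reduce: merge the partial sums per cluster and return updated centers."""
    totals = {}
    for j, (sums, n) in pairs:
        old_sums, old_n = totals.get(j, ([0.0] * len(sums), 0))
        totals[j] = ([a + b for a, b in zip(old_sums, sums)], old_n + n)
    new_centers = list(centers)  # an empty cluster keeps its previous center
    for j, (sums, n) in totals.items():
        new_centers[j] = tuple(s / n for s in sums)
    return new_centers


def sampled_kmeans(data, k, sample_size, n_chunks=4, iters=10, seed=0):
    """Build k cluster prototypes from a random sample of `data`."""
    rng = random.Random(seed)
    # Sampling: prototypes are built from a subset of the data only.
    sample = rng.sample(data, min(sample_size, len(data)))
    centers = rng.sample(sample, k)
    # Each chunk stands in for the input split of one mapper.
    chunks = [sample[i::n_chunks] for i in range(n_chunks)]
    for _ in range(iters):
        center_dists = [[dist(a, b) for b in centers] for a in centers]
        pairs = [kv for chunk in chunks
                 for kv in map_phase(chunk, centers, center_dists)]
        centers = reduce_phase(pairs, centers)
    return centers
```

In this sketch the map calls run sequentially in one process; in an actual MapReduce deployment each chunk would be handled by a separate worker and only the small per-cluster sums and counts would be sent to the reducer. The fixed iteration count is also a simplification, since a practical version would stop once the centers no longer move.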

Bibliographic record

Source
International Journal of Pattern Recognition and Artificial Intelligence | 2019, Issue 8 | 1950013.1-1950013.23 | 23 pages
Authors
Ben HajKacem Mohamed Aymen; Ben Ncir Chiheb-Eddine; Essoussi Nadia
Author affiliations

Univ Tunis, Inst Super Gest Tunis, LARODEC, 41 Rue Liberté, Le Bardo 2000, Tunisia;

Univ Tunis, Inst Super Gest Tunis, LARODEC, 41 Rue Liberté, Le Bardo 2000, Tunisia | Univ Manouba, ESEN, La Manouba 2010, Tunisia;

Univ Tunis, Inst Super Gest Tunis, LARODEC, 41 Rue Liberté, Le Bardo 2000, Tunisia;

Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language of text: English
Keywords
Partitional clustering; sampling; triangle inequality; MapReduce; Big Data clustering;

Similar literature

1. STiMR k-Means: An Efficient Clustering Method for Big Data [J]. Ben HajKacem Mohamed Aymen, Ben Ncir Chiheb-Eddine, Essoussi Nadia. International Journal of Pattern Recognition and Artificial Intelligence. 2019, Issue 8.
2. IP2P K-means: an efficient method for data clustering on sensor networks [J]. Mirhadi P., Zandinia S., Goodarzipour A. Management Science Letters. 2013, Issue 3.
3. An efficient method to improve the clustering performance for high dimensional data by Principal Component Analysis and modified K-means [J]. Tajunisha, Saravanan. International Journal of Database Management Systems. 2011, Issue 1.
4. An Efficient Initialization Method for K-Means Clustering of Hyperspectral Data [C]. A. Alizade Naeini, A. Jamshidzadeh, M. Saadatseresht. ISPRS International Conference on Geospatial Information Research. 2014.
5. Efficient genetic k-means clustering algorithm and its application to data mining on different domains [D]. Alsayat, Ahmed Mosa. 2016.
6. A comparison of latent class K-means and K-median methods for clustering dichotomous data [O]. Michael J. Brusco, Emilie Shireman, Douglas Steinley.
7. An Efficient Initialization Method for K-Means Clustering of Hyperspectral Data [O]. A. Alizade Naeini, A. Jamshidzadeh, M. Saadatseresht. 2014.
