Fast and effective Big Data exploration by clustering

Ianni Michele; Masciari Elio; Mazzeo Giuseppe M.; Mezzanzanica Mario; Zaniolo Carlo


Abstract

The rise of the Big Data era calls for more efficient and effective data exploration and analysis tools. In this respect, the need to support advanced analytics on Big Data is driving data scientists' interest toward massively parallel distributed systems and software platforms, such as Map-Reduce and Spark, that make their scalable utilization possible. However, when complex data mining algorithms are required, their fully scalable deployment on such platforms faces a number of technical challenges that grow with the complexity of the algorithms involved. Thus, algorithms that were originally designed for sequential execution must often be redesigned in order to use distributed computational resources effectively. In this paper, we explore these problems and then propose a solution that has proven to be very effective on the complex hierarchical clustering algorithm CLUBS+. By using four stages of successive refinement, CLUBS+ delivers high-quality clusters of data grouped around their centroids, working in a totally unsupervised fashion. Experimental results confirm the accuracy and scalability of CLUBS+ on platforms tailored for Big Data management. (C) 2019 Elsevier B.V. All rights reserved.


Bibliographic details

Source
Future Generation Computer Systems | 2020, Issue 1 | pp. 84-94 | 11 pages
Authors
Ianni Michele; Masciari Elio; Mazzeo Giuseppe M.; Mezzanzanica Mario; Zaniolo Carlo
Author affiliations

Univ Calabria DIMES Arcavacata Di Rende Italy;

Univ Naples Federico II DIETI Naples Italy;

Facebook Menlo Pk CA USA;

Milano Bicocca Univ DISMEQ Milan Italy;

Univ Calif Los Angeles Comp Sci Los Angeles CA USA;

Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language: English
Keywords
Big Data; Clustering; Data exploration;


Similar literature

1. Diluvian Clustering: A Fast, Effective Algorithm for Clustering Compositional and Other Data [J]. Ritchie Nicholas W. M. Microscopy and microanalysis: The official journal of Microscopy Society of America, Microbeam Analysis Society, Microscopical Society of Canada. 2015, Issue 5
2. Optimizing star-coordinate visualization models for effective interactive cluster exploration on big data [J]. Keke Chen. Intelligent data analysis. 2014, Issue 2
3. A fast and effective partitional clustering algorithm for large categorical datasets using a k-means based approach [J]. Ben Salem Semeh, Naouali Sami, Chtourou Zied. Computers and Electrical Engineering. 2018
4. Cluster-Based Exploration for Effective Keyword Search over Semantic Datasets [C]. Roberto De Virgilio, Paolo Cappellari, Michele Miscione. Conceptual modeling - ER 2009. 2009
5. A fast and scalable hardware architecture for K-means clustering for big data analysis [D]. Raghavan, Ramprasad. 2016
6. fast_protein_cluster: parallel and optimized clustering of large-scale protein modeling data [O]. Ling-Hong Hung, Ram Samudrala
7. ClusMAM: fast and effective unsupervised clustering of large complex datasets using metric access methods [O]. Souza Jessica Andressa de, Cazzolato Mirela Teixeira, Traina Agma Juci Machado. 2016
