Big Data Clustering with Kernel k-Means: Resources, Time and Performance

Tsapanos Nikolaos; Tefas Anastasios; Nikolaidis Nikolaos; Pitas Ioannis

Abstract

Data clustering is an unsupervised learning task that has found many applications in various scientific fields. The goal is to find subgroups of closely related data samples (clusters) in a set of unlabeled data. A classic clustering algorithm is the so-called k-Means. It is very popular; however, it cannot handle cases in which the clusters are not linearly separable. Kernel k-Means is a state-of-the-art clustering algorithm that employs the kernel trick to perform clustering in a higher-dimensional space, thus overcoming the limitations of classic k-Means regarding the non-linear separability of the input data. In response to the challenges of Big Data research, a field that has established itself in the last few years and involves performing tasks on extremely large amounts of data, several adaptations of Kernel k-Means have been proposed, each with different requirements in processing power and running time and each incurring different trade-offs in performance. In this paper, we present several issues and techniques involving the use of Kernel k-Means for Big Data clustering and examine how each combination of components in a clustering framework fares in terms of resources, time and performance. We use experimental results to evaluate several combinations and provide a recommendation on how to approach a Big Data clustering problem.
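To make the kernel-trick step described above concrete, here is a minimal plain-Python sketch of standard (exact) Kernel k-Means. It is not the authors' implementation and not one of the Big Data adaptations the paper evaluates. It relies on the identity ||phi(x_i) - m_c||^2 = K_ii - (2/|c|) * sum_{j in c} K_ij + (1/|c|^2) * sum_{j,l in c} K_jl, so distances to cluster centres in the higher-dimensional feature space are computed from kernel values alone, without ever forming phi. The Gaussian (RBF) kernel, the `gamma` value, the farthest-first seeding and the function names `rbf_kernel` and `kernel_kmeans` are illustrative assumptions. Note that the sketch builds the full n x n kernel matrix, which needs O(n^2) memory; that cost is what makes exact Kernel k-Means hard to apply to Big Data.

```python
import math
from typing import List, Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: float) -> float:
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2); an assumed kernel choice."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_kmeans(points: Sequence[Sequence[float]], k: int,
                  gamma: float = 1.0, max_iter: int = 100) -> List[int]:
    """Exact kernel k-means (illustrative sketch, not the paper's code)."""
    n = len(points)
    # Full n x n Gram matrix: O(n^2) memory, the main obstacle for Big Data.
    K = [[rbf_kernel(points[i], points[j], gamma) for j in range(n)]
         for i in range(n)]

    def feat_dist(i: int, j: int) -> float:
        # ||phi(x_i) - phi(x_j)||^2 = K_ii + K_jj - 2 K_ij (kernel trick).
        return K[i][i] + K[j][j] - 2.0 * K[i][j]

    # Hypothetical deterministic seeding: farthest-first traversal in feature space.
    seeds = [0]
    while len(seeds) < k:
        seeds.append(max(range(n), key=lambda i: min(feat_dist(i, s) for s in seeds)))
    labels = [min(range(k), key=lambda c: feat_dist(i, seeds[c])) for i in range(n)]

    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # (1/|c|^2) * sum_{j,l in c} K_jl does not depend on the point,
        # so compute it once per cluster.
        compact = [sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
                   for m in members]

        def center_dist(i: int, c: int) -> float:
            # ||phi(x_i) - m_c||^2 = K_ii - (2/|c|) * sum_{j in c} K_ij + compact[c]
            m = members[c]
            if not m:
                return math.inf  # an empty cluster never attracts points
            return K[i][i] - 2.0 * sum(K[i][j] for j in m) / len(m) + compact[c]

        new_labels = [min(range(k), key=lambda c: center_dist(i, c)) for i in range(n)]
        if new_labels == labels:  # converged: no point changed cluster
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    blob_a = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1)]
    blob_b = [(5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
    print(kernel_kmeans(blob_a + blob_b, k=2, gamma=0.5))  # → [0, 0, 0, 1, 1, 1]
```

On the two well-separated toy blobs, the sketch places the first three points in one cluster and the last three in the other. The farthest-first seeding is used only to make the run deterministic; random or k-means++-style seeding is more common in practice.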

Bibliographic details

Source
International Journal of Artificial Intelligence Tools: Architectures, Languages, Algorithms, 2018, no. 4, 18 pages
Authors
Tsapanos Nikolaos; Tefas Anastasios; Nikolaidis Nikolaos; Pitas Ioannis
Affiliation (all authors)

Aristotle University of Thessaloniki, Department of Informatics, University Campus Box 54 124, Thessaloniki, Greece
Record information
Original format: PDF
Text language: English
Chinese Library Classification: Artificial intelligence theory
Keywords
Big data; kernel k-means; data clustering; approximate kernel k-means; Apache Spark; distributed computation;

Similar documents

1. Big Data Clustering with Kernel k-Means: Resources, Time and Performance [J]. Tsapanos Nikolaos, Tefas Anastasios, Nikolaidis Nikolaos. International Journal of Artificial Intelligence Tools: Architectures, Languages, Algorithms, 2018, no. 4.

2. Time series k-means: A new k-means type smooth subspace clustering for time series data [J]. Huang Xiaohui, Ye Yunming, Xiong Liyan. Information Sciences: An International Journal, 2016.

3. Very large-scale data classification based on K-means clustering and multi-kernel SVM [J]. Tang Tinglong, Chen Shengyong, Zhao Meng. Soft computing: A fusion of foundations, methodologies and applications, 2019, no. 11.

4. Dynamic time-alignment k-means kernel clustering for time sequence clustering [C]. Santarcangelo Joseph, Zhang Xiao-Ping. IEEE International Conference on Image Processing, 2015.

5. Visual data mining: Using parallel coordinate plots with K-means clustering and color to find correlations in a multidimensional dataset [D]. Peterson, Angela R., 2009.

6. Consensus Kernel K-Means Clustering for Incomplete Multiview Data [O]. Yongkai Ye, Xinwang Liu, Qiang Liu, 2017.

7. Analysis of Simple K-Mean and Parallel K-Mean Clustering for Software Products and Organizational Performance Using Education Sector Dataset [O]. Rui Shang, Balqees Ara, Islam Zada, 2021.