International Journal of Artificial Intelligence Tools: Architectures, Languages, Algorithms

Big Data Clustering with Kernel k-Means: Resources, Time and Performance

Abstract

Data clustering is an unsupervised learning task that has found many applications in various scientific fields. The goal is to find subgroups of closely related data samples (clusters) in a set of unlabeled data. A classic clustering algorithm is the so-called k-Means. It is very popular; however, it cannot handle cases in which the clusters are not linearly separable. Kernel k-Means is a state-of-the-art clustering algorithm that employs the kernel trick to perform clustering in a higher-dimensional space, thus overcoming the limitations of classic k-Means regarding the non-linear separability of the input data. Big Data research, a field that has established itself in the last few years, involves performing tasks on extremely large amounts of data. To address its challenges, several adaptations of Kernel k-Means have been proposed, each with different requirements in processing power and running time, and each incurring different trade-offs in performance. In this paper, we present several issues and techniques involving the use of Kernel k-Means for Big Data clustering, and we examine how combining the components of a clustering framework fares in terms of resources, time and performance. We use experimental results to evaluate several combinations and provide a recommendation on how to approach a Big Data clustering problem.
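To make the kernel trick concrete, here is a minimal sketch of plain (non-distributed) kernel k-Means in Python. It is an illustration under stated assumptions, not the authors' implementation or any of the Big Data variants the paper evaluates. The function names `kernel_kmeans`, `rbf_kernel` and `quadratic_kernel` are hypothetical. Each point is assigned to the cluster whose centroid in feature space is nearest, and that distance is computed from kernel values alone: ||φ(x_i) − μ_C||² = K_ii − (2/|C|) Σ_{j∈C} K_ij + (1/|C|²) Σ_{j,l∈C} K_jl.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Point = Sequence[float]
Kernel = Callable[[Point, Point], float]


def rbf_kernel(gamma: float) -> Kernel:
    """Return a Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    def k(x: Point, y: Point) -> float:
        return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))
    return k


def quadratic_kernel(x: Point, y: Point) -> float:
    """Homogeneous polynomial kernel k(x, y) = (x . y)^2."""
    return sum(a * b for a, b in zip(x, y)) ** 2


def kernel_kmeans(points: List[Point], k: int, kernel: Kernel,
                  max_iter: int = 100,
                  init_labels: Optional[List[int]] = None,
                  seed: int = 0) -> List[int]:
    """Lloyd-style kernel k-Means (hypothetical helper); returns one label per point."""
    n = len(points)
    # Full n x n Gram matrix: O(n^2) memory and kernel evaluations.
    gram = [[kernel(points[i], points[j]) for j in range(n)] for i in range(n)]
    if init_labels is None:
        rng = random.Random(seed)
        labels = [rng.randrange(k) for _ in range(n)]
    else:
        labels = list(init_labels)

    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # (1/|C|^2) * sum_{j,l in C} K_jl: squared norm of each centroid in feature space.
        centroid_sq = [
            sum(gram[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
            for m in members
        ]
        new_labels = []
        for i in range(n):
            best_c, best_d = 0, math.inf
            for c, m in enumerate(members):
                if not m:
                    continue  # an empty cluster has no centroid to compare against
                # ||phi(x_i) - mu_c||^2 expressed through kernel values only.
                d = gram[i][i] - 2.0 * sum(gram[i][j] for j in m) / len(m) + centroid_sq[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
    return labels


if __name__ == "__main__":
    # 1-D data: an inner group near 0 and an outer group near +/-5.
    xs = [-5.0, 5.0, -0.1, 0.1, 4.9, 0.0]
    pts = [[x] for x in xs]
    init = [i % 2 for i in range(len(pts))]  # deterministic round-robin start
    print(kernel_kmeans(pts, 2, quadratic_kernel, init_labels=init))
    # → [0, 0, 1, 1, 0, 1]  (outer points vs. inner points)
```

With the quadratic kernel, each scalar x is implicitly mapped to x², so the inner and outer groups become linearly separable and the sketch recovers them. Classic k-Means on the raw values (equivalent to passing a linear kernel) cannot produce that split: in one dimension its nearest-centroid clusters are intervals, and the inner group lies between the two halves of the outer one. Note that the sketch stores the full n × n Gram matrix, so memory and per-iteration time grow quadratically with n. This cost is what makes plain kernel k-Means hard to apply to Big Data and motivates the adaptations the abstract refers to.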