

Considerably Improving Clustering Algorithms Using UMAP Dimensionality Reduction Technique: A Comparative Study



Abstract

Dimensionality reduction is widely used in machine learning and big data analytics because it helps to analyze and visualize large, high-dimensional datasets. In particular, it can considerably help with tasks such as data clustering and classification. Recently, embedding methods have emerged as a promising direction for improving clustering accuracy. They can preserve the local structure of the data while revealing its global structure, thereby improving clustering performance. In this paper, we investigate how to improve the performance of several clustering algorithms using one of the most successful embedding techniques: Uniform Manifold Approximation and Projection (UMAP). This technique was recently proposed as a manifold learning method for dimensionality reduction and is based on Riemannian geometry and algebraic topology. Our main hypothesis is that UMAP makes it possible to find the best clusterable embedding manifold; we therefore apply it as a preprocessing step before clustering. We compare the results of several well-known clustering algorithms, such as k-means, HDBSCAN, GMM and Agglomerative Hierarchical Clustering, when they operate on the low-dimensional feature space produced by UMAP. A series of experiments on several image datasets demonstrates that the proposed method improves the performance of every clustering algorithm studied on every dataset considered. Measured by accuracy, the improvement can reach 60%.
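
The preprocessing idea described in the abstract can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract gives no hyperparameters, so the function name `umap_then_kmeans`, the choice of k-means as the example clusterer, and the values `n_components=2`, `n_init=10` and `random_state=0` are all assumptions. It relies on the third-party packages umap-learn and scikit-learn. The abstract also does not define its accuracy measure; `clustering_accuracy` below assumes the common best-match definition (the fraction of points labelled correctly under the best one-to-one mapping from clusters to classes), computed here by brute force, which is only practical for a small number of clusters.

```python
from collections import Counter
from itertools import permutations


def clustering_accuracy(y_true, y_pred):
    """Best-match clustering accuracy (assumed definition, see lead-in).

    Tries every one-to-one assignment of cluster ids to class labels and
    returns the highest fraction of points that end up correctly labelled.
    """
    y_true, y_pred = list(y_true), list(y_pred)
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must not be empty")

    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    if len(clusters) > len(classes):
        raise ValueError("this brute-force sketch needs n_clusters <= n_classes")

    # Contingency counts: how many points fall in (cluster, class).
    counts = Counter(zip(y_pred, y_true))

    best_hits = 0
    for assigned in permutations(classes, len(clusters)):
        hits = sum(counts[(c, k)] for c, k in zip(clusters, assigned))
        best_hits = max(best_hits, hits)
    return best_hits / len(y_true)


def umap_then_kmeans(X, n_clusters, n_components=2, random_state=0):
    """Hypothetical helper: embed X with UMAP, then cluster with k-means.

    Hyperparameters are illustrative assumptions, not values from the paper.
    """
    # Imported lazily so the metric above works without these optional packages.
    import umap
    from sklearn.cluster import KMeans

    embedding = umap.UMAP(
        n_components=n_components, random_state=random_state
    ).fit_transform(X)
    return KMeans(
        n_clusters=n_clusters, n_init=10, random_state=random_state
    ).fit_predict(embedding)
```

To mirror the comparison in the abstract, one would compute `clustering_accuracy` once for labels obtained on the raw features and once for labels returned by `umap_then_kmeans`, and repeat with the other clustering algorithms in place of k-means.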


Bibliographic record

Source
International Conference on Image and Signal Processing | 2020 | pp. 317-325 | 9 pages
Authors
Mebarka Allaoui; Mohammed Lamine Kherfi; Abdelhakim Cheriet;
Original format: PDF
Keywords
Dimensionality reduction; UMAP; Clustering; Embedding manifold; Big data analytics; Machine learning; Comparative study;


