Large-Scale Spectral Clustering Algorithm Based on Fast Landmark Sampling

Ye Mao (叶茂); Liu Wenfen (刘文芬)


Abstract

Traditional spectral clustering is limited in large-scale applications by its high computational complexity. Landmark-based Spectral Clustering (LSC) builds an affinity matrix between a set of landmark points and all data points, which significantly reduces the cost of spectral embedding. The strategy used to generate the landmark points is critical to the clustering result. On large data sets, the existing strategies have drawbacks: random sampling makes the clustering results unstable, while the k-means centers method has an unknown convergence time and requires repeated passes over the data. This paper applies approximate singular value decomposition to landmark-based spectral clustering and designs a fast landmark sampling algorithm, in which the sampling probability of each point is computed from the length (norm) of the corresponding row of the approximate singular vector matrix. Compared with LSC based on random sampling, the new algorithm gives more stable and accurate clustering results; compared with LSC based on k-means centers, it has lower computational complexity. The paper also analyzes theoretically how well the sampled landmarks preserve the information in the original data, and verifies the algorithm's performance experimentally on several public data sets.
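
The sampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and the abstract does not give the exact formulas, so several choices below are assumptions: the approximate SVD is replaced by a generic randomized range finder, the sampling probability is taken proportional to the squared row norm of the approximate left singular vector matrix, and the landmark affinity uses a dense Gaussian kernel with a hypothetical bandwidth `sigma`. The function names (`approx_left_singular_vectors`, `sample_landmarks`, `landmark_embedding`) are illustrative only.

```python
import numpy as np


def approx_left_singular_vectors(X, rank, oversample=10, seed=0):
    """Approximate the top left singular vectors of X (n points x d features).

    Assumption: a randomized range finder stands in for the paper's
    approximate SVD, whose details are not given in the abstract.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    k = min(rank + oversample, d)
    omega = rng.standard_normal((d, k))      # random test matrix
    Q, _ = np.linalg.qr(X @ omega)           # orthonormal basis for range(X)
    B = Q.T @ X                              # small k x d projection
    Ub, _, _ = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :rank]                # one row per data point


def sample_landmarks(X, n_landmarks, rank=5, seed=0):
    """Sample landmark indices with probability tied to row norms.

    Assumption: probabilities are proportional to squared row norms
    of the approximate singular vector matrix.
    """
    U = approx_left_singular_vectors(X, rank, seed=seed)
    scores = np.sum(U ** 2, axis=1)
    probs = scores / scores.sum()
    rng = np.random.default_rng(seed)
    idx = rng.choice(X.shape[0], size=n_landmarks, replace=False, p=probs)
    return idx, probs


def landmark_embedding(X, landmarks, n_components, sigma=1.0):
    """Spectral embedding from a point-to-landmark affinity matrix.

    Assumption: dense Gaussian affinities; the dense n x p x d
    broadcast is only suitable for small demonstrations.
    """
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=2)
    Z = np.exp(-d2 / (2.0 * sigma ** 2))          # n x p affinities
    Z /= Z.sum(axis=1, keepdims=True)             # each point's weights sum to 1
    Z /= np.sqrt(Z.sum(axis=0, keepdims=True))    # scale by landmark degree
    U, _, _ = np.linalg.svd(Z, full_matrices=False)
    return U[:, :n_components]                    # rows are embedded points
```

In this sketch, `idx, probs = sample_landmarks(X, p)` selects `p` landmark rows of `X`, and `landmark_embedding(X, X[idx], k)` returns an embedding whose rows could then be clustered with k-means. The cost of the final SVD depends on the number of landmarks rather than on a full n x n affinity matrix, which is the source of the complexity reduction the abstract describes.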

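Bibliographic details

Source
Journal of Electronics & Information Technology (《电子与信息学报》) | 2017, No. 2 | pp. 278-284 | 7 pages
Authors
Ye Mao (叶茂); Liu Wenfen (刘文芬)
Affiliation
PLA Information Engineering University, Zhengzhou 450002; State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou 450002 (both authors)
Original format: PDF
Language: Chinese
CLC classification: automated reasoning; machine learning
Keywords
landmark sampling; big data; spectral clustering; approximate singular value decomposition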

