Journal: Data Mining and Knowledge Discovery

Exemplar-based low-rank matrix decomposition for data clustering

Abstract

Today, digital data is accumulating faster than ever in science, engineering, biomedicine, and real-world sensing. The ubiquitous combination of massive data and sparse information poses considerable challenges for data mining research. In this paper, we propose a theoretical framework, Exemplar-based low-rank sparse matrix decomposition (EMD), to cluster large-scale datasets. Capitalizing on recent advances in matrix approximation and decomposition, EMD can efficiently partition high-dimensional datasets at large scale. Specifically, given a data matrix, EMD first computes a representative data subspace and a near-optimal low-rank approximation. Then, the cluster centroids and indicators are obtained through matrix decomposition, in which we require that the cluster centroids lie within the representative data subspace. By selecting the representative exemplars, we obtain a compact "sketch" of the data, which makes the clustering highly efficient and robust to noise. In addition, the clustering results are sparse and easy to interpret. From a theoretical perspective, we prove the correctness and convergence of the EMD algorithm and provide a detailed analysis of its efficiency, including running time and space requirements. Through extensive experiments on both synthetic and real datasets, we demonstrate the performance of EMD for clustering large-scale data.
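The abstract does not give the algorithm itself, so the following is only a rough illustration of the central constraint it describes: cluster centroids restricted to the subspace spanned by a few selected exemplars. Everything specific here is an assumption rather than the paper's method: the function name `exemplar_constrained_kmeans`, the norm-weighted column sampling used to pick exemplars, and the Lloyd-style alternation that stands in for the matrix decomposition. The near-optimal low-rank approximation step is omitted entirely.

```python
import numpy as np


def _assign(X, F):
    """Label each column of X (d, n) with the index of its nearest column of F (d, k)."""
    d2 = ((X[:, :, None] - F[:, None, :]) ** 2).sum(axis=0)  # (n, k) squared distances
    return d2.argmin(axis=1)


def exemplar_constrained_kmeans(X, k, m, n_iter=20, seed=0):
    """Cluster the columns of X with centroids constrained to span(E).

    X : (d, n) data matrix, one point per column.
    k : number of clusters.
    m : number of exemplar columns (m >= k).
    Returns (labels, centroids, exemplar_indices).
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape

    # Assumption: choose exemplars by sampling columns with probability
    # proportional to their squared norm (a common column-sampling heuristic).
    p = np.sum(X ** 2, axis=0)
    p = p / p.sum()
    idx = rng.choice(n, size=m, replace=False, p=p)
    E = X[:, idx]                # (d, m) exemplar matrix, the "sketch"
    E_pinv = np.linalg.pinv(E)   # (m, d)

    # Centroids are F = E @ W; start with k distinct exemplars as centroids.
    W = np.zeros((m, k))
    W[rng.choice(m, size=k, replace=False), np.arange(k)] = 1.0

    for _ in range(n_iter):
        labels = _assign(X, E @ W)
        for j in range(k):
            members = X[:, labels == j]
            if members.shape[1] > 0:
                # Least-squares fit of the cluster mean within span(E).
                W[:, j] = E_pinv @ members.mean(axis=1)
            # An empty cluster keeps its previous centroid.

    labels = _assign(X, E @ W)
    return labels, E @ W, idx
```

Because each centroid is written as `E @ W[:, j]`, every cluster is described by at most `m` exemplar weights, which is one simple way to read the abstract's claim that the exemplars form a compact sketch of the data.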
