A Scalable Spectral Clustering Algorithm Based on Landmark-Embedding and Cosine Similarity

Abstract

We extend our recent work on scalable spectral clustering with cosine similarity (ICPR'18) to other kinds of similarity functions, in particular the Gaussian RBF. In the previous work, we showed that for sparse or low-dimensional data, spectral clustering with the cosine similarity can be implemented directly through efficient operations on the data matrix, such as elementwise manipulation, matrix-vector multiplication, and low-rank SVD, thus completely avoiding the weight matrix. For other similarity functions, we present an embedding-based approach that uses a small set of landmark points to convert the given data into sparse feature vectors and then applies the scalable computing framework for the cosine similarity. Our algorithm is simple to implement, has clear interpretations, and naturally incorporates an outlier removal procedure. Preliminary results show that our proposed algorithm yields higher accuracy than existing scalable algorithms while remaining fast.
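
The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names, the landmark choice (every tenth point), the bandwidth `sigma`, the number `r` of landmark affinities kept per point, and the small k-means helper are all assumptions made for illustration. The sketch also assumes the cosine weight matrix has the form W = AAᵀ − I for a row-normalized feature matrix A, so that degrees come from two matrix-vector products and the spectral embedding from an SVD of the scaled feature matrix, without ever forming W. Dense arrays are used for brevity; a scalable version would presumably store A in a sparse format and use a truncated SVD.

```python
import numpy as np

def landmark_features(X, landmarks, sigma, r):
    """Gaussian RBF affinities to the landmarks, keeping the r largest per point."""
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=2)
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    keep = np.argsort(-A, axis=1)[:, :r]          # columns of the r largest affinities
    rows = np.arange(A.shape[0])[:, None]
    S = np.zeros_like(A)
    S[rows, keep] = A[rows, keep]                 # every other entry stays zero
    return S

def cosine_spectral_embedding(A, k):
    """Spectral embedding for cosine similarity without forming the n x n matrix W."""
    A = A / np.linalg.norm(A, axis=1, keepdims=True)   # unit rows: A @ A.T is cosine similarity
    # Row sums of the assumed W = A A^T - I, via two matrix-vector products.
    deg = A @ (A.T @ np.ones(A.shape[0])) - 1.0
    A_tilde = A / np.sqrt(deg)[:, None]                # D^{-1/2} A
    U, _, _ = np.linalg.svd(A_tilde, full_matrices=False)
    U = U[:, :k]                                       # top-k left singular vectors
    return U / np.linalg.norm(U, axis=1, keepdims=True), deg

def kmeans(Y, k, n_iter=20):
    """Plain Lloyd iterations with a deterministic farthest-point start (illustrative helper)."""
    centers = [Y[0]]
    for _ in range(k - 1):
        d = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(dist, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Y[labels == j].mean(axis=0)
    return labels

# Toy data: two well-separated Gaussian blobs of 50 points each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(10.0, 0.3, (50, 2))])
landmarks = X[::10]                                # assumption: every tenth point is a landmark
A = landmark_features(X, landmarks, sigma=1.0, r=3)
Y, deg = cosine_spectral_embedding(A, k=2)
labels = kmeans(Y, k=2)
```

The degrees `deg` are returned alongside the embedding because points with unusually small degree are natural candidates for an outlier check; whether this matches the outlier removal procedure mentioned in the abstract is an assumption.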