Journal: Information Processing & Management

A novel multi-view clustering approach via proximity-based factorization targeting structural maintenance and sparsity challenges for text and image categorization

Abstract

Multi-view data consists of several feature sets that describe the same objects from different perspectives, and such data is common in real-world applications. Multi-view clustering of text and image data faces two substantial challenges: structure preservation and sparsity. Existing methods do not preserve the structure of the data space, and recent improvements address only the local structure. Preserving local structure alone is not sufficient to handle the sparsity of such data. In this paper, we propose a novel clustering approach, Proximity-based Multi-View Non-negative Matrix Factorization (PMVNMF), which uses the local and global structure of the data space jointly to handle sparsity in real-world multimedia (text and image) data. For each view, the 1-step and 2-step transition probability matrices are constructed as first-order and second-order proximity matrices to uncover the latent local and global geometric structures, respectively. The two types of proximity matrix are then integrated into a view-specific proximity matrix. Finally, Non-negative Matrix Factorization (NMF) is applied with graph regularization, which incorporates the integrated graph structures, and consensus regularization, which reveals the common structure shared by all view representations. The algorithm captures the fundamental structure of the data space and is robust to sparse data. We conduct experiments on six real-world datasets (two text and four image datasets) and compare the proposed algorithm with eight baseline approaches. Six evaluation metrics are used: accuracy, F-score, precision, recall, NMI, and entropy. The results show that the proposed algorithm outperforms the baselines.
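
The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no formulas, so the k-nearest-neighbour Gaussian affinity, the blending weight `alpha`, the regularization weights `lam` and `mu`, the standard graph-regularized multiplicative updates, and the final arg-max cluster assignment are all assumptions, and the function names are hypothetical. Each view is assumed to be a non-negative matrix with one row per sample.

```python
import numpy as np

def knn_affinity(X, k=5):
    """Symmetric k-NN affinity with Gaussian weights (assumed graph construction)."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    W = np.exp(-d2 / (np.mean(d2) + 1e-12))
    np.fill_diagonal(W, 0.0)
    idx = np.argsort(-W, axis=1)[:, :k]          # k strongest neighbours per row
    mask = np.zeros((n, n), dtype=bool)
    mask[np.arange(n)[:, None], idx] = True
    return np.where(mask | mask.T, W, 0.0)       # keep an edge if either end selects it

def proximity_matrix(W, alpha=0.5):
    """Blend first-order (1-step) and second-order (2-step) transition probabilities."""
    P1 = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)  # 1-step transitions
    P2 = P1 @ P1                                              # 2-step transitions
    S = alpha * P1 + (1.0 - alpha) * P2                       # assumed linear blend
    return 0.5 * (S + S.T)                                    # symmetrize for the graph term

def pmvnmf_sketch(views, n_clusters, lam=1.0, mu=1.0, alpha=0.5, k=5,
                  n_iter=100, seed=0):
    """Graph- and consensus-regularized multi-view NMF with multiplicative updates.

    Illustrative objective (an assumption, summed over views v):
      ||X_v - V_v U_v^T||^2 + lam * tr(V_v^T (D_v - S_v) V_v) + mu * ||V_v - V*||^2
    """
    rng = np.random.default_rng(seed)
    n, eps = views[0].shape[0], 1e-10
    S = [proximity_matrix(knn_affinity(X, k), alpha) for X in views]
    D = [np.diag(s.sum(axis=1)) for s in S]
    U = [rng.random((X.shape[1], n_clusters)) for X in views]
    V = [rng.random((n, n_clusters)) for _ in views]
    V_star = np.mean(V, axis=0)                  # consensus representation
    for _ in range(n_iter):
        for v, X in enumerate(views):
            U[v] *= (X.T @ V[v]) / (U[v] @ (V[v].T @ V[v]) + eps)
            num = X @ U[v] + lam * (S[v] @ V[v]) + mu * V_star
            den = V[v] @ (U[v].T @ U[v]) + lam * (D[v] @ V[v]) + mu * V[v] + eps
            V[v] *= num / den
        V_star = np.mean(V, axis=0)              # minimizer of the consensus term
    return V_star, V_star.argmax(axis=1)         # arg-max labels: a simplification
```

Calling `pmvnmf_sketch([X_text, X_image], n_clusters=3)` on two hypothetical non-negative view matrices with the same number of rows returns the consensus representation and one label per sample. Clustering the consensus matrix with k-means instead of taking the arg-max would be an equally plausible final step.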