Journal: Information Processing & Management

A novel regularized asymmetric non-negative matrix factorization for text clustering



Abstract

Non-negative matrix factorization (NMF) is a dimensionality reduction method that extracts semantic features from high-dimensional data. Most optimization methods developed for NMF attend only to how each feature vector of the factorized matrices should be modeled, and ignore the relationships among feature vectors. Exploiting such relationships among documents' feature vectors yields a better factorization for text clustering. This paper proposes a novel regularized asymmetric non-negative matrix factorization (RANMF) for text clustering. The proposed method places regularization constraints on pairwise feature vectors by applying penalties based on distance measures. We design a new cost function based on the Kullback-Leibler divergence and develop an optimization scheme that minimizes it using novel multiplicative updating rules. The proposed method keeps documents from the same cluster close together in the new representation space. Hence, the acquired parts-based representation has cluster labeling consistent with the original space and has greater discriminating ability. The complexity analysis shows that adding the regularizers does not increase the time cost of RANMF compared with the original NMF. In the experiments, the proposed RANMF converges quickly, terminating in fewer than ten iterations. The complete proof of convergence and the experimental results on benchmark data sets demonstrate that the proposed multiplicative updating rules converge fast and achieve superior results compared to other algorithms.
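For orientation, the baseline that the abstract builds on can be sketched as follows. This is a minimal illustration of the standard (unregularized) NMF multiplicative updates for the generalized Kullback-Leibler divergence, not the RANMF rules, which the abstract does not state. The function names `kl_divergence` and `kl_nmf`, the terms-by-documents layout of `V`, the random initialization, and the small constant `eps` are all assumptions made for this example.

```python
import numpy as np


def kl_divergence(V, WH, eps=1e-12):
    """Generalized KL divergence D(V || WH), summed over all entries."""
    return float(np.sum(V * np.log((V + eps) / (WH + eps)) - V + WH))


def kl_nmf(V, k, n_iter=50, seed=0, eps=1e-12):
    """Baseline NMF with multiplicative updates for the KL cost.

    Assumed layout: V is (terms x documents), W is (terms x k),
    H is (k x documents), so each column of H is a document's
    feature vector. No pairwise regularizer is applied here.
    """
    rng = np.random.default_rng(seed)
    m, n = V.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    history = [kl_divergence(V, W @ H)]
    for _ in range(n_iter):
        # Update H: H <- H * (W^T (V / WH)) / (column sums of W)
        ratio = V / (W @ H + eps)
        H *= (W.T @ ratio) / (W.sum(axis=0)[:, None] + eps)
        # Update W: W <- W * ((V / WH) H^T) / (row sums of H)
        ratio = V / (W @ H + eps)
        W *= (ratio @ H.T) / (H.sum(axis=1)[None, :] + eps)
        history.append(kl_divergence(V, W @ H))
    return W, H, history
```

Under the assumed layout, a simple way to read off cluster assignments is `labels = H.argmax(axis=0)`, which gives each document the index of its largest feature component. A regularized variant in the spirit of the abstract would add a distance-based penalty on pairs of columns of `H` to the cost, and the update rules would change accordingly.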
