Conference: International conference on smart computing and communication

The Accuracy of Fuzzy C-Means in Lower-Dimensional Space for Topic Detection


Abstract

Topic detection is an automatic method for discovering topics in textual data. The standard methods for topic detection are nonnegative matrix factorization (NMF) and latent Dirichlet allocation (LDA). An alternative is a clustering approach such as k-means or fuzzy c-means (FCM). FCM extends the k-means method in the sense that a piece of textual data may have more than one topic. However, FCM works well for low-dimensional textual data and fails for high-dimensional textual data. One way to overcome this problem is to transform the textual data into a lower-dimensional space, i.e., an eigenspace; the resulting method is called eigenspace-based FCM (EFCM). First, the textual data are transformed into an eigenspace using truncated singular value decomposition. FCM is then performed on the eigenspace data to identify the memberships of the textual data in clusters. Using these memberships, we generate topics from the high-dimensional textual data in the original space. In this paper, we examine the accuracy of EFCM for topic detection. Our simulations show that the accuracy of EFCM falls between the accuracies of LDA and NMF in terms of both topic interpretation and topic recall.
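
The three steps named in the abstract (truncated SVD, FCM in the eigenspace, topic generation in the original space) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The function names, the fuzzifier `m = 2`, the random initialisation, the stopping tolerance, and the toy document-word matrix are all assumptions made here. In particular, the abstract does not say how topics are computed from the memberships; the sketch assumes the usual FCM centroid formula (a membership-weighted mean) applied to the original high-dimensional data.

```python
import numpy as np


def to_eigenspace(X, p):
    """Project documents (rows of X) onto the top-p singular directions."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :p] * s[:p]  # shape (n_docs, p)


def fuzzy_c_means(Z, c, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain FCM on Z; returns the (n_docs, c) membership matrix."""
    rng = np.random.default_rng(seed)
    U = rng.random((Z.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)  # each row sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ Z) / W.sum(axis=0)[:, None]
        # Euclidean distance from every document to every center
        d = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)  # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U


def efcm_topics(X, c, p, m=2.0):
    """EFCM sketch: cluster in the eigenspace, build topics in the original space."""
    Z = to_eigenspace(X, p)
    U = fuzzy_c_means(Z, c, m=m)
    W = U ** m
    # Assumption: a topic is the membership-weighted mean of the original rows.
    topics = (W.T @ X) / W.sum(axis=0)[:, None]  # shape (c, n_words)
    return U, topics


# Toy document-word count matrix (hypothetical data): 4 documents, 4 words.
vocab = ["goal", "match", "vote", "senate"]
X = np.array([[3, 2, 0, 0],
              [2, 3, 0, 0],
              [0, 0, 3, 2],
              [0, 0, 2, 3]], dtype=float)

U, topics = efcm_topics(X, c=2, p=2)
for k, topic in enumerate(topics):
    top_words = [vocab[i] for i in np.argsort(topic)[::-1][:2]]
    print(f"topic {k}: {top_words}")
```

Each row of `U` holds one document's memberships across the clusters, which is what lets a document contribute to more than one topic. The words with the largest weights in each row of `topics` serve as that topic's description.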
