Published in: IEEE International Conference on Big Data Computing Service and Applications

Modeling Both Coarse-Grained and Fine-Grained Topics in Massive Text Data



Abstract

Topic models have attracted much attention from investigators, as they provide users with insights into huge volumes of documents. However, most previous related studies based on Non-negative Matrix Factorization (NMF) neglect to distinguish which topics are widespread in the documents and which are not. These widespread topics, which we refer to as coarse-grained topics, are of great significance to people who concentrate on the common topics of a given text set. For example, after reading a large number of job ads, jobseekers are eager to learn employers' basic requirements, which can be regarded as the coarse-grained topics, as well as the additional requirements, which can be deemed the fine-grained topics. In this paper, we propose a novel method that applies two different sparseness constraints to NMF to tell coarse-grained topics and fine-grained topics apart. The experimental results demonstrate that the new model can not only discover coarse-grained topics but also extract fine-grained topics. We evaluate the performance of the new model via text clustering and classification, and the results show that the new model can learn more accurate topic representations of documents.
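The abstract's core idea, applying different sparseness constraints to different groups of NMF topics, can be illustrated with a minimal numpy sketch. This is an assumed reading of the approach, not the paper's exact algorithm: it uses standard multiplicative updates (Lee-Seung style) with a per-topic L1 penalty on the topic-term matrix H, weak for coarse-grained topics (left dense, i.e. widespread) and strong for fine-grained topics (forced sparse). All function and parameter names are hypothetical.

```python
import numpy as np

def sparse_nmf(X, n_coarse=2, n_fine=8, lam_coarse=0.01, lam_fine=0.5,
               n_iter=200, seed=0):
    """Factorize X ~= W @ H with per-topic L1 sparsity on H.

    Coarse-grained topics get a weak L1 penalty (stay widespread),
    fine-grained topics get a strong one (pushed toward sparsity).
    Illustrative sketch only; not the paper's exact method.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_terms = X.shape
    k = n_coarse + n_fine
    W = rng.random((n_docs, k))   # document-topic weights
    H = rng.random((k, n_terms))  # topic-term weights
    # Per-topic L1 weight: small for coarse topics, large for fine ones.
    lam = np.concatenate([np.full(n_coarse, lam_coarse),
                          np.full(n_fine, lam_fine)])
    eps = 1e-9  # guard against division by zero
    for _ in range(n_iter):
        # Multiplicative updates; the L1 term enters H's denominator.
        H *= (W.T @ X) / (W.T @ W @ H + lam[:, None] + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

# Toy usage on a random nonnegative "document-term" matrix.
X = np.abs(np.random.default_rng(1).normal(size=(30, 50)))
W, H = sparse_nmf(X)
```

With the stronger penalty, the fine-grained rows of H should end up with smaller entries on average than the coarse-grained rows, which is how the two topic granularities are told apart in this sketch.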
