IEEE International Conference on Big Data Computing Service and Applications

Modeling Both Coarse-Grained and Fine-Grained Topics in Massive Text Data



Abstract

Topic models have attracted much attention from researchers, as they provide users with insights into huge volumes of documents. However, most previous studies based on Non-negative Matrix Factorization (NMF) neglect to distinguish which topics are widespread in a document collection and which are not. These widespread topics, which we refer to as coarse-grained topics, are of great significance to people who focus on the common topics of a given text set. For example, after reading a large number of job ads, jobseekers are eager to learn employers' basic requirements, which can be regarded as coarse-grained topics, as well as the additional requirements, which can be deemed fine-grained topics. In this paper, we propose a novel method that applies two different sparseness constraints to NMF in order to tell coarse-grained and fine-grained topics apart. The experimental results demonstrate that the new model can not only discover coarse-grained topics but also extract fine-grained topics. We evaluate the performance of the new model via text clustering and classification, and the results show that it learns more accurate topic representations of documents.
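The abstract does not give the authors' exact formulation, but the core idea of applying different sparseness constraints to different topic groups in NMF can be sketched with standard multiplicative updates and a per-topic L1 penalty on the topic-term matrix. The following is a minimal illustration, not the paper's method: the function name `sparse_nmf`, the penalty values, and the topic split are all assumptions made for this example.

```python
import numpy as np

def sparse_nmf(V, n_coarse=2, n_fine=8, lam_coarse=0.0, lam_fine=0.5,
               n_iter=200, seed=0):
    """Factor V ~= W @ H with an L1 penalty on H that differs per topic:
    fine-grained topic rows get a stronger sparseness constraint, so they
    concentrate on few terms, while coarse-grained rows stay dense.

    This is an illustrative sketch, not the authors' formulation.
    V: (n_docs, n_terms) non-negative document-term matrix.
    Returns W (doc-topic) and H (topic-term); the first n_coarse rows of H
    are the coarse-grained topics, the remaining n_fine rows the fine ones.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_terms = V.shape
    k = n_coarse + n_fine
    W = rng.random((n_docs, k)) + 1e-3
    H = rng.random((k, n_terms)) + 1e-3
    # Row-wise L1 weights: weak penalty for coarse topics, strong for fine.
    lam = np.concatenate([np.full(n_coarse, lam_coarse),
                          np.full(n_fine, lam_fine)])[:, None]
    eps = 1e-9
    for _ in range(n_iter):
        # Multiplicative updates preserve non-negativity; the L1 weight
        # lam enters the denominator of the H update, shrinking its rows.
        H *= (W.T @ V) / (W.T @ W @ H + lam + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

After fitting, comparing row-wise L1 norms of `H` (e.g. `np.abs(H).sum(axis=1)`) should show the fine-grained rows shrinking toward a few dominant terms relative to the coarse-grained rows, which is the separation effect the abstract describes.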
