Published in: IEEE International Conference on Big Data Computing Service and Applications

Modeling Both Coarse-Grained and Fine-Grained Topics in Massive Text Data



Abstract

Topic models have attracted much attention from investigators, as they provide users with insights into huge volumes of documents. However, most previous related studies based on Non-negative Matrix Factorization (NMF) neglect to distinguish which topics are widespread in the documents and which are not. These widespread topics, which we refer to as coarse-grained topics, are of great significance to people who concentrate on the common topics of a given text set. For example, after reading a large number of job ads, jobseekers are eager to learn employers' basic requirements, which can be regarded as the coarse-grained topics, as well as the additional requirements, which can be deemed the fine-grained topics. In this paper, we propose a novel method that applies two different sparseness constraints to NMF to tell coarse-grained topics and fine-grained topics apart. The experimental results demonstrate that the new model can not only discover coarse-grained topics but also extract fine-grained topics. We evaluate the performance of the new model via text clustering and classification, and the results show that the new model can learn more accurate topic representations of documents.
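The abstract's core idea, applying different sparseness constraints to different groups of NMF topics, can be illustrated with a minimal numpy sketch. This is an assumed reading of the approach, not the paper's exact algorithm: it uses standard multiplicative updates (Lee-Seung style) with a per-topic L1 penalty on the topic-term matrix H, weak for coarse-grained topics (left dense, i.e. widespread) and strong for fine-grained topics (forced sparse). All function and parameter names are hypothetical.

```python
import numpy as np

def sparse_nmf(X, n_coarse=2, n_fine=8, lam_coarse=0.01, lam_fine=0.5,
               n_iter=200, seed=0):
    """Factorize X ~= W @ H with per-topic L1 sparsity on H.

    Coarse-grained topics get a weak L1 penalty (stay widespread),
    fine-grained topics get a strong one (pushed toward sparsity).
    Illustrative sketch only; not the paper's exact method.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_terms = X.shape
    k = n_coarse + n_fine
    W = rng.random((n_docs, k))   # document-topic weights
    H = rng.random((k, n_terms))  # topic-term weights
    # Per-topic L1 weight: small for coarse topics, large for fine ones.
    lam = np.concatenate([np.full(n_coarse, lam_coarse),
                          np.full(n_fine, lam_fine)])
    eps = 1e-9  # guard against division by zero
    for _ in range(n_iter):
        # Multiplicative updates; the L1 term enters H's denominator.
        H *= (W.T @ X) / (W.T @ W @ H + lam[:, None] + eps)
        W *= (X @ H.T) / (W @ (H @ H.T) + eps)
    return W, H

# Toy usage on a random nonnegative "document-term" matrix.
X = np.abs(np.random.default_rng(1).normal(size=(30, 50)))
W, H = sparse_nmf(X)
```

With the stronger penalty, the fine-grained rows of H should end up with smaller entries on average than the coarse-grained rows, which is how the two topic granularities are told apart in this sketch.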
