IEEE International Conference on Big Data Computing Service and Applications

Modeling Both Coarse-Grained and Fine-Grained Topics in Massive Text Data



Abstract

Topic models have attracted much attention from researchers, as they provide users with insights into huge volumes of documents. However, most previous studies based on Non-negative Matrix Factorization (NMF) neglect to distinguish which topics are widespread in a document collection and which are not. These widespread topics, which we refer to as coarse-grained topics, are of great significance to people who focus on the common topics of a given text set. For example, after reading a large number of job ads, jobseekers are eager to learn employers' basic requirements, which can be regarded as coarse-grained topics, as well as the additional requirements, which can be deemed fine-grained topics. In this paper, we propose a novel method that applies two different sparseness constraints to NMF in order to tell coarse-grained and fine-grained topics apart. The experimental results demonstrate that the new model can not only discover coarse-grained topics but also extract fine-grained topics. We evaluate the performance of the new model via text clustering and classification, and the results show that it learns more accurate topic representations of documents.
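The abstract does not give the authors' exact formulation, but the core idea of applying different sparseness constraints to different topic groups in NMF can be sketched with standard multiplicative updates and a per-topic L1 penalty on the topic-term matrix. The following is a minimal illustration, not the paper's method: the function name `sparse_nmf`, the penalty values, and the topic split are all assumptions made for this example.

```python
import numpy as np

def sparse_nmf(V, n_coarse=2, n_fine=8, lam_coarse=0.0, lam_fine=0.5,
               n_iter=200, seed=0):
    """Factor V ~= W @ H with an L1 penalty on H that differs per topic:
    fine-grained topic rows get a stronger sparseness constraint, so they
    concentrate on few terms, while coarse-grained rows stay dense.

    This is an illustrative sketch, not the authors' formulation.
    V: (n_docs, n_terms) non-negative document-term matrix.
    Returns W (doc-topic) and H (topic-term); the first n_coarse rows of H
    are the coarse-grained topics, the remaining n_fine rows the fine ones.
    """
    rng = np.random.default_rng(seed)
    n_docs, n_terms = V.shape
    k = n_coarse + n_fine
    W = rng.random((n_docs, k)) + 1e-3
    H = rng.random((k, n_terms)) + 1e-3
    # Row-wise L1 weights: weak penalty for coarse topics, strong for fine.
    lam = np.concatenate([np.full(n_coarse, lam_coarse),
                          np.full(n_fine, lam_fine)])[:, None]
    eps = 1e-9
    for _ in range(n_iter):
        # Multiplicative updates preserve non-negativity; the L1 weight
        # lam enters the denominator of the H update, shrinking its rows.
        H *= (W.T @ V) / (W.T @ W @ H + lam + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

After fitting, comparing row-wise L1 norms of `H` (e.g. `np.abs(H).sum(axis=1)`) should show the fine-grained rows shrinking toward a few dominant terms relative to the coarse-grained rows, which is the separation effect the abstract describes.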
