IEEE Transactions on Knowledge and Data Engineering

Parsimonious Topic Models with Salient Word Discovery


Abstract

We propose a parsimonious topic model for text corpora. In related models such as Latent Dirichlet Allocation (LDA), all words are modeled topic-specifically, even though many words occur with similar frequencies across different topics. Our modeling determines salient words for each topic, which have topic-specific probabilities, with the rest explained by a universal shared model. Further, in LDA all topics are in principle present in every document. By contrast, our model gives sparse topic representation, determining the (small) subset of relevant topics for each document. We derive a Bayesian Information Criterion (BIC), balancing model complexity and goodness of fit. Here, interestingly, we identify an effective sample size and corresponding penalty specific to each parameter type in our model. We minimize BIC to jointly determine our entire model—the topic-specific words, document-specific topics, all model parameter values, and the total number of topics—in a wholly unsupervised fashion. Results on three text corpora and an image dataset show that our model achieves higher test set likelihood and better agreement with ground-truth class labels, compared to LDA and to a model designed to incorporate sparsity.
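The model-selection step described above, minimizing BIC jointly over the salient-word sets, per-document topic subsets, and the total number of topics, rests on the standard BIC trade-off between goodness of fit and parameter count. A minimal sketch of that trade-off for choosing the topic count, using the generic penalty `k log n` rather than the paper's per-parameter-type effective sample sizes (a deliberate simplification), with purely illustrative likelihood values:

```python
import math

def bic(log_likelihood, num_params, sample_size):
    # Generic BIC: -2 * logL + k * log(n). Lower is better.
    # (The paper refines the penalty with an effective sample size
    # per parameter type; a single count n is used here for illustration.)
    return -2.0 * log_likelihood + num_params * math.log(sample_size)

# Hypothetical candidate models: topic count -> (log-likelihood, parameter count).
# More topics fit better (higher logL) but pay a larger complexity penalty.
candidates = {
    5:  (-14000.0, 400),
    10: (-11500.0, 800),
    20: (-11400.0, 1600),
}
n_tokens = 50000  # effective sample size, here taken as the total token count

# Pick the topic count minimizing BIC.
best = min(candidates, key=lambda k: bic(*candidates[k], n_tokens))
print(best)  # the 10-topic model wins: 20 topics barely improves fit
```

Here the jump from 10 to 20 topics improves the log-likelihood only slightly, so the doubled penalty term dominates and BIC selects 10 topics; the paper applies the same principle jointly across all parameter types rather than to the topic count alone.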

