Differential Topic Models

Chen C.; Buntine W.; Ding N.; Xie L.; Du L.

Abstract

In applications we may want to compare different document collections: they could have shared content but also different and unique aspects in particular collections. This task has been called comparative text mining or cross-collection modeling. We present a differential topic model for this application that models both topic differences and similarities. For this we use hierarchical Bayesian nonparametric models. Moreover, we found it was important to properly model power-law phenomena in topic-word distributions, and thus we used the full Pitman-Yor process rather than just a Dirichlet process. Furthermore, we propose the transformed Pitman-Yor process (TPYP) to incorporate prior knowledge, such as vocabulary variations in different collections, into the model. To deal with the non-conjugacy between the model prior and the likelihood in the TPYP, we propose an efficient sampling algorithm using a data augmentation technique based on the multinomial theorem. Experimental results show the model discovers interesting aspects of different collections. We also show the proposed MCMC-based algorithm achieves a dramatically reduced test perplexity compared to some existing topic models. Finally, we show our model outperforms the state of the art for document classification/ideology prediction on a number of text collections.
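To make the contrast between the Pitman-Yor process and the Dirichlet process concrete, here is a minimal sketch of the standard Chinese restaurant construction, in which the number of occupied tables grows roughly as a power of the number of customers when the discount is positive and only logarithmically when it is zero. This is not the paper's TPYP or its sampling algorithm; the function name `pitman_yor_crp` and its parameters are hypothetical names chosen for illustration.

```python
import random


def pitman_yor_crp(n_customers, discount, concentration, seed=0):
    """Seat customers by the Pitman-Yor Chinese restaurant process.

    Returns the list of table sizes. With discount == 0 this reduces
    to the Dirichlet-process Chinese restaurant process.
    All names here are illustrative, not taken from the paper.
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must lie in [0, 1)")
    if concentration <= -discount:
        raise ValueError("concentration must exceed -discount")

    rng = random.Random(seed)
    tables = []  # tables[j] = number of customers at table j

    for n in range(n_customers):  # n customers are already seated
        k = len(tables)
        # Unnormalised mass for opening a new table; the masses of all
        # options sum to n + concentration.
        new_mass = concentration + discount * k
        u = rng.random() * (n + concentration)
        if k == 0 or u < new_mass:
            tables.append(1)
            continue
        u -= new_mass
        for j, size in enumerate(tables):
            u -= size - discount  # mass of joining existing table j
            if u < 0:
                tables[j] += 1
                break
        else:
            tables[-1] += 1  # guard against floating-point round-off

    return tables


if __name__ == "__main__":
    for d in (0.0, 0.5, 0.8):
        sizes = pitman_yor_crp(2000, discount=d, concentration=1.0, seed=1)
        print(f"discount={d}: {len(sizes)} tables for 2000 customers")
```

Reading tables as distinct word types and customers as word tokens, a positive discount yields many small tables alongside a few large ones, which is the heavy-tailed behaviour the abstract refers to as power-law phenomena.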

Bibliographic record

Source
Pattern Analysis and Machine Intelligence, IEEE Transactions on, 2015, no. 2, pp. 230-242 (13 pages)
Authors
Chen C.; Buntine W.; Ding N.; Xie L.; Du L.
Author affiliations
Australian National University and National ICT, Australia
Original format: PDF
Language: English
Keywords
Bayes methods; Correlation; Data models; Indexes; TV; Vectors; Vocabulary; Differential topic model; MCMC; data augmentation; transformed Pitman-Yor process

Similar literature

1. Gendered Associations of Decision-Making Power, Topic Avoidance, and Relational Satisfaction: A Differential Influence Model [J]. Timothy R. Worley, Jennifer A. Samp. Communication Reports, 2016, no. 1a3
2. Modeling topic control to detect influence in conversations using nonparametric topic models [J]. Viet-An Nguyen, Jordan Boyd-Graber, Philip Resnik. Machine Learning, 2014, no. 3
3. Combine Topic Modeling with Semantic Embedding: Embedding Enhanced Topic Model [J]. Zhang Peng, Wang Suge, Li Deyu. IEEE Transactions on Knowledge and Data Engineering, 2020, no. 12
4. Exploring Differential Topic Models for Comparative Summarization of Scientific Papers [C]. Lei He, Wei Li, Hai Zhuge. International Conference on Computational Linguistics, 2016
5. Cop Topics: Topic Modeling-Assisted Discoveries of Police-Related Themes in African-American Journalistic Texts [D]. Lemire Garlic, Nicole. 2017
6. Discovering Health Topics in Social Media Using Topic Models [O]. Michael J. Paul, Mark Dredze
7. Differential topic models [O]. Chen, Changyou; Buntine, Wray; Ding, Nan. 2015
