A network approach to topic models

Martin Gerlach; Tiago P. Peixoto; Eduardo G. Altmann


Abstract

One of the main computational and scientific challenges in the modern age is to extract useful information from unstructured texts. Topic models are one popular machine-learning approach that infers the latent topical structure of a collection of documents. Despite their success—particularly of the most widely used variant called latent Dirichlet allocation (LDA)—and numerous applications in sociology, history, and linguistics, topic models are known to suffer from severe conceptual and practical problems, for example, a lack of justification for the Bayesian priors, discrepancies with statistical properties of real texts, and the inability to properly choose the number of topics. We obtain a fresh view of the problem of identifying topical structures by relating it to the problem of finding communities in complex networks. We achieve this by representing text corpora as bipartite networks of documents and words. By adapting existing community-detection methods (using a stochastic block model (SBM) with nonparametric priors), we obtain a more versatile and principled framework for topic modeling (for example, it automatically detects the number of topics and hierarchically clusters both the words and documents). The analysis of artificial and real corpora demonstrates that our SBM approach leads to better topic models than LDA in terms of statistical model selection. Our work shows how to formally relate methods from community detection and topic modeling, opening the possibility of cross-fertilization between these two fields.
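The bipartite representation described in the abstract can be illustrated with a minimal sketch. It assumes a tiny, made-up tokenized corpus and uses only the Python standard library; the function name `build_bipartite_network` and the toy data are hypothetical and do not come from the paper. The sketch only builds the document-word network (one edge per word occurrence) and does not perform the SBM inference itself, which would require a dedicated community-detection library.

```python
from collections import Counter


def build_bipartite_network(docs):
    """Return (doc_nodes, word_nodes, edges) for a tokenized corpus.

    docs: dict mapping a document id to a list of word tokens.
    edges: Counter mapping (doc_id, word) to the number of occurrences,
    i.e. the edge multiplicity in the bipartite multigraph.
    """
    edges = Counter()
    for doc_id, tokens in docs.items():
        for word in tokens:
            edges[(doc_id, word)] += 1
    doc_nodes = sorted(docs)
    word_nodes = sorted({word for _, word in edges})
    return doc_nodes, word_nodes, edges


# Toy corpus (hypothetical data, for illustration only).
corpus = {
    "d1": "network community network block".split(),
    "d2": "topic word document topic".split(),
    "d3": "network topic model".split(),
}

doc_nodes, word_nodes, edges = build_bipartite_network(corpus)

# Node degrees: a document's degree is its length in tokens,
# a word's degree is its total frequency in the corpus.
doc_degree = Counter()
word_degree = Counter()
for (doc_id, word), count in edges.items():
    doc_degree[doc_id] += count
    word_degree[word] += count

print("documents:", doc_nodes)
print("words:", word_nodes)
print("total edges (tokens):", sum(edges.values()))
```

In this representation every edge connects a document node to a word node, so a community-detection method applied to the network groups documents and words at the same time, which is the correspondence with topic modeling that the abstract points to.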


Bibliographic details

Source: Science Advances, 2018, Issue 7
Authors: Martin Gerlach; Tiago P. Peixoto; Eduardo G. Altmann
Original format: PDF

