GraphBTM: Graph Enhanced Autoencoded Variational Inference for Biterm Topic Model

Abstract

Discovering the latent topics within texts has been a fundamental task for many applications. However, conventional topic models suffer from different problems in different settings. Latent Dirichlet Allocation (LDA) may not work well for short texts due to data sparsity (i.e., the sparse word co-occurrence patterns in short documents). The Biterm Topic Model (BTM) learns topics by modeling word pairs, named biterms, over the whole corpus. This assumption is very strong when documents are long with rich topic information and do not exhibit the transitivity of biterms. In this paper, we propose a novel approach called GraphBTM that represents biterms as graphs, and we design Graph Convolutional Networks (GCNs) with residual connections to extract transitive features from biterms. To overcome the data sparsity of LDA and the strong assumption of BTM, we sample a fixed number of documents to form a mini-corpus as a training instance. We also propose a dataset called All News, extracted from (Thompson, 2017), in which documents are much longer than those in 20 Newsgroups. We present an amortized variational inference method for GraphBTM. Our method generates more coherent topics compared with previous approaches. Experiments show that the sampling strategy improves performance by a large margin.
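
The mini-corpus sampling and biterm-graph construction described in the abstract can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the function names, the optional `window` parameter, the choice to skip pairs of identical words, and the toy corpus are all assumptions made for the example. It samples a fixed number of documents, extracts unordered word pairs from each, and aggregates the pair counts into weighted graph edges.

```python
import random
from collections import Counter
from itertools import combinations


def extract_biterms(tokens, window=None):
    """Return unordered word pairs (biterms) that co-occur in one document.

    `window` is a hypothetical parameter: when set, only pairs whose
    positions differ by less than `window` are kept.
    """
    pairs = []
    for i, j in combinations(range(len(tokens)), 2):
        if window is not None and j - i >= window:
            continue
        # Assumption: a word paired with itself is not counted as a biterm.
        if tokens[i] != tokens[j]:
            pairs.append(tuple(sorted((tokens[i], tokens[j]))))
    return pairs


def sample_mini_corpus(corpus, size, rng):
    """Sample a fixed number of documents to form one training instance."""
    return rng.sample(corpus, size)


def biterm_graph(mini_corpus, window=None):
    """Aggregate biterm counts over a mini-corpus into weighted edges."""
    edges = Counter()
    for doc in mini_corpus:
        edges.update(extract_biterms(doc, window))
    return edges


# Toy corpus, invented for illustration only.
corpus = [
    ["graph", "topic", "model"],
    ["topic", "model", "inference"],
    ["graph", "network"],
]
rng = random.Random(0)
mini_corpus = sample_mini_corpus(corpus, 2, rng)
graph = biterm_graph(mini_corpus)
```

Each key of `graph` is an edge between two words and each value is the edge weight, so the result can be turned into an adjacency matrix for a graph convolutional encoder. How the paper normalizes or windows the biterms is not stated in the abstract, so those details are left open here.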

Bibliographic details

Source
Conference on Empirical Methods in Natural Language Processing | 2018 | pp. 4663-4672 | 10 pages
Authors
Qile Zhu; Zheng Feng; Xiaolin Li
Original format
PDF
