Venue: Conference on Empirical Methods in Natural Language Processing

GraphBTM: Graph Enhanced Autoencoded Variational Inference for Biterm Topic Model

Abstract

Discovering the latent topics within texts has been a fundamental task for many applications. However, conventional topic models suffer from different problems in different settings. Latent Dirichlet Allocation (LDA) may not work well for short texts due to data sparsity (i.e., the sparse word co-occurrence patterns in short documents). The Biterm Topic Model (BTM) learns topics by modeling word pairs, called biterms, over the whole corpus. This assumption is very strong when documents are long with rich topic information and do not exhibit the transitivity of biterms. In this paper, we propose a novel approach called GraphBTM that represents biterms as graphs, and we design Graph Convolutional Networks (GCNs) with residual connections to extract transitive features from biterms. To overcome the data sparsity of LDA and the strong assumption of BTM, we sample a fixed number of documents to form a mini-corpus as a training instance. We also propose a dataset called All News, extracted from (Thompson, 2017), in which documents are much longer than those in 20 Newsgroups. We present an amortized variational inference method for GraphBTM. Our method generates more coherent topics than previous approaches. Experiments show that the sampling strategy improves performance by a large margin.
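
To make the mini-corpus idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a biterm is any unordered pair of distinct words co-occurring in the same document (the abstract does not specify a context window), and it represents the biterm graph as a weighted edge list. The function names `doc_biterms`, `sample_mini_corpus`, and `biterm_graph` are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def doc_biterms(tokens):
    """Count unordered pairs of distinct words co-occurring in one document.

    Assumption: the whole document is the co-occurrence context.
    """
    counts = Counter()
    for a, b in combinations(tokens, 2):
        if a != b:
            counts[tuple(sorted((a, b)))] += 1
    return counts


def sample_mini_corpus(docs, size, rng):
    """Draw a fixed number of documents to act as one training instance."""
    return rng.sample(docs, size)


def biterm_graph(mini_corpus):
    """Merge per-document biterm counts into weighted edges (word, word) -> count."""
    graph = Counter()
    for tokens in mini_corpus:
        graph.update(doc_biterms(tokens))
    return graph


# Toy corpus, invented purely for illustration.
docs = [
    ["apple", "banana"],
    ["banana", "cherry"],
    ["apple", "cherry", "banana"],
]
rng = random.Random(0)
mini_corpus = sample_mini_corpus(docs, size=3, rng=rng)
graph = biterm_graph(mini_corpus)
for (u, v), weight in sorted(graph.items()):
    print(u, v, weight)
```

In a graph of this kind, words are nodes and biterm counts are edge weights, so a graph convolution can pass information between words that never co-occur directly but share a neighbor. This is one reading of the "transitive features" mentioned in the abstract.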