Nested variational autoencoder for topic modelling on microtexts with word vectors

Abstract

Most of the information on the Internet is represented in the form of microtexts, which are short text snippets such as news headlines or tweets. These sources of information are abundant, and mining these data could uncover meaningful insights. Topic modelling is one of the popular methods for extracting knowledge from a collection of documents; however, conventional topic models such as latent Dirichlet allocation (LDA) perform poorly on short documents, mostly because of the scarcity of word co-occurrence statistics in the data. The objective of our research is to create a topic model that achieves strong performance on microtexts while keeping runtime small enough to scale to large datasets. To compensate for the limited information in microtexts, our method takes advantage of word embeddings as additional knowledge of the relationships between words. For speed and scalability, we apply autoencoding variational Bayes, an algorithm that performs efficient black-box inference in probabilistic models. The result of our work is a novel topic model called the nested variational autoencoder: a distribution that takes word vectors into account and is parameterized by a neural network. For optimization, the model is trained to approximate the posterior distribution of the original LDA model. Experiments show the improvements of our model on microtexts as well as its runtime advantage.
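The core mechanism the abstract describes, an encoder that maps a document's bag-of-words to the parameters of a latent topic distribution, trained with the reparameterization trick of autoencoding variational Bayes, and a decoder whose topic-word distributions are derived from word vectors, can be sketched as below. This is a minimal illustrative sketch in NumPy, not the authors' implementation: the function names, toy dimensions, and random stand-in word vectors are all assumptions, and the paper's specific nested architecture and LDA-posterior training objective are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

V, K, D_EMB = 50, 5, 16  # vocab size, number of topics, word-vector dim (toy sizes)

# Pretrained word vectors would normally come from word2vec/GloVe;
# random stand-ins are used here so the sketch is self-contained.
word_vecs = rng.normal(size=(V, D_EMB))

# Encoder weights: map a bag-of-words vector to the mean and log-variance
# of a logistic-normal latent variable (a common relaxation of the Dirichlet).
W_mu = rng.normal(scale=0.1, size=(V, K))
W_logvar = rng.normal(scale=0.1, size=(V, K))

# Decoder: topic embeddings; topic-word logits are dot products with the
# word vectors, so semantically related words receive correlated mass.
topic_emb = rng.normal(scale=0.1, size=(K, D_EMB))


def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def encode(bow):
    """Variational parameters q(z | document)."""
    return bow @ W_mu, bow @ W_logvar


def reparameterize(mu, logvar):
    """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps


def decode(z):
    """Per-word probabilities p(w | z) via topic proportions and word vectors."""
    theta = softmax(z)                       # document-topic proportions
    beta = softmax(topic_emb @ word_vecs.T)  # topic-word distributions
    return theta @ beta


def elbo(bow):
    """Single-sample ELBO estimate: reconstruction minus KL to N(0, I)."""
    mu, logvar = encode(bow)
    z = reparameterize(mu, logvar)
    rec = np.sum(bow * np.log(decode(z) + 1e-10))
    kl = -0.5 * np.sum(1.0 + logvar - mu**2 - np.exp(logvar))
    return rec - kl


bow = rng.integers(0, 3, size=V).astype(float)  # toy microtext bag-of-words
score = elbo(bow)  # in training, gradients of this estimate update all weights
```

In a real implementation the weights would be optimized by stochastic gradient ascent on the ELBO (e.g. in PyTorch or TensorFlow); the sketch only shows the forward pass that makes black-box inference possible.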

Bibliographic details

  • Source
    Expert Systems | 2021, Issue 2 | e12639.1-e12639.27 | 27 pages
  • Authors

    Trinh Trung; Quan Tho; Mai Trung;

  • Affiliations

    Ho Chi Minh City Univ Technol Fac Comp Sci & Engn Ho Chi Minh Vietnam;

    Ho Chi Minh City Univ Technol Fac Comp Sci & Engn Ho Chi Minh Vietnam;

    Ho Chi Minh City Univ Technol Fac Comp Sci & Engn Ho Chi Minh Vietnam;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Format: PDF
  • Language: English
  • Keywords

    microtext; neural network; topic modelling; variational autoencoder; word embedding;


