
A neural topic model with word vectors and entity vectors for short texts


Abstract

Traditional topic models are widely used for semantic discovery from long texts. However, they usually fail to mine high-quality topics from short texts (e.g. tweets) due to the sparsity of features and the lack of word co-occurrence patterns. In this paper, we propose a Variational Auto-Encoder Topic Model (VAETM for short) that combines word vector representations and entity vector representations to address the above limitations. Specifically, we first learn embedding representations of each word and each entity by employing a large-scale external corpus and a large, manually edited knowledge graph, respectively. We then integrate these embedding representations into the variational auto-encoder framework and propose an unsupervised model named VAETM to infer the latent representation of topic distributions. To further boost VAETM, we propose an improved supervised variant (SVAETM for short) that uses label information in the training set to supervise both the inference of the latent representation of topic distributions and the generation of topics. Last, we propose KL-divergence-based inference algorithms to infer approximate posterior distributions for our two models. Extensive experiments on three common short-text datasets demonstrate that our proposed VAETM and SVAETM outperform various state-of-the-art models in terms of perplexity, NPMI, and accuracy.
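The pipeline the abstract describes can be pictured as a single VAE forward pass: concatenate pretrained word and entity vectors for a short text, encode them into the mean and log-variance of a latent topic distribution, sample via the reparameterization trick, decode back into word probabilities, and regularize with a KL term. The sketch below is a minimal illustration of that generic scheme only; all dimensions, the single-layer encoder/decoder, and every name in it are assumptions for illustration, not the authors' VAETM architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, EMB, TOPICS = 50, 16, 5  # toy sizes, chosen arbitrarily

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Stand-ins for embeddings learned from an external corpus (words)
# and a knowledge graph (entities), as the abstract describes.
word_vecs = rng.normal(size=(VOCAB, EMB))
entity_vecs = rng.normal(size=(VOCAB, EMB))

# Randomly initialized encoder/decoder weights; training is omitted.
W_mu = rng.normal(size=(2 * EMB, TOPICS)) * 0.1
W_logvar = rng.normal(size=(2 * EMB, TOPICS)) * 0.1
W_dec = rng.normal(size=(TOPICS, VOCAB)) * 0.1

def encode(doc_word_ids):
    """Average concatenated word+entity vectors, map to mu and log-variance."""
    feats = np.concatenate(
        [word_vecs[doc_word_ids].mean(axis=0),
         entity_vecs[doc_word_ids].mean(axis=0)])
    return feats @ W_mu, feats @ W_logvar

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps (the VAE reparameterization trick)."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    """Latent sample -> topic mixture theta -> word distribution."""
    theta = softmax(z)
    return theta, softmax(theta @ W_dec)

def kl_to_standard_normal(mu, logvar):
    """KL(q(z|x) || N(0, I)): the regularizer in the VAE objective."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)

doc = [3, 17, 42]  # a toy short text as word/entity ids
mu, logvar = encode(doc)
theta, p_words = decode(reparameterize(mu, logvar))
print(theta.shape, round(p_words.sum(), 6), kl_to_standard_normal(mu, logvar) >= 0)
```

The KL term keeps the approximate posterior close to the prior; in the paper's setting the inference algorithms minimize this divergence to fit the posterior over topic distributions, and SVAETM additionally supervises the latent representation with training labels.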

Bibliographic details

  • Source
    Information Processing & Management | 2021, Issue 2 | pp. 102455.1-102455.11 | 11 pages
  • Author affiliations

    School of Computer Science, Beihang University, Beijing 100191, China;

    School of Computer Science, Beihang University, Beijing 100191, China;

    School of Computer Science, Beihang University, Beijing 100191, China;

    National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China;

    School of Computer Science, Beihang University, Beijing 100191, China;

    Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China; Xiamen Data Intelligence Academy of ICT, CAS, China;

  • Indexing information
  • Original format: PDF
  • Language: eng
  • CLC classification
  • Keywords

    Topic model; Short text; Variational auto-encoder; Word embedding; Entity embedding;

  • Date added: 2022-08-19 01:56:07
