
A neural topic model with word vectors and entity vectors for short texts


Abstract

Traditional topic models are widely used for semantic discovery from long texts. However, they usually fail to mine high-quality topics from short texts (e.g. tweets) due to the sparsity of features and the lack of word co-occurrence patterns. In this paper, we propose a Variational Auto-Encoder Topic Model (VAETM for short) that combines word vector representations and entity vector representations to address the above limitations. Specifically, we first learn embedding representations of each word and each entity by employing a large-scale external corpus and a large, manually edited knowledge graph, respectively. We then integrate these embedding representations into the variational auto-encoder framework and propose an unsupervised model named VAETM to infer the latent representation of topic distributions. To further boost VAETM, we propose an improved supervised variant (SVAETM for short) that uses label information in the training set to supervise both the inference of the latent representation of topic distributions and the generation of topics. Last, we propose KL-divergence-based inference algorithms to infer approximate posterior distributions for our two models. Extensive experiments on three common short-text datasets demonstrate that our proposed VAETM and SVAETM outperform various state-of-the-art models in terms of perplexity, NPMI, and accuracy.
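The pipeline the abstract describes can be pictured as a single VAE forward pass: concatenate pretrained word and entity vectors for a short text, encode them into the mean and log-variance of a latent topic distribution, sample via the reparameterization trick, decode back into word probabilities, and regularize with a KL term. The sketch below is a minimal illustration of that generic scheme only; all dimensions, the single-layer encoder/decoder, and every name in it are assumptions for illustration, not the authors' VAETM architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, EMB, TOPICS = 50, 16, 5  # toy sizes, chosen arbitrarily

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Stand-ins for embeddings learned from an external corpus (words)
# and a knowledge graph (entities), as the abstract describes.
word_vecs = rng.normal(size=(VOCAB, EMB))
entity_vecs = rng.normal(size=(VOCAB, EMB))

# Randomly initialized encoder/decoder weights; training is omitted.
W_mu = rng.normal(size=(2 * EMB, TOPICS)) * 0.1
W_logvar = rng.normal(size=(2 * EMB, TOPICS)) * 0.1
W_dec = rng.normal(size=(TOPICS, VOCAB)) * 0.1

def encode(doc_word_ids):
    """Average concatenated word+entity vectors, map to mu and log-variance."""
    feats = np.concatenate(
        [word_vecs[doc_word_ids].mean(axis=0),
         entity_vecs[doc_word_ids].mean(axis=0)])
    return feats @ W_mu, feats @ W_logvar

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps (the VAE reparameterization trick)."""
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    """Latent sample -> topic mixture theta -> word distribution."""
    theta = softmax(z)
    return theta, softmax(theta @ W_dec)

def kl_to_standard_normal(mu, logvar):
    """KL(q(z|x) || N(0, I)): the regularizer in the VAE objective."""
    return 0.5 * np.sum(np.exp(logvar) + mu ** 2 - 1.0 - logvar)

doc = [3, 17, 42]  # a toy short text as word/entity ids
mu, logvar = encode(doc)
theta, p_words = decode(reparameterize(mu, logvar))
print(theta.shape, round(p_words.sum(), 6), kl_to_standard_normal(mu, logvar) >= 0)
```

The KL term keeps the approximate posterior close to the prior; in the paper's setting the inference algorithms minimize this divergence to fit the posterior over topic distributions, and SVAETM additionally supervises the latent representation with training labels.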

Bibliographic details

  • Source
    Information Processing & Management | 2021, Issue 2 | pp. 102455.1-102455.11 | 11 pages
  • Author affiliations

    School of Computer Science, Beihang University, Beijing 100191, China;

    School of Computer Science, Beihang University, Beijing 100191, China;

    School of Computer Science, Beihang University, Beijing 100191, China;

    National Computer Network Emergency Response Technical Team/Coordination Center of China, Beijing 100029, China;

    School of Computer Science, Beihang University, Beijing 100191, China;

    Key Lab of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing 100190, China; Xiamen Data Intelligence Academy of ICT, CAS, China;

  • Indexing information
  • Original format: PDF
  • Language: eng
  • CLC classification
  • Keywords

    Topic model; Short text; Variational auto-encoder; Word embedding; Entity embedding;

  • Date added: 2022-08-19 01:56:07
