International Conference on Computational Linguistics

Integrating Topic Modeling with Word Embeddings by Mixtures of vMFs



Abstract

Gaussian LDA integrates topic modeling with word embeddings by replacing the discrete topic distributions over word types with multivariate Gaussian distributions on the embedding space. This allows the semantic information of words to be taken into account. However, the Euclidean similarity underlying Gaussian topics is not an optimal semantic measure for word embeddings; it is widely acknowledged that cosine similarity better describes the semantic relatedness between word embeddings. To employ the cosine measure and capture complex topic structure, we use von Mises-Fisher (vMF) mixture models to represent topics and develop a novel mix-vMF topic model (MvTM). Using publicly available pre-trained word embeddings, we evaluate MvTM on three real-world data sets. Experimental results show that our model discovers more coherent topics than state-of-the-art baseline models and achieves competitive classification performance.
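To make the contrast between Gaussian and vMF topics concrete, the von Mises-Fisher density scores a unit-normalized word embedding by its cosine similarity to a topic's mean direction, f(x; mu, kappa) = C_p(kappa) exp(kappa mu^T x), whereas a Gaussian topic scores it by Euclidean distance to the mean. The snippet below is only an illustrative sketch of a mixture-of-vMFs topic density, not the authors' implementation of MvTM; the function names, the toy parameters `mus`, `kappas`, and `weights`, and the embedding dimension are all assumptions made for the example.

```python
import numpy as np
from scipy.special import ive  # exponentially scaled modified Bessel function I_v


def vmf_log_density(x, mu, kappa):
    """Log-density of a von Mises-Fisher distribution on the unit sphere.

    x, mu : unit-norm vectors of dimension p; kappa : concentration (> 0).
    The density C_p(kappa) * exp(kappa * mu^T x) grows with the cosine
    similarity mu^T x rather than with Euclidean proximity to the mean.
    """
    p = x.shape[-1]
    # log C_p(kappa) = (p/2 - 1) log kappa - (p/2) log(2*pi) - log I_{p/2-1}(kappa);
    # ive(v, k) = I_v(k) * exp(-k), so log I_v(k) = log ive(v, k) + k.
    log_c = ((p / 2 - 1) * np.log(kappa)
             - (p / 2) * np.log(2 * np.pi)
             - (np.log(ive(p / 2 - 1, kappa)) + kappa))
    return log_c + kappa * np.dot(mu, x)


def mix_vmf_log_density(x, mus, kappas, weights):
    """Log-density of a mixture of vMFs, i.e. one topic in the sketched model."""
    comps = [np.log(w) + vmf_log_density(x, m, k)
             for w, m, k in zip(weights, mus, kappas)]
    return np.logaddexp.reduce(comps)


# Toy usage: one topic modeled as a 2-component vMF mixture over 50-d embeddings.
rng = np.random.default_rng(0)
mus = [v / np.linalg.norm(v) for v in rng.normal(size=(2, 50))]
kappas = [20.0, 35.0]
weights = [0.6, 0.4]
word_vec = rng.normal(size=50)
word_vec /= np.linalg.norm(word_vec)          # word embeddings live on the unit sphere
print(mix_vmf_log_density(word_vec, mus, kappas, weights))
```

Representing a topic as a mixture of vMF components, rather than a single component, is what lets the model capture more complex topic structure on the sphere of normalized embeddings.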
