Journal: 《计算机工程》 (Computer Engineering)

Social Media Topic Recognition Based on Word Embedding and Probabilistic Topic Model

         

Abstract

Word embedding techniques capture the semantic information of words from large corpora, and combining them with a probabilistic topic model can address the lack of semantic information in standard topic models. To this end, this paper proposes a Word-Topic Mixture (WTM) model that improves the word embeddings and the topic model simultaneously. First, an external corpus is introduced into the Topic Word Embedding (TWE) model to obtain initial topic and word representations. Then, by defining the conditional probability distribution of topic vectors and word embeddings, the word embedding feature representation and the topic vectors are integrated into the topic model, while the KL divergence between the new word-topic distribution and the original word-topic distribution is minimized. Experimental results show that, compared with Word2vec, TWE, Latent Dirichlet Allocation (LDA), and LFLDA, the WTM model performs better on both word representation and topic detection.
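The abstract does not give the exact form of the conditional distribution or the direction of the KL term, so the following is only a minimal sketch under stated assumptions, not the paper's implementation. It assumes the embedding-based word-topic distribution is a softmax over dot products between topic vectors and word embeddings, and it measures KL(new || original) for each topic. All names (`embedding_word_topic`, `kl_divergence`, `phi_original`, `phi_new`) and the random toy data are hypothetical.

```python
import numpy as np

def softmax(scores):
    # Numerically stable softmax over the last axis.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def embedding_word_topic(topic_vecs, word_vecs):
    # ASSUMPTION: p(word | topic) is a softmax of topic-word dot products.
    # topic_vecs: (K, D), word_vecs: (V, D) -> result: (K, V)
    return softmax(topic_vecs @ word_vecs.T)

def kl_divergence(p, q, eps=1e-12):
    # Row-wise KL(p || q); clipping avoids log(0).
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return np.sum(p * (np.log(p) - np.log(q)), axis=-1)

# Toy data: K topics, V vocabulary words, D embedding dimensions.
rng = np.random.default_rng(0)
K, V, D = 3, 8, 5
topic_vecs = rng.normal(size=(K, D))
word_vecs = rng.normal(size=(V, D))

# Stand-in for the original word-topic distribution of a topic model
# (for example, LDA's topic-word matrix); drawn at random here.
phi_original = rng.dirichlet(np.ones(V), size=K)

# Embedding-based ("new") word-topic distribution.
phi_new = embedding_word_topic(topic_vecs, word_vecs)

# One KL value per topic; a training procedure would minimize these.
kl_per_topic = kl_divergence(phi_new, phi_original)
print(kl_per_topic)
```

In a training loop, the vectors would be updated to reduce `kl_per_topic` alongside the model's other objectives; how the paper weights or combines these terms is not stated in the abstract.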
