Combine Topic Modeling with Semantic Embedding: Embedding Enhanced Topic Model

Abstract

Topic models and word embeddings reflect two perspectives on text semantics. A topic model maps documents into a topic distribution space by utilizing word collocation patterns within and across documents, while a word embedding represents words in a continuous embedding space by exploiting the local word collocation patterns in context windows. Clearly, these two types of patterns are complementary. In this paper, we propose a novel integration framework to combine the two representation methods, in which topic information can be transmitted into the corresponding semantic embedding structure. Based on this framework, we construct an Embedding Enhanced Topic Model (EETM), which can improve topic modeling and generate topic embeddings by leveraging word embeddings. Extensive experimental results show that EETM can learn high-quality document representations for common text analysis tasks across multiple data sets, indicating that it is very effective for merging topic models with word embeddings.
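
The contrast the abstract draws between document-level and window-level collocation patterns can be made concrete with a small sketch. This is not the EETM model or any part of the paper's method; it only counts the two kinds of word pairs on a toy corpus. The function names, the toy documents, and the window size are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def document_cooccurrence(docs):
    """Count word pairs that occur anywhere in the same document.

    This is the kind of document-level signal a topic model draws on.
    """
    counts = Counter()
    for doc in docs:
        # Each unordered pair is counted once per document.
        for a, b in combinations(sorted(set(doc)), 2):
            counts[(a, b)] += 1
    return counts


def window_cooccurrence(docs, window=1):
    """Count word pairs that occur within `window` positions of each other.

    This is the kind of local signal a word embedding draws on.
    """
    counts = Counter()
    for doc in docs:
        for i, word in enumerate(doc):
            for j in range(i + 1, min(i + window + 1, len(doc))):
                if word != doc[j]:
                    counts[tuple(sorted((word, doc[j])))] += 1
    return counts


# Hypothetical toy corpus: two tokenized documents.
docs = [
    ["topic", "model", "maps", "documents", "to", "topics"],
    ["word", "embedding", "maps", "words", "to", "vectors"],
]

doc_pairs = document_cooccurrence(docs)
win_pairs = window_cooccurrence(docs, window=1)

# "maps" and "to" share both documents but are never adjacent,
# so only the document-level count sees the pair.
print(doc_pairs[("maps", "to")], win_pairs[("maps", "to")])
```

In this toy corpus some pairs are visible only at the document level, which is one simple way to see why the two signals may carry complementary information.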

Bibliographic record

  • Source
    IEEE Transactions on Knowledge and Data Engineering | 2020, Issue 12 | pp. 2322-2335 | 14 pages
  • Author affiliations

    Shanxi Univ Finance & Econ Sch Informat Taiyuan 030006 Peoples R China;

    Shanxi Univ Sch Comp & Informat Technol Taiyuan 030006 Peoples R China|Shanxi Univ Minist Educ Key Lab Computat Intelligence & Chinese Informat Taiyuan 030006 Peoples R China;

    Shanxi Univ Sch Comp & Informat Technol Taiyuan 030006 Peoples R China|Shanxi Univ Minist Educ Key Lab Computat Intelligence & Chinese Informat Taiyuan 030006 Peoples R China;

    ASTAR Inst Infocomm Res Singapore 138632 Singapore;

    Shanghai Univ Sch Comp Engn & Sci Shanghai 200444 Peoples R China;

  • Indexing information
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords

    Topic model; word embedding; topical embedding; representation learning;
