Conference: IEEE International Conference on Data Science in Cyberspace

Research on Improve Topic Representation over Short Text


Abstract

Short text is sparse, poorly focused, and lacking in semantic information. Existing studies improve topic representation for short text mainly in two ways: adjusting the structure of the topic model to increase word co-occurrence, and incorporating word embeddings to enrich semantic information. In this paper, we review existing topic representation methods for short text from these two perspectives, select DMM, BTM, and LF-DMM as targets, and compare the quality of the topic representations they produce. Because traditional topic models do not incorporate word embeddings, and a single short text often lacks context information due to its limited length, we also combine these models with word embeddings. We evaluate the quality of the topic representations on a classification task using two real-world datasets. Although the LF-DMM model incorporates word embeddings, it performs poorly on short text, whereas the performance of DMM and BTM improves greatly when they are integrated with word embeddings.
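The two directions named in the abstract can be illustrated with a minimal sketch. The first function extracts biterms (unordered word pairs within one short text), which is the co-occurrence unit BTM models. The other two functions show one possible way to pair a topic representation with word embeddings: concatenating a document's topic distribution with the mean of its word vectors. The abstract does not state how the authors combine the two, so the concatenation step, the function names, and the toy inputs are all assumptions made for illustration.

```python
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) in one short text.

    Each pair is sorted so that (a, b) and (b, a) map to the same biterm.
    Repeated words in `tokens` yield pairs of the same word; filter them
    beforehand if that is not wanted.
    """
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def mean_embedding(tokens, embeddings, dim):
    """Average the vectors of the tokens present in `embeddings`.

    `embeddings` is a hypothetical dict mapping a word to a list of
    `dim` floats. Out-of-vocabulary tokens are skipped; if no token is
    covered, a zero vector is returned.
    """
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def combined_features(topic_dist, tokens, embeddings, dim):
    """Concatenate a topic distribution with the mean word embedding.

    Assumption: simple concatenation stands in for whatever combination
    the paper actually uses. `topic_dist` would come from a trained
    topic model such as DMM or BTM.
    """
    return list(topic_dist) + mean_embedding(tokens, embeddings, dim)
```

The resulting feature vector could then be passed to any standard classifier, which matches the classification-based evaluation described in the abstract only in spirit.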
