Pacific-Asia Conference on Knowledge Discovery and Data Mining

Learning Focused Hierarchical Topic Models with Semi-Supervision in Microblogs


Abstract

Topic modeling approaches, such as Latent Dirichlet Allocation (LDA) and Hierarchical LDA (hLDA), have been used extensively to discover topics in various corpora. Unfortunately, these approaches do not perform well when applied to collections of social media posts. Further, they do not allow users to focus topic discovery around subjectively interesting concepts. We propose the new Semi-Supervised Microblog-hLDA (SS-Micro-hLDA) model, which discovers topic hierarchies in short, noisy microblog documents while allowing users to focus topic discovery around areas of interest. We test SS-Micro-hLDA on a large, public collection of Twitter messages and on data from the Reddit social blogging site, and show that our model outperforms hLDA, Constrained-hLDA, Recursive-rCRP and TSSB in terms of Pointwise Mutual Information (PMI) score. Further, we evaluate our model in terms of the information entropy of held-out data and show that the new approach produces highly focused topic hierarchies.
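The abstract compares models by PMI score but does not spell out the exact formulation. A common variant of topic coherence averages, over every pair of a topic's top words, log p(w_i, w_j) / (p(w_i) p(w_j)), with probabilities estimated from document-level co-occurrence in a reference corpus. The sketch below assumes that variant; it is not necessarily the authors' implementation, and the function name `topic_pmi` and the `eps` smoothing constant are hypothetical choices for illustration.

```python
import math
from itertools import combinations


def topic_pmi(top_words, documents, eps=1e-12):
    """Average pairwise PMI of a topic's top words (assumed variant).

    Probabilities are document frequencies in `documents`, a list of
    token lists. `eps` is a hypothetical smoothing constant that keeps
    log() finite when a word pair never co-occurs.
    """
    if len(top_words) < 2:
        raise ValueError("need at least two top words to form pairs")
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents that contain every given word.
        return sum(1 for d in doc_sets if all(w in d for w in words)) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        joint = prob(w1, w2)
        independent = prob(w1) * prob(w2)
        # Positive when the pair co-occurs more often than chance.
        scores.append(math.log((joint + eps) / (independent + eps)))
    return sum(scores) / len(scores)


docs = [["apple", "banana"], ["apple", "banana"],
        ["apple", "cherry"], ["cherry", "date"]]
print(topic_pmi(["apple", "banana"], docs))  # log(0.5 / 0.375) = log(4/3)
```

Under this reading, a higher average PMI means a topic's top words tend to appear together in real posts, which is why it is used as a proxy for topic interpretability.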
