IEEE International Conference on Data Mining Workshops

OLLDA: A Supervised and Dynamic Topic Mining Framework in Twitter


Abstract

Analyzing media in real time is of great importance, with social media platforms at the epicenter of crunching, digesting, and disseminating content to the individuals connected to them. Within this context, topic models, especially LDA, have gained strong momentum due to their scalability, inference power, and compact semantics. However, state-of-the-art topic models fall short in handling large chunks of streaming data arriving dynamically on the platform, which hinders their quality of interpretation as well as their adaptability to information overload. In this manuscript we therefore propose a labeled and online extension to LDA (OLLDA), which incorporates supervision through external labeling and the capability of quickly digesting real-time updates, making it more adaptive to Twitter and similar platforms. The proposed extension can handle large quantities of newly arrived documents in a stream while achieving high topic inference quality given the short and often sloppy text of tweets. Our approach mainly uses an approximate inference technique based on variational inference coupled with a labeled LDA model. We conclude by presenting experiments using a one-year crawl of Twitter data that show significantly improved topical inference as well as temporal user profile classification when compared to state-of-the-art baselines.
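
The abstract does not give the update equations, so the following is only a rough illustration of how online variational inference might be combined with label supervision. It assumes a standard stochastic variational Bayes update for LDA over mini-batches, with each document's topics restricted to its label set in the manner of Labeled LDA. The class name `OnlineLabeledLDA`, the method `partial_fit`, and all hyperparameter names and values are hypothetical and are not taken from the paper.

```python
import numpy as np

def digamma(x):
    """Digamma for positive arrays: recurrence shift plus asymptotic series."""
    x = np.asarray(x, dtype=float)
    result = np.zeros_like(x)
    for _ in range(6):  # psi(x) = psi(x + 1) - 1/x until x >= 6
        small = x < 6.0
        result = result - np.where(small, 1.0 / x, 0.0)
        x = x + small
    inv2 = 1.0 / (x * x)
    return result + np.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def dirichlet_expectation(param):
    """E[log p] for p ~ Dirichlet(param), taken along the last axis."""
    return digamma(param) - digamma(param.sum(axis=-1, keepdims=True))

class OnlineLabeledLDA:
    """Hypothetical sketch: online variational LDA with per-document label sets."""

    def __init__(self, n_topics, vocab_size, total_docs,
                 alpha=0.1, eta=0.01, tau0=1.0, kappa=0.7, seed=0):
        rng = np.random.default_rng(seed)
        self.alpha, self.eta = alpha, eta
        self.tau0, self.kappa = tau0, kappa
        self.total_docs = total_docs  # assumed (estimated) stream size
        self.lam = rng.gamma(100.0, 0.01, size=(n_topics, vocab_size))
        self.updates = 0

    def partial_fit(self, docs, labels, n_iter=50):
        """docs: list of (word_ids, counts); labels: allowed topic ids per doc."""
        exp_elog_beta = np.exp(dirichlet_expectation(self.lam))
        sstats = np.zeros_like(self.lam)
        gammas = []
        for (ids, cts), allowed in zip(docs, labels):
            ks = np.asarray(allowed)
            beta_d = exp_elog_beta[np.ix_(ks, ids)]  # only the labeled topics
            gamma = np.ones(len(ks))
            for _ in range(n_iter):  # local E-step for this document
                exp_elog_theta = np.exp(dirichlet_expectation(gamma))
                phinorm = exp_elog_theta @ beta_d + 1e-100
                gamma = self.alpha + exp_elog_theta * (beta_d @ (cts / phinorm))
            exp_elog_theta = np.exp(dirichlet_expectation(gamma))
            phinorm = exp_elog_theta @ beta_d + 1e-100
            # Expected word-topic counts, zero for topics outside the label set.
            sstats[np.ix_(ks, ids)] += np.outer(exp_elog_theta, cts / phinorm) * beta_d
            gammas.append(gamma)
        # Global step: blend the old topics with the mini-batch estimate.
        rho = (self.tau0 + self.updates) ** (-self.kappa)
        lam_hat = self.eta + (self.total_docs / len(docs)) * sstats
        self.lam = (1.0 - rho) * self.lam + rho * lam_hat
        self.updates += 1
        return gammas

# Toy mini-batch: topic 0 and topic 1 stand in for two external labels.
sports = (np.array([0, 1, 2]), np.array([3.0, 2.0, 1.0]))
politics = (np.array([3, 4, 5]), np.array([2.0, 2.0, 2.0]))
mixed = (np.array([0, 4]), np.array([1.0, 1.0]))
model = OnlineLabeledLDA(n_topics=2, vocab_size=6, total_docs=3)
gammas = model.partial_fit([sports, politics, mixed], labels=[[0], [1], [0, 1]])
```

In this sketch each call to `partial_fit` consumes one mini-batch of newly arrived documents, so the topic-word parameters `lam` can be refreshed as the stream grows without revisiting older tweets. The label sets only constrain which topics a document may use; how OLLDA itself maps external labels to topics is not stated in the abstract.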
