Journal: Information Processing & Management

Sub-story detection in Twitter with hierarchical Dirichlet processes

Abstract

Social media has become the de facto information source on real-world events. Because social media streams are high in volume and velocity, however, the challenge is how to follow all posts pertaining to a given event over time, a task referred to as story detection. Moreover, a given event often comprises several different stories, which we refer to as sub-stories; we refer to the task of detecting them automatically as sub-story detection. This paper proposes hierarchical Dirichlet processes (HDP), a probabilistic topic model, as an effective method for automatic sub-story detection. HDP can learn the sub-topics associated with sub-stories, which enables it to handle subtle variations in sub-stories. It is compared with state-of-the-art story detection approaches based on locality-sensitive hashing and spectral clustering. We demonstrate the superior performance of HDP for sub-story detection on real-world Twitter data sets using various evaluation measures. The ability of HDP to learn sub-topics helps it to recall the sub-stories with high precision. This results in an improvement of up to 60% in the F-score of the HDP-based sub-story detection approach compared to standard story detection approaches. A similar performance improvement is also seen using an information-theoretic evaluation measure proposed for the sub-story detection task. Another contribution of this paper is to demonstrate that considering the conversational structures within the Twitter stream can bring up to 200% improvement in sub-story detection performance.
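The abstract reports its results as F-scores and percentage improvements. As a reading aid, the following minimal sketch shows how an F-score is commonly computed for a single sub-story and how a relative improvement over a baseline could be expressed. The abstract does not describe the paper's evaluation protocol, so the function names, the tweet identifiers, and the assumption that the reported improvement is relative to the baseline are all illustrative.

```python
def precision_recall_f1(detected, relevant):
    """Precision, recall and F1 of a detected tweet set against a gold set."""
    detected, relevant = set(detected), set(relevant)
    true_positives = len(detected & relevant)
    precision = true_positives / len(detected) if detected else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def relative_improvement(new_score, baseline_score):
    """Percentage gain of new_score over baseline_score (assumed relative)."""
    return (new_score - baseline_score) / baseline_score * 100.0


# Hypothetical tweet ids, invented for illustration only (not paper data).
gold_sub_story = {"t1", "t2", "t3", "t4"}
detected_sub_story = {"t1", "t2", "t5"}

precision, recall, f1 = precision_recall_f1(detected_sub_story, gold_sub_story)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Under this reading, a baseline F-score of 0.5 and a new F-score of 0.8 would correspond to `relative_improvement(0.8, 0.5)`, that is, a gain of about 60%; these two numbers are made up to illustrate the arithmetic and are not taken from the paper.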