Pacific-Asia Conference on Knowledge Discovery and Data Mining

o-HETM: An Online Hierarchical Entity Topic Model for News Streams

Abstract

With the growth of the Internet, the volume of continuously streaming news has become overwhelming for the public. A dynamic topic hierarchy that organizes news articles by topics at multiple granularities lets users find what interests them quickly. Building such a hierarchy is nontrivial, however, because news data arrives as a stream and is time-sensitive. To address these challenges, we propose a Hierarchical Entity Topic Model (HETM) that accounts for the timeliness of news data and for the importance of named entities in conveying the who, when, and where of news articles. We further propose online HETM (o-HETM), which pairs HETM with a fast online inference algorithm to adapt it to streaming news. To make topics easier to understand, we extract key sentences for each topic to form a summary. Extensive experimental results demonstrate that HETM significantly improves topic quality and time efficiency compared with the state-of-the-art method HLDA (Hierarchical Latent Dirichlet Allocation). With its online inference algorithm, o-HETM improves time efficiency further, making it applicable to streaming news.
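The abstract does not say how key sentences are chosen for each topic. As a rough illustration only, the sketch below assumes a topic is available as a word-probability table and ranks sentences by the mean topic probability of their tokens. The function names (`tokenize`, `key_sentences`), the scoring rule, and the example data are all hypothetical and are not taken from the paper.

```python
import re
from typing import Dict, List


def tokenize(sentence: str) -> List[str]:
    """Lowercase and keep runs of ASCII letters (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z]+", sentence.lower())


def key_sentences(topic_word_prob: Dict[str, float],
                  sentences: List[str],
                  k: int = 2) -> List[str]:
    """Return the k sentences whose tokens have the highest mean topic probability.

    Assumption: this scoring rule is illustrative, not the method used by HETM.
    """
    def score(sentence: str) -> float:
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        # Words outside the topic's vocabulary contribute zero probability.
        return sum(topic_word_prob.get(t, 0.0) for t in tokens) / len(tokens)

    # sorted() is stable, so tied sentences keep their original order.
    return sorted(sentences, key=score, reverse=True)[:k]


# Hypothetical topic and sentences, for illustration only.
topic = {"election": 0.30, "vote": 0.25, "candidate": 0.20, "poll": 0.10}
docs = [
    "The candidate won the election after a late vote count.",
    "Heavy rain is expected over the weekend.",
    "A new poll shows the election is close.",
]
summary = key_sentences(topic, docs, k=2)
```

Averaging over tokens keeps long sentences from winning on length alone; a real system would likely also weight named entities, which the abstract singles out as important.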
