LDA-based online topic detection using tensor factorization


Abstract

In the information retrieval field, effective and efficient extraction of topics from large-scale online text streams is challenging because it is a fully unsupervised learning task without prior knowledge. Most previous studies have focused on how to analyse a text corpus to extract topics and have rarely considered the time dimension. In the present study, we approach topic detection as a temporal optimization problem. We propose a novel approach to incremental topic detection, called online topic detection using tensor factorization (OTD-TF), which is based on latent Dirichlet allocation (LDA). First, topics are obtained from the corpus in the current time slice using LDA. Second, a topic tensor with a time dimension is constructed to identify the correlations between pairs of topics. Then, approximate topics are merged using tensor factorization (TF). Finally, documents are reallocated to the corresponding topic bins. By executing these steps continuously and incrementally, temporal topic detection can be achieved. In theoretical analyses and simulation experiments, OTD-TF outperformed other systems in terms of space and time complexity and achieved a high precision ratio. Our experimental evaluations also revealed interesting temporal patterns in topic emergence, development, extinction, burst and transience.
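The incremental loop described in the abstract (per-slice topics, merging of near-duplicate topics, bookkeeping across slices) can be illustrated with a minimal sketch. This is not the paper's method: the abstract does not specify the tensor factorization, so the merge step below substitutes a plain cosine-similarity threshold, and the LDA step is assumed to have already produced one topic-word distribution per topic. The names `cosine`, `merge_slice` and the `threshold` value are hypothetical and chosen only for illustration.

```python
import math


def cosine(p, q):
    """Cosine similarity between two topic-word distributions."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def merge_slice(topics, counts, new_topics, threshold=0.8):
    """Fold the topics of one time slice into the running topic list.

    topics     : running list of topic-word distributions (mutated in place)
    counts     : number of slice-level topics merged into each running topic
    new_topics : topic-word distributions for the current slice
                 (assumed to come from an LDA run that is not shown here)
    threshold  : hypothetical similarity cut-off; a stand-in for the
                 tensor-factorization merge criterion, which the abstract
                 does not describe

    Returns, for each new topic, the index of the running topic it was
    assigned to.
    """
    n_existing = len(topics)  # compare only against topics from earlier slices
    mapping = []
    for dist in new_topics:
        sims = [cosine(dist, topics[i]) for i in range(n_existing)]
        best = max(range(n_existing), key=sims.__getitem__, default=None)
        if best is not None and sims[best] >= threshold:
            # Merge: running average of the matched distributions.
            n = counts[best]
            topics[best] = [(n * a + b) / (n + 1)
                            for a, b in zip(topics[best], dist)]
            counts[best] = n + 1
            mapping.append(best)
        else:
            # No sufficiently similar topic yet: treat it as newly emerged.
            topics.append(list(dist))
            counts.append(1)
            mapping.append(len(topics) - 1)
    return mapping


# Toy data over a three-word vocabulary, two time slices.
slice1 = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8]]
slice2 = [[0.6, 0.3, 0.1], [0.1, 0.8, 0.1]]

topics, counts = [], []
m1 = merge_slice(topics, counts, slice1)
m2 = merge_slice(topics, counts, slice2)
print(m1, m2, counts)
```

In this toy run the first topic of the second slice is folded into an existing topic, while the second is kept as a new one. A topic whose count stops growing over later slices would correspond loosely to the "extinction" pattern mentioned in the abstract. Reallocating documents to the merged topic bins would reuse the returned mapping.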

Bibliographic details

  • Source
    Journal of Information Science | 2013, Issue 4 | pp. 459-469 | 11 pages
  • Author affiliations

    Department of Computer Science and Technology, Tongji University, No. 4800, Caoan Rd, Shanghai 201804, China; Department of Computer Science and Technology and The Key Laboratory of Embedded System and Services Computing, Ministry of Education, Tongji University, China;

    Department of Computer Science and Technology and The Key Laboratory of Embedded System and Services Computing, Ministry of Education, Tongji University, China;

    Key Laboratory of Computational Intelligence and Chinese Information Processing of Ministry of Education, School of Computer and Information Technology, Shanxi University, Taiyuan, China;

    Department of Computer Science and Technology and The Key Laboratory of Embedded System and Services Computing, Ministry of Education, Tongji University, China;

    Department of Computer Science and Technology and The Key Laboratory of Embedded System and Services Computing, Ministry of Education, Tongji University, China;

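  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification (CLC)
  • Keywords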

    LDA; tensor factorization; topic detection; topic tensor;
