Conference: International Joint Conference on Neural Networks

Adaptive Window Strategy for Topic Modeling in Document Streams


Abstract

Extracting global themes from written text has recently become a major issue in computational intelligence, particularly in the Natural Language Processing community. Among the proposed solutions, Latent Dirichlet Allocation (LDA) has attracted wide interest, and several variants have been proposed to adapt it to changing environments. With the emergence of data streams, for instance from social media, the field faces a new challenge: extracting topics in real time. In this paper, we propose a simple approach called Adaptive Window based Incremental LDA (AWILDA), which combines LDA with state-of-the-art methods from data stream mining. We train new topic models only when a drift is detected, and we select the training data on the fly using the ADWIN algorithm. We provide both theoretical guarantees for our method and an experimental validation on artificial and real-world data.
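The loop described in the abstract (keep an adaptive window, retrain only when a drift is flagged, and train on the documents still inside the window) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' AWILDA implementation. `SimpleAdwin` is a naive ADWIN-style detector that stores the raw window and tests every split with the cut threshold from the original ADWIN paper (the real algorithm compresses the window into exponential histograms, and its bound assumes values in [0, 1]). `adaptive_window_stream`, `score_doc`, and `train_model` are hypothetical names: the two callbacks stand in for a per-document signal in [0, 1] and an LDA trainer, and the signal the paper actually monitors may differ.

```python
import math
from collections import deque


class SimpleAdwin:
    """Naive ADWIN-style change detector over values assumed to lie in [0, 1].

    Keeps every value and checks all splits on each update (O(W) per check);
    the original ADWIN uses exponential histograms to avoid this cost.
    """

    def __init__(self, delta=0.002):
        self.delta = delta      # confidence parameter of the cut test
        self.window = deque()   # oldest value on the left

    def _should_cut(self):
        values = list(self.window)
        n = len(values)
        total = sum(values)
        older_sum = 0.0
        # Split into an older part W0 = values[:i] and a newer part W1 = values[i:].
        for i in range(1, n):
            older_sum += values[i - 1]
            n0, n1 = i, n - i
            mean0 = older_sum / n0
            mean1 = (total - older_sum) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)  # harmonic mean of the two sizes
            eps_cut = math.sqrt(math.log(4.0 * n / self.delta) / (2.0 * m))
            if abs(mean0 - mean1) >= eps_cut:
                return True
        return False

    def update(self, value):
        """Append a value; return True if old values were dropped (drift)."""
        self.window.append(value)
        drift = False
        # Drop the oldest values until no split shows a significant difference.
        while len(self.window) > 1 and self._should_cut():
            self.window.popleft()
            drift = True
        return drift


def adaptive_window_stream(docs, score_doc, train_model, delta=0.002):
    """Hypothetical AWILDA-like loop: retrain only when the detector fires.

    score_doc(model, doc) -> float in [0, 1]; train_model(list_of_docs) -> model.
    Returns (retrain_events, model); each event is (index, training_size).
    """
    detector = SimpleAdwin(delta)
    buffer = deque()            # documents aligned with detector.window
    model = None
    retrain_events = []
    for t, doc in enumerate(docs):
        if model is None:
            model = train_model([doc])  # bootstrap on the first document
        buffer.append(doc)
        drift = detector.update(score_doc(model, doc))
        # Keep only the documents whose scores are still in the adaptive window.
        while len(buffer) > len(detector.window):
            buffer.popleft()
        if drift:
            model = train_model(list(buffer))
            retrain_events.append((t, len(buffer)))
    return retrain_events, model


if __name__ == "__main__":
    # Toy stream: the "documents" are already scores, and the "model" is just
    # the size of its training set, so only the windowing logic is exercised.
    stream = [0.0] * 200 + [1.0] * 200
    events, model = adaptive_window_stream(
        stream, score_doc=lambda model, doc: doc, train_model=lambda docs: len(docs)
    )
    print(events[:3], model)
```

On this toy stream no retraining happens during the stationary prefix, and the first retrain should occur shortly after index 200, using only the documents that remain in the shrunken window. Tying the training set to the detector's window is what lets the window length, rather than a fixed batch size, decide how much recent data the new topic model sees.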
