Adaptive Window Strategy for Topic Modeling in Document Streams

Abstract

Extracting global themes from written text has recently become a major issue in computational intelligence, particularly in the Natural Language Processing community. Among the proposed solutions, Latent Dirichlet Allocation (LDA) has attracted considerable interest, and several variants have been proposed to adapt it to changing environments. With the emergence of data streams, for instance from social media, the domain faces a new challenge: topic extraction in real time. In this paper, we propose a simple approach called Adaptive Window based Incremental LDA (AWILDA), which combines LDA with state-of-the-art methods in data stream mining. We train new topic models only when a drift is detected and select training data on the fly using the ADWIN algorithm. We provide both theoretical guarantees for our method and experimental validation on artificial and real-world data.
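
The drift-triggered retraining idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a simplified, linear-time-per-update variant of the ADWIN test (Hoeffding-style cut threshold), and it assumes the monitored quantity is some per-document score scaled to [0, 1], which the abstract does not specify. The names `SimpleAdaptiveWindow`, `run_stream`, and the `retrain` callback are hypothetical.

```python
import math
from collections import deque


class SimpleAdaptiveWindow:
    """Simplified ADWIN-style window over scores in [0, 1] (illustrative only)."""

    def __init__(self, delta=0.002, min_sub_window=5):
        self.delta = delta                    # confidence parameter of the test
        self.min_sub_window = min_sub_window  # smallest sub-window that is compared
        self.window = deque()

    def _has_cut(self):
        """True if some split of the window has significantly different means."""
        values = list(self.window)
        n = len(values)
        total = sum(values)
        head_sum = 0.0
        for i in range(1, n):
            head_sum += values[i - 1]
            n0, n1 = i, n - i
            if n0 < self.min_sub_window or n1 < self.min_sub_window:
                continue
            m = 1.0 / (1.0 / n0 + 1.0 / n1)
            # Cut threshold from the Hoeffding bound, with delta' = delta / n.
            eps_cut = math.sqrt(math.log(4.0 * n / self.delta) / (2.0 * m))
            if abs(head_sum / n0 - (total - head_sum) / n1) > eps_cut:
                return True
        return False

    def update(self, value):
        """Append a score, drop the oldest scores while a cut exists; True on drift."""
        self.window.append(value)
        drift = False
        while self._has_cut():
            self.window.popleft()
            drift = True
        return drift


def run_stream(scores, retrain, delta=0.002):
    """Feed scores to the detector and call `retrain` only when drift is flagged."""
    detector = SimpleAdaptiveWindow(delta=delta)
    drift_points = []
    for t, score in enumerate(scores):
        if detector.update(score):
            drift_points.append(t)
            # Hypothetical hook: retrain the topic model on the documents
            # matching the scores still kept in the window.
            retrain(len(detector.window))
    return drift_points
```

Calling `run_stream(scores, retrain)` with a score sequence whose mean shifts abruptly should flag drift shortly after the shift, while a stationary sequence triggers no retraining. In a real pipeline the `retrain` hook would refit an LDA model on the documents retained by the window.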

Bibliographic details

Source
International Joint Conference on Neural Networks | 2018 | pp. 1-7 | 7 pages
Authors
Pierre-Alexandre Murena; Marie Al-Ghossein; Talel Abdessalem; Antoine Cornuéjols
Original format: PDF
Keywords
Adaptation models; Computational modeling; Microsoft Windows; Resource management; Data mining; Task analysis; Data models

Similar literature

1. Topic modeling for sequential documents based on hybrid inter-document topic dependency [J]. Li Wenbo, Saigo Hiroto, Tong Bin. Journal of Intelligent Information Systems. 2021, No. 3
2. Adaptive and hybrid context-aware fine-grained word sense disambiguation in topic modeling based document representation [J]. Wenbo Li, Einoshin Suzuki. Information Processing & Management. 2021, No. 4
3. Web Information Visualization Method Employing Immune Network Model for Finding Topic Stream from Document-set Sequence [J]. Yasufumi TAKAMA, Kaoru HIROTA. New Generation Computing. 2003, No. 1
4. Adaptive Window Strategy for Topic Modeling in Document Streams [C]. Pierre-Alexandre Murena, Talel Abdessalem, Marie Al-Ghossein. International Joint Conference on Neural Networks. 2018
5. Topic models and dynamic prediction models and their applications in document retrieval and healthcare [D]. Caballero Barajas, Karla L. 2015
6. Reducing False Negative Reads in RFID Data Streams Using an Adaptive Sliding-Window Approach [O]. Libe Valentine Massawe, Johnson D. M. Kinyua, Herman Vermaak. 2012
7. Online LDA: Adaptive Topic Model for Mining Text Streams with Application on Topic Detection and [O]. Loulwah Alsumait, Daniel Barbará, Carlotta Domeniconi. 2008
8. Computing Diameter in the Streaming and Sliding-Window Models (Preprint) [R]. Feigenbaum, J., Kannan, S., Zhang, J. 2002
