Dirichlet Process Mixture Models based topic identification for short text streams

Abstract

Topic detection and tracking (TDT) has been extensively studied and applied in recent years. However, prior work has mostly focused on regular news text, and the problem of scaling to short stories remains largely open. In addition, prior work performs topic identification on already separated stories, treating story segmentation as a prerequisite, even though segmentation is itself a challenging yet critical task in TDT research. In this paper, we propose a topic identification method based on the Dirichlet Process Mixture Model (DPMM) that handles topic segmentation, topic detection, and topic tracking in a unified model and achieves reasonable results on short stories. We first present DPMM and its application to the topic identification task. We then discuss two solutions designed specifically to address the sparseness problem associated with short stories. The first concerns the design of the algorithm flow: instead of a single short text, the processing unit for topic identification is first converted to a session. The second applies an extended DPMM that takes word dependency into account when estimating the word distribution associated with each known topic. After that, we extend DPMM to identify topics in spontaneous text streams by performing topic segmentation, topic detection, and topic tracking simultaneously. An attractive advantage of DPMM is that the number of mixture components need not be fixed in advance, so no prior knowledge of the number or content of topics is required. Compared with other existing methods, it is therefore better suited to streaming topic identification. Our empirical results on the TDT3 evaluation data verify that DPMM is effective for topic identification on short text data with stream properties, and that the extended DPMM outperforms the original DPMM.
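To make the property that "the number of mixture components need not be fixed in advance" concrete, here is a minimal Python sketch of sequential assignment in a Dirichlet process mixture of multinomials. It is not the authors' inference procedure and omits both the word-dependency extension and topic segmentation. It assumes a greedy single-pass rule (each incoming unit is committed to whichever existing or new topic scores highest) in place of sampling, a Dirichlet-multinomial word model with symmetric prior `beta`, a DP concentration `alpha`, and a fixed vocabulary size `vocab_size`; the class name `OnlineDPMM`, its methods, the parameter values, and the toy stream are all hypothetical. A unit scores log(m_k) plus its predictive log-likelihood under an existing topic k that already holds m_k units, or log(alpha) plus its log-likelihood under an empty topic; a new topic is opened whenever the latter wins.

```python
import math
from collections import Counter


class OnlineDPMM:
    """Hypothetical single-pass (greedy MAP) assignment for a Dirichlet process
    mixture of multinomials. A sketch, not the paper's inference procedure."""

    def __init__(self, alpha=1.0, beta=0.01, vocab_size=1000):
        self.alpha = alpha            # DP concentration: weight for opening a new topic
        self.beta = beta              # symmetric Dirichlet prior on topic word distributions
        self.vocab_size = vocab_size  # assumed fixed vocabulary size V
        self.unit_counts = []         # m_k: units assigned to topic k
        self.word_counts = []         # n_{k,w}: count of word w in topic k
        self.total_words = []         # n_k: total tokens in topic k

    def _log_likelihood(self, counts, word_counts, total_words):
        """Dirichlet-multinomial predictive log p(unit | topic), adding the
        unit's tokens to the topic one at a time."""
        ll = 0.0
        for word, c in counts.items():
            base = word_counts.get(word, 0) + self.beta
            for j in range(c):
                ll += math.log(base + j)
        denom = total_words + self.vocab_size * self.beta
        for i in range(sum(counts.values())):
            ll -= math.log(denom + i)
        return ll

    def assign(self, tokens):
        """Assign one processing unit (a short text or a session) to a topic,
        opening a new topic if that scores highest. Returns the topic index."""
        counts = Counter(tokens)
        # CRP prior weights: m_k for existing topics, alpha for a new one. The
        # shared normaliser (units seen so far + alpha) cancels, so it is omitted.
        best_k = len(self.unit_counts)
        best_score = math.log(self.alpha) + self._log_likelihood(counts, {}, 0)
        for k, m_k in enumerate(self.unit_counts):
            score = math.log(m_k) + self._log_likelihood(
                counts, self.word_counts[k], self.total_words[k])
            if score > best_score:
                best_k, best_score = k, score
        if best_k == len(self.unit_counts):  # open a new mixture component
            self.unit_counts.append(0)
            self.word_counts.append(Counter())
            self.total_words.append(0)
        self.unit_counts[best_k] += 1
        self.word_counts[best_k].update(counts)
        self.total_words[best_k] += sum(counts.values())
        return best_k


# Usage on a toy stream of short texts (made-up example data).
model = OnlineDPMM(alpha=1.0, beta=0.01, vocab_size=1000)
stream = [
    "earthquake hits coastal city".split(),
    "earthquake rescue teams coastal city".split(),
    "election results announced today".split(),
    "election votes counted today".split(),
]
labels = [model.assign(doc) for doc in stream]
print(labels)  # [0, 0, 1, 1]
```

In this toy stream the two earthquake texts share one topic and the election texts open and then share a second, without the number of topics being set beforehand. The greedy rule is a simplification for illustration; a DPMM implementation would typically revisit assignments (for example by Gibbs sampling) rather than commit to each decision once.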

Bibliographic details

Source
7th International Conference on Natural Language Processing and Knowledge Engineering | 2011 | pp. 80-87 | 8 pages
Conference location: Tokushima (JP)
Authors
Wang Chan; Yuan Caixia; Wang Xiaojie; Xue Wenwei
Affiliation

Center of Intelligent Science and Technology, Beijing University of Posts and Telecommunications, China

Original format: PDF
Language: English
CLC classification: Information processing (information handling)
Keywords
DPMM; Dirichlet Process Mixture Model; data streams; extended DPMM; static short text; topic identification

Similar documents

1. A Dirichlet process biterm-based mixture model for short text stream clustering [J]. Applied Intelligence: The International Journal of Artificial Intelligence, Neural Networks, and Complex Problem-Solving Technologies. 2020, no. 5.

2. Online Biterm Topic Model based short text stream classification using short text expansion and concept drifting detection [J]. Hu Xuegang, Wang Haiyan, Li Peipei. Pattern Recognition Letters. 2018, issue DECa1.

3. A Dirichlet Multinomial Mixture Model-based Approach for Short Text Clustering [J]. Jianhua Yin, Jianyong Wang. SIGKDD Explorations. 2014, issue CDaROM.

4. Dirichlet Process Mixture Models based topic identification for short text streams [C]. Wang Chan, Yuan Caixia, Wang Xiaojie. International Conference on Natural Language Processing and Knowledge Engineering. 2011.

5. Dirichlet process mixture models for text and video analysis [D]. Pruteanu-Malinici, Iulian. 2008.

6. Kernel Analysis Based on Dirichlet Processes Mixture Models [O]. Jinkai Tian, Peifeng Yan, Da Huang. 2019.

7. Dirichlet Multinomial Mixture with Variational Manifold Regularization: Topic Modeling over Short Texts [O]. Ximing Li, Jiaojiao Zhang, Jihong Ouyang. 2019.
