International Conference on Multimedia Big Data

Contextual-LDA: A Context Coherent Latent Topic Model for Mining Large Corpora

Abstract

Statistical topic models, represented by Latent Dirichlet Allocation (LDA) and its variants, are ubiquitously applied to understanding large corpora. However, topic models based on the bag-of-words (BoW) assumption rarely incorporate contextual information, which carries a substantial amount of useful knowledge within a document, into the probabilistic framework. This shortcoming prevents LDA from learning the contextual information contained in sentences and paragraphs. We present a context-coherent topic model for text learning, namely Contextual Latent Dirichlet Allocation (Contextual-LDA), which incorporates contextual knowledge without much increase in perplexity. In our model, a document is segmented into finely divided word sequences, each corresponding to one distinct latent topic that captures local context, while global context is obtained from the position at which a segment appears in the document. We learn the parameters using Gibbs sampling, analogous to traditional LDA. Our model extends LDA to exploit the statistical strength of BoW without ignoring the knowledge contained in the original context of documents. We also demonstrate the model in a supervised scenario. Experimental results on the BBC corpus in both unsupervised and supervised settings show that, compared with the LDA model, our method is well suited to text mining.
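The paper does not include an implementation; the sketch below is a minimal, illustrative collapsed Gibbs sampler in which the sampling unit is a word segment rather than a single word, so that every word in a segment shares one latent topic, mirroring the segment-level inference the abstract describes. The function name gibbs_segment_lda, the segmented-input format, and the hyperparameters alpha and beta are assumptions for illustration only; the position-based global-context component of Contextual-LDA is omitted.

```python
import numpy as np

def gibbs_segment_lda(docs, V, K, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampler where the sampling unit is a segment,
    so all words in a segment share one latent topic.

    docs: list of documents; each document is a list of segments;
          each segment is a list of word ids in [0, V).
    Returns per-document topic proportions theta and
    per-topic word distributions phi.
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), K))  # segment-topic counts per document
    n_kw = np.zeros((K, V))          # word counts per topic
    n_k = np.zeros(K)                # total tokens per topic
    z = []                           # current topic of every segment

    # random initialisation of segment topics
    for d, doc in enumerate(docs):
        z.append([])
        for seg in doc:
            k = int(rng.integers(K))
            z[d].append(k)
            n_dk[d, k] += 1
            for w in seg:
                n_kw[k, w] += 1
            n_k[k] += len(seg)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for s, seg in enumerate(doc):
                # remove the segment's counts before resampling its topic
                k = z[d][s]
                n_dk[d, k] -= 1
                for w in seg:
                    n_kw[k, w] -= 1
                n_k[k] -= len(seg)

                # p(z_s = k) is proportional to (n_dk + alpha) times, for each
                # word i: (n_kw + beta + repeats seen so far) / (n_k + V*beta + i)
                logp = np.log(n_dk[d] + alpha)
                for k2 in range(K):
                    seen = {}
                    for i, w in enumerate(seg):
                        logp[k2] += np.log(n_kw[k2, w] + beta + seen.get(w, 0))
                        logp[k2] -= np.log(n_k[k2] + V * beta + i)
                        seen[w] = seen.get(w, 0) + 1

                p = np.exp(logp - logp.max())
                k = int(rng.choice(K, p=p / p.sum()))

                # add the counts back under the newly sampled topic
                z[d][s] = k
                n_dk[d, k] += 1
                for w in seg:
                    n_kw[k, w] += 1
                n_k[k] += len(seg)

    theta = (n_dk + alpha) / (n_dk.sum(1, keepdims=True) + K * alpha)
    phi = (n_kw + beta) / (n_kw.sum(1, keepdims=True) + V * beta)
    return theta, phi

# toy usage: two documents, each a list of segments of word ids
docs = [[[0, 1, 1], [2, 3]], [[3, 4], [0, 2, 4]]]
theta, phi = gibbs_segment_lda(docs, V=5, K=2, iters=50)
```

Because a whole segment changes topic at once, neighbouring words are forced to be topically coherent, which is the local-context effect the abstract attributes to Contextual-LDA.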
