Assessing the Uncertainty of the Text Generating Process Using Topic Models

Abstract

Latent Dirichlet Allocation (LDA) is one of the most popular topic models employed for the analysis of large text data. When applied repeatedly to the same text corpus, LDA leads to different results. To address this issue, several methods have been proposed. In this paper, instead of dealing with this methodological source of algorithmic uncertainty, we assess the aleatoric uncertainty of the text generating process itself. For this task, we use a direct LDA-model approach to quantify the uncertainty due to the random process of text generation and propose three different bootstrap approaches to resample texts. These allow us to construct uncertainty intervals of topic proportions for single texts as well as for text corpora over time. We discuss the differences between the uncertainty intervals derived from the three bootstrap approaches and from the direct approach, both for single texts and for aggregations of texts. We present the results of applying the proposed methods to an example corpus consisting of all articles published in a German daily quality newspaper over one full year, and we investigate the effect of different sample sizes on the uncertainty intervals.
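The abstract does not spell out the three bootstrap schemes, so the following is only a minimal sketch of one plausible variant: a word-level bootstrap for a single text. It assumes that each word of the text already carries a topic assignment from a fitted LDA model, resamples those assignments with replacement, and reads off percentile intervals for the topic proportions. The function names `topic_proportions` and `bootstrap_intervals`, the percentile rule, and the toy input are all assumptions made for illustration and are not taken from the paper.

```python
import random
from collections import Counter


def topic_proportions(assignments, num_topics):
    """Share of words assigned to each topic in one text."""
    counts = Counter(assignments)
    n = len(assignments)
    return [counts.get(k, 0) / n for k in range(num_topics)]


def bootstrap_intervals(assignments, num_topics, num_resamples=1000,
                        level=0.95, seed=0):
    """Percentile bootstrap intervals for the topic proportions of one text.

    `assignments` is a list with one topic index per word (hypothetical
    input, e.g. taken from a fitted LDA model). Words are resampled with
    replacement; this is an illustrative scheme, not the paper's method.
    """
    rng = random.Random(seed)
    n = len(assignments)
    samples = [[] for _ in range(num_topics)]
    for _ in range(num_resamples):
        resample = rng.choices(assignments, k=n)
        for k, p in enumerate(topic_proportions(resample, num_topics)):
            samples[k].append(p)

    # Indices of the lower and upper percentile in the sorted resamples.
    alpha = (1 - level) / 2
    lo_idx = int(alpha * (num_resamples - 1))
    hi_idx = int((1 - alpha) * (num_resamples - 1))

    intervals = []
    for k in range(num_topics):
        ordered = sorted(samples[k])
        intervals.append((ordered[lo_idx], ordered[hi_idx]))
    return intervals


# Toy text of 100 words: 60 assigned to topic 0, 30 to topic 1, 10 to topic 2.
toy_assignments = [0] * 60 + [1] * 30 + [2] * 10
toy_intervals = bootstrap_intervals(toy_assignments, num_topics=3)
```

Calling `bootstrap_intervals` on a shorter text should generally give wider intervals, which is one simple way to see why sample size matters for the uncertainty of topic proportions.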

Bibliographic details

Source
European Conference on Machine Learning; European Conference on Principles and Practice of Knowledge Discovery in Databases | 2020 | xv, 607 p. | 12 pages
Authors
Jonas Rieger; Carsten Jentsch; Jörg Rahnenführer
Original format: PDF
Chinese Library Classification: TF3-532
Keywords
Aleatoric uncertainty; Topic model; Machine learning; Stochastic; Text data

Similar literature

1. Retrieving and Processing Images from the Pages of a Historical Newspaper and Modeling the Text Topics [J]. Gildacio J. de A. Sa, Jose E. B. Maia. Journal of Digital Information Management. 2021, Issue 2.
2. Overcoming Language Barriers: Assessing the Potential of Machine Translation and Topic Modeling for the Comparative Analysis of Multilingual Text Corpora [J]. Reber Ueli. Communication Methods and Measures. 2019, Issue 2.
3. Stochastic Variational Inference-Based Parallel and Online Supervised Topic Model for Large-Scale Text Processing [J]. Yang Li, Wen-Zhuo Song, Bo Yang. Journal of Computer Science and Technology (English Edition). 2018, Issue 5.
4. Assessing the Uncertainty of the Text Generating Process Using Topic Models [C]. Jonas Rieger, Carsten Jentsch, Jörg Rahnenführer. European Conference on Machine Learning; European Conference on Principles and Practice of Knowledge Discovery in Databases. 2020.
5. Assessing positional and modelling uncertainties in vector-based spatial processes and analyses in geographical information systems [D]. Cheung, Tracy Chui-Kwan. 2003.
6. Aspiring to Unintended Consequences of Natural Language Processing: A Review of Recent Developments in Clinical and Consumer-Generated Text Processing [O]. D. Demner-Fushman, N. Elhadad. 2016.
7. Uncertainties in assessing the effect of climate change on agriculture using model simulation and uncertainty processing methods [O]. FengMei Yao, PengCheng Qin, JiaHua Zhang. 2011.