Symbolic data are distributions constructed from data points. When a large dataset can be organised into groups, one may first summarise each group by a symbol and then analyse the resulting symbolic dataset directly. Reducing the dataset to a more manageable size enables exploratory analysis and statistical inference that would otherwise be infeasible for the original large dataset. In the first half of this thesis, we develop a probabilistic approach for constructing likelihood functions for two types of symbolic data: interval-valued data and histogram-valued data. Existing methods ignore the process by which the symbolic data are constructed, namely the aggregation of real-valued data generated from some underlying process. We develop the foundations of likelihood-based statistical inference for random symbols that directly incorporates this generative procedure into the analysis. This permits the direct fitting of models for the underlying real-valued data given only their symbolic summaries. Our approach overcomes several problems associated with existing methods and can jointly model intra- and inter-symbol variation. The new methods are illustrated by simulated and real data analyses.

Latent variable models are powerful tools for extracting unobserved features from large datasets. Well-known examples are latent Dirichlet allocation (LDA) and hierarchical Dirichlet process mixtures (HDP-M) for topic modelling. Collapsed Gibbs samplers are routinely used for Bayesian inference in these models because of their superior chain mixing. In the second half of the thesis, we propose a blocking scheme for collapsed Gibbs samplers for the LDA and HDP-M models that improves chain-mixing efficiency. For the LDA model, we develop an O(log K)-step nested sampling scheme, where K is the number of topics, to simulate the latent variables for each block directly.
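The abstract does not spell out the O(log K) sampler; as a rough illustration of how a topic can be drawn in logarithmically many steps, the sketch below uses a binary tree of partial weight sums, which supports both sampling a topic proportional to its weight and updating a single weight in O(log K) time. This is an assumed, generic construction for illustration only, not the thesis's exact scheme; all names are hypothetical.

```python
import random

class SumTree:
    """Binary (segment) tree over K topic weights. Sampling a topic
    index proportional to its weight, and updating one weight, both
    take O(log K) steps instead of the O(K) of a linear scan."""

    def __init__(self, weights):
        self.K = len(weights)
        self.tree = [0.0] * (2 * self.K)
        # Leaves live at positions K .. 2K-1; internal node i stores
        # the sum of its two children 2i and 2i+1.
        for i, w in enumerate(weights):
            self.tree[self.K + i] = w
        for i in range(self.K - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def update(self, k, w):
        """Set the weight of topic k and repair sums up to the root."""
        i = self.K + k
        self.tree[i] = w
        i //= 2
        while i >= 1:
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def sample(self, rng=random):
        """Descend from the root, splitting a uniform draw between the
        left and right subtree sums, until a leaf (topic) is reached."""
        u = rng.random() * self.tree[1]
        i = 1
        while i < self.K:
            if u < self.tree[2 * i]:
                i = 2 * i
            else:
                u -= self.tree[2 * i]
                i = 2 * i + 1
        return i - self.K
```

For example, `SumTree([0.0, 2.0, 0.0, 0.0]).sample()` always returns topic 1, and after `update` the root `tree[1]` always holds the current total weight.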
To obtain such a blocking scheme for the HDP-M model, we introduce residual allocation processes (RAP), which construct random partitions induced by Dirichlet processes in a class-wise manner, and we propose a hierarchical RAP for constructing random partitions induced by the HDP. Derived from these residual allocation constructions, the blocking scheme consists of nested sampling of the latent variables for existing topics and residual allocation sampling of the latent variables for new topics. We demonstrate that the blocking scheme achieves substantial improvements in chain mixing and a significant reduction in computation time.
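The residual allocation construction is only named in the abstract; a minimal sketch of the class-wise idea, assuming the standard stick-breaking weights V_k ~ Beta(1, α) of a Dirichlet process DP(α), is given below. Class k keeps each remaining item independently with probability V_k, and the survivors pass to the next class; items sharing a class form one block of the induced partition. The function name and interface are illustrative, not taken from the thesis.

```python
import random

def rap_partition(n, alpha, rng=None):
    """Class-wise (residual allocation) construction of a random
    partition of n items with the DP(alpha)-induced distribution.

    Each item lands in class k with probability
    V_k * prod_{j < k} (1 - V_j), the stick-breaking weight beta_k,
    because it must be passed over by classes 1..k-1 first."""
    rng = rng or random.Random()
    remaining = list(range(n))
    classes = []
    while remaining:
        v = rng.betavariate(1.0, alpha)  # V_k ~ Beta(1, alpha)
        kept, rest = [], []
        for i in remaining:
            (kept if rng.random() < v else rest).append(i)
        if kept:                         # skip classes that catch no item
            classes.append(kept)
        remaining = rest                 # residual items go to class k+1
    return classes
```

Smaller α concentrates the items into fewer, larger classes, since early sticks V_k tend to be longer; larger α spreads them over many small classes.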