
Extracting information from high-dimensional data: Probabilistic modeling, inference and evaluation.



Abstract

In this thesis, we derive efficient posterior inference algorithms that handle large data sets, in a variety of settings and for different applications, and we use side information to derive superior inference techniques. We demonstrate the efficiency and accuracy of these models and algorithms in the different applications, on both real and synthetic data sets. We evaluate the quality of the results with both quantitative and human evaluation experiments.

In the first part of the thesis, the general framework is that of sparsity: we assume the data have a sparse representation. The application on which we focus is image super-resolution, in which one seeks to "up-scale" images, i.e., to "reconstruct" finer detail in an image than is given in the data. Image super-resolution has been tackled successfully via sparse coding but not, so far, by Bayesian nonparametric methods (BNMs). In other contexts, BNMs were shown to be powerful because they infer parameters that otherwise have to be assigned a priori. We build here the tools enabling such a BNM for the super-resolution of images. We start by building a sparse nonparametric factor analysis model for image super-resolution; more precisely, a model with a beta-Bernoulli process that learns the number of dictionary elements from the data. We test the results on both benchmark and natural images, comparing with the models in the literature. Then we perform large-scale human evaluation experiments to explicitly assess the visual quality of the results. In a first implementation, we use Gibbs sampling, operating on the data in batch mode, and assess its performance. However, for large-scale data, such a Gibbs sampling approach is typically not feasible. To circumvent this, we develop an online variational Bayes (VB) algorithm that can deal with larger-scale data in a fraction of the time needed by traditional inference.

In the second part of the thesis, we consider data sets with rich side information. We study two different frameworks that have such side information: relational information and group information. To handle relational information, we build a relational factor analysis (rFA) model that incorporates this information into dictionary learning. We show that the use of relational information (e.g., spatial location) helps learn higher-quality dictionaries and improves recommendation systems in a social network and image analysis algorithms (e.g., image inpainting). To handle group information, we propose a multi-task learning framework for the image super-resolution problem, using a hierarchical beta process as a prior on dictionary assignments. In this setting, we study grouped data and build a model incorporating the group information. We show that by incorporating group information in this way, the algorithm avoids erroneous selection of dictionary elements.

Finally, in the third part of the thesis, we study latent sequential information between observations. We use this information to build a novel dynamic programming algorithm for sequential models. Hidden Markov models (HMMs) and conditional random fields (CRFs) are two popular techniques for modeling sequential data. Inference algorithms designed over CRFs and HMMs allow estimation of the state sequence, given the observations. In several applications, the end goal is not the estimation of the state sequence, but rather the estimation of the value of some function of the state sequence. In such scenarios, estimating the state sequence by conventional inference techniques, and then computing the functional mapping from this estimate, is not necessarily optimal; it may be more efficient to infer the final outcome directly from the observations. In particular, we consider the specific instantiation of the problem where the goal is to find the state trajectories without exact transition points; we derive a novel polynomial-time inference algorithm that outperforms vanilla inference techniques. We show that this particular problem arises commonly in many disparate applications and present the results of experiments on three different applications: (1) toy robot tracking; (2) single-stroke character recognition; (3) handwritten word recognition. (Abstract shortened by UMI.)
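The distinction drawn in the third part, between decoding a full state sequence and inferring only a function of it, can be made concrete with a small sketch. The code below is not the thesis's polynomial-time algorithm. It is a brute-force illustration on a hypothetical two-state HMM whose parameters are invented for this example. It compares the "plug-in" answer (Viterbi decoding followed by dropping the transition times) with the trajectory that has the largest total probability once all transition timings are summed out. The names `collapse`, `viterbi`, and `trajectory_mass` are assumptions introduced here.

```python
from itertools import groupby, product

# Hypothetical 2-state HMM; every number below is made up for illustration.
STATES = (0, 1)
INIT = {0: 0.6, 1: 0.4}
TRANS = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.4, 1: 0.6}}
EMIT = {0: {"a": 0.5, "b": 0.5}, 1: {"a": 0.1, "b": 0.9}}


def joint(path, obs):
    """Joint probability p(path, obs) under the toy HMM."""
    p = INIT[path[0]] * EMIT[path[0]][obs[0]]
    for prev, cur, o in zip(path, path[1:], obs[1:]):
        p *= TRANS[prev][cur] * EMIT[cur][o]
    return p


def collapse(path):
    """Drop transition times: (0, 0, 1, 1, 1) becomes (0, 1)."""
    return tuple(state for state, _ in groupby(path))


def viterbi(obs):
    """Standard Viterbi decoding: the single most probable state sequence."""
    best = {s: (INIT[s] * EMIT[s][obs[0]], (s,)) for s in STATES}
    for o in obs[1:]:
        best = {
            s: max(
                ((p * TRANS[r][s] * EMIT[s][o], path + (s,))
                 for r, (p, path) in best.items()),
                key=lambda t: t[0],
            )
            for s in STATES
        }
    return max(best.values(), key=lambda t: t[0])[1]


def trajectory_mass(obs):
    """Brute force (exponential in len(obs)): total joint probability
    of all state sequences that collapse to the same trajectory."""
    mass = {}
    for path in product(STATES, repeat=len(obs)):
        key = collapse(path)
        mass[key] = mass.get(key, 0.0) + joint(path, obs)
    return mass


obs = ("a", "b", "b", "a")
mass = trajectory_mass(obs)
plug_in = collapse(viterbi(obs))   # decode first, then apply the function
direct = max(mass, key=mass.get)   # maximize over trajectories directly
print("plug-in trajectory:", plug_in, "mass:", mass[plug_in])
print("direct trajectory: ", direct, "mass:", mass[direct])
```

By construction, the directly maximized trajectory never has less probability mass than the plug-in one, and the two can differ when many moderately likely sequences share a trajectory. The enumeration here grows exponentially with sequence length, which is the cost that a polynomial-time dynamic program of the kind described in the abstract would avoid.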

Bibliographic record

  • Author

    Polatkan, Gungor

  • Author affiliation

    Princeton University

  • Degree-granting institution: Princeton University
  • Subject: Statistics; Engineering, Electronics and Electrical; Computer Science
  • Degree: Ph.D.
  • Year: 2012
  • Pages: 179 p.
  • Total pages: 179
  • Original format: PDF
  • Language: eng
