IEEE Transactions on Signal Processing

A Scalable Hierarchical Gaussian Process Classifier



Abstract

Gaussian process (GP) models are powerful tools for Bayesian classification, but their limitation is the high computational cost. Existing approximation methods to reduce the cost of GP classification can be categorized into either global or local approaches. Global approximations, which summarize training data with inducing points, cannot account for non-stationarity and locality in complex datasets. Local approximations, which fit a GP for each sub-region of the input space, are prone to overfitting. This paper proposes a GP classification method that effectively utilizes both global and local information through a hierarchical model. The upper layer consists of a global sparse GP to coarsely model the entire dataset. The lower layer is composed of a mixture of GP experts, which use local information to learn a fine-grained model. The key idea to avoid overfitting and to enforce correlation among the experts is to incorporate global information into their shared prior mean function. A variational inference algorithm is developed for simultaneous learning of the global GP, the experts, and the gating network by maximizing a lower bound of the log marginal likelihood. We explicitly represent the variational distributions of the global variables so that the model conditioned on these variables factorizes in the observations. This way, stochastic optimization can be employed during learning to cater for large-scale problems. Experiments on a wide range of benchmark datasets demonstrate the advantages of the model, as a stand-alone classifier or as the top layer of a deep neural network, in terms of scalability and predictive power.
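The key idea in the abstract, a coarse global GP whose posterior mean serves as the shared prior mean of local experts, can be sketched in a few lines. The sketch below is illustrative only and not the paper's method: it uses regression instead of classification, a hard input-space partition in place of the learned gating network, and a random subsample in place of optimized inducing points. All function and variable names are hypothetical.

```python
import numpy as np

def rbf(X1, X2, ls=1.0, var=1.0):
    """Squared-exponential kernel."""
    d = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return var * np.exp(-0.5 * d / ls ** 2)

def gp_posterior_mean(Xtr, ytr, Xte, mean_fn=None, noise=1e-2, **kern):
    """GP posterior mean, with an optional non-zero prior mean function."""
    m_tr = np.zeros(len(Xtr)) if mean_fn is None else mean_fn(Xtr)
    m_te = np.zeros(len(Xte)) if mean_fn is None else mean_fn(Xte)
    K = rbf(Xtr, Xtr, **kern) + noise * np.eye(len(Xtr))
    Ks = rbf(Xte, Xtr, **kern)
    return m_te + Ks @ np.linalg.solve(K, ytr - m_tr)

# Toy 1-D data.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(2.0 * X[:, 0]) + 0.1 * rng.standard_normal(200)

# Upper layer: a coarse global GP built on a small random subsample,
# standing in for the sparse GP with inducing points.
idx = rng.choice(len(X), size=20, replace=False)
def global_mean(Xq):
    return gp_posterior_mean(X[idx], y[idx], Xq, ls=1.5)

# Lower layer: local experts on a hard partition of the input space
# (standing in for the gating network), each conditioning on the
# global posterior mean as its shared prior mean.
Xte = np.linspace(-3.0, 3.0, 50)[:, None]
pred = np.empty(len(Xte))
for lo, hi in [(-3.0, -1.0), (-1.0, 1.0), (1.0, 3.1)]:
    tr = (X[:, 0] >= lo) & (X[:, 0] < hi)
    te = (Xte[:, 0] >= lo) & (Xte[:, 0] < hi)
    pred[te] = gp_posterior_mean(X[tr], y[tr], Xte[te],
                                 mean_fn=global_mean, ls=0.5)
```

The shared prior mean is what couples the experts and, per the abstract, curbs the overfitting that plagues purely local approximations; the actual method instead learns the global GP, the experts, and the gating network jointly by variational inference with stochastic optimization.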


