
Probabilistic Methods for Distributed Learning.


Abstract

Access to data at massive scale has proliferated recently. A significant machine learning challenge is to develop methods that efficiently model and learn from data at this scale while retaining analytical flexibility and sophistication. Many statistical learning problems are formulated as regularized empirical risk minimization. To scale this method to the big data now commonplace in many applications, it is desirable to extend empirical risk minimization efficiently to the large-scale setting. When the data are too large to store on a single machine, or at least too large to keep in a single localized memory, one popular solution is to store and process them in a distributed manner. Consequently, the focus of this dissertation is the study of distributed learning algorithms for empirical risk minimization problems.

Toward this end we propose a series of probabilistic methods for divide-and-conquer distributed learning, with each method accounting for an increasing set of challenges. The basic Maximum Entropy Mixture (MEM) method is proposed first, to model the uncertainty caused by randomly partitioning the data across computing nodes. We then develop a hierarchical extension of MEM, termed hMEM, which facilitates sharing of statistical strength among data blocks. Finally, to address small-sample bias, we impose the constraint that the mean of the inferred parameters is the same across all data blocks, yielding a hierarchical MEM with an expectation constraint (termed hecMEM). Computations are performed with a generalized Expectation-Maximization algorithm. The hecMEM method achieves state-of-the-art results for distributed matrix completion and logistic regression at massive scale, with comparisons made to MEM, hMEM, and several alternative approaches.
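As a concrete illustration of the divide-and-conquer setting the abstract describes, the following minimal Python sketch randomly partitions data across simulated compute nodes, solves a regularized logistic-regression ERM problem locally on each block, and combines the block-level estimates. All function names here are hypothetical, and naive averaging stands in for the aggregation step; the dissertation's MEM/hMEM/hecMEM methods instead model the uncertainty induced by the random partition, as detailed in the full text.

```python
# Hypothetical sketch of divide-and-conquer regularized ERM.
# This is NOT the MEM/hMEM/hecMEM algorithm itself; it only
# illustrates the distributed setting those methods address.
import numpy as np

def local_logistic_fit(X, y, lam=0.1, steps=500, lr=0.1):
    """Fit L2-regularized logistic regression on one data block
    by plain gradient descent (a stand-in for any local solver)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))      # predicted probabilities
        grad = X.T @ (p - y) / n + lam * w    # regularized ERM gradient
        w -= lr * grad
    return w

def divide_and_conquer_erm(X, y, num_blocks=4, lam=0.1, seed=0):
    """Randomly partition (X, y) into blocks, solve each block's
    regularized ERM locally, and average the local solutions."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))             # random partition of the data
    blocks = np.array_split(idx, num_blocks)
    local_ws = [local_logistic_fit(X[b], y[b], lam) for b in blocks]
    return np.mean(local_ws, axis=0)          # naive one-shot aggregation

# Toy usage: 2,000 samples, 5 features, 4 simulated nodes.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5))
w_true = rng.normal(size=5)
y = (rng.random(2000) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)
w_hat = divide_and_conquer_erm(X, y, num_blocks=4, lam=0.1)
```

Naive averaging of this kind is known to suffer from exactly the small-sample bias the abstract mentions when blocks are small, which motivates the expectation constraint imposed in hecMEM.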

Bibliographic record

  • Author: Zhang, XianXing
  • Affiliation: Duke University
  • Degree-granting institution: Duke University
  • Subject: Engineering, Electronics and Electrical
  • Degree: Ph.D.
  • Year: 2014
  • Pages: 56 p.
  • Total pages: 56
  • Format: PDF
  • Language: English
