
Probabilistic Methods for Distributed Learning.


Abstract

Access to data at massive scale has proliferated recently. A significant machine learning challenge is to develop methods that efficiently model and learn from data at this scale while retaining analytical flexibility and sophistication. Many statistical learning problems are formulated as regularized empirical risk minimization. To scale this method to the big data now commonplace in many applications, it is desirable to extend empirical risk minimization efficiently to the large-scale setting. When the data are too large to store on a single machine, or at least too large to keep in a single localized memory, one popular solution is to store and process them in a distributed manner. Consequently, the focus of this dissertation is the study of distributed learning algorithms for empirical risk minimization problems.

Toward this end we propose a series of probabilistic methods for divide-and-conquer distributed learning, with each method accounting for an increasing set of challenges. The basic Maximum Entropy Mixture (MEM) method is proposed first, to model the uncertainty caused by randomly partitioning the data across computing nodes. We then develop a hierarchical extension of MEM, termed hMEM, which facilitates sharing of statistical strength among data blocks. Finally, to address small-sample bias, we impose the constraint that the mean of the inferred parameters is the same across all data blocks, yielding a hierarchical MEM with an expectation constraint (termed hecMEM). Computations are performed with a generalized Expectation-Maximization algorithm. The hecMEM method achieves state-of-the-art results for distributed matrix completion and logistic regression at massive scale, with comparisons made to MEM, hMEM, and several alternative approaches.
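As a concrete illustration of the divide-and-conquer setting the abstract describes, the following minimal Python sketch randomly partitions data across simulated compute nodes, solves a regularized logistic-regression ERM problem locally on each block, and combines the block-level estimates. All function names here are hypothetical, and naive averaging stands in for the aggregation step; the dissertation's MEM/hMEM/hecMEM methods instead model the uncertainty induced by the random partition, as detailed in the full text.

```python
# Hypothetical sketch of divide-and-conquer regularized ERM.
# This is NOT the MEM/hMEM/hecMEM algorithm itself; it only
# illustrates the distributed setting those methods address.
import numpy as np

def local_logistic_fit(X, y, lam=0.1, steps=500, lr=0.1):
    """Fit L2-regularized logistic regression on one data block
    by plain gradient descent (a stand-in for any local solver)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))      # predicted probabilities
        grad = X.T @ (p - y) / n + lam * w    # regularized ERM gradient
        w -= lr * grad
    return w

def divide_and_conquer_erm(X, y, num_blocks=4, lam=0.1, seed=0):
    """Randomly partition (X, y) into blocks, solve each block's
    regularized ERM locally, and average the local solutions."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))             # random partition of the data
    blocks = np.array_split(idx, num_blocks)
    local_ws = [local_logistic_fit(X[b], y[b], lam) for b in blocks]
    return np.mean(local_ws, axis=0)          # naive one-shot aggregation

# Toy usage: 2,000 samples, 5 features, 4 simulated nodes.
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 5))
w_true = rng.normal(size=5)
y = (rng.random(2000) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)
w_hat = divide_and_conquer_erm(X, y, num_blocks=4, lam=0.1)
```

Naive averaging of this kind is known to suffer from exactly the small-sample bias the abstract mentions when blocks are small, which motivates the expectation constraint imposed in hecMEM.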

Bibliographic record

  • Author: Zhang, XianXing
  • Affiliation: Duke University
  • Degree-granting institution: Duke University
  • Subject: Engineering, Electronics and Electrical
  • Degree: Ph.D.
  • Year: 2014
  • Pages: 56 p.
  • Total pages: 56
  • Format: PDF
  • Language: English
