
Divide and Conquer Algorithms for Machine Learning


Abstract

This thesis improves the scalability of machine learning by studying mergeable learning algorithms. In a mergeable algorithm, many processors independently solve the learning problem on small subsets of the data; a master processor then merges the solutions together with only a single round of communication. Mergeable algorithms are popular because they are fast, easy to implement, and have strong privacy guarantees.

Our first contribution is a novel fast cross-validation procedure suitable for any mergeable algorithm. This procedure has a constant runtime independent of the number of folds and can be implemented on distributed systems. It is also widely applicable: we show that 32 recently proposed learning algorithms are mergeable and therefore fit our cross-validation framework. These learning algorithms come from many subfields of machine learning, including density estimation, regularized loss minimization, dimensionality reduction, submodular optimization, variational inference, and Markov chain Monte Carlo.

We also provide two new mergeable learning algorithms. In the context of regularized loss minimization, existing merge procedures have either high bias or slow runtimes. We introduce the optimal weighted average (OWA) merge procedure, which achieves both low bias and a fast runtime. We also improve the cover tree data structure for fast nearest neighbor queries by providing a merge procedure. In doing so, we improve both the theoretical guarantees of the cover tree and its practical runtime. For example, the original cover tree finds nearest neighbors in time O(c_exp^12 log n), and we improve this bound to O(c_hole^4 log n) for i.i.d. data. Here, c_exp and c_hole are measures of the "intrinsic dimensionality" of the data, and on typical datasets c_exp > c_hole. Experiments on large-scale ad-click, genomics, and image classification tasks empirically validate these algorithms.
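The mergeable framework and the fold-independent cross-validation trick described in the abstract can be illustrated with a minimal sketch. This example uses a naive averaging merge for linear least squares — the thesis's OWA merge is more sophisticated — and all function names here are hypothetical, not from the thesis. The key point is that leave-one-partition-out cross-validation costs only a subtraction per fold, so the total runtime does not grow with the number of folds:

```python
import numpy as np

def fit_local(X, y):
    # Local learner on one data partition: ordinary least squares.
    return np.linalg.lstsq(X, y, rcond=None)[0]

def merge(models):
    # Naive averaging merge: a single round of communication
    # suffices to combine the per-partition solutions.
    return np.mean(models, axis=0)

def fast_cv(models, X_parts, y_parts):
    # Leave-one-partition-out cross-validation.  The model that
    # excludes fold i is recovered from the precomputed sum in O(1),
    # so total cost is independent of the number of folds beyond
    # the k local fits we already paid for.
    k = len(models)
    total = np.sum(models, axis=0)
    errs = []
    for i in range(k):
        w_minus_i = (total - models[i]) / (k - 1)  # merge of all folds except i
        pred = X_parts[i] @ w_minus_i
        errs.append(np.mean((pred - y_parts[i]) ** 2))
    return float(np.mean(errs))
```

Each partition is fit exactly once; cross-validation then reuses those local models instead of retraining, which is the source of the constant (fold-count-independent) runtime claimed in the abstract.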

Bibliographic details

  • Author

    Izbicki, Michael John.

  • Affiliation

    University of California, Riverside.

  • Degree grantor: University of California, Riverside.
  • Subject: Information science; Artificial intelligence.
  • Degree: Ph.D.
  • Year: 2017
  • Pages: 170 p.
  • Format: PDF
  • Language: English
