IEEE International Parallel and Distributed Processing Symposium

Semantics-Preserving Parallelization of Stochastic Gradient Descent



Abstract

Stochastic gradient descent (SGD) is a well-known method for regression and classification tasks. However, it is an inherently sequential algorithm: at each step, the processing of the current example depends on the parameters learned from previous examples. Prior approaches to parallelizing linear learners using SGD, such as Hogwild! and AllReduce, do not honor these dependencies across threads and thus can potentially suffer poor convergence rates and/or poor scalability. This paper proposes SymSGD, a parallel SGD algorithm that, to a first-order approximation, retains the sequential semantics of SGD. Each thread learns a local model in addition to a model combiner, which allows local models to be combined to produce the same result that a sequential SGD would have produced. An evaluation of SymSGD's accuracy and performance on 6 datasets on a shared-memory machine shows up to an 11x speedup over our heavily optimized sequential baseline on 16 cores and, on average, a 2.2x speedup over Hogwild!.
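To make the model-combiner idea concrete, below is a minimal NumPy sketch for a squared-loss linear learner, where each SGD step is affine in the current model. Alongside its local model, a thread maintains a matrix M such that rerunning the same examples from a shifted starting point w_snap + delta yields local_model + M @ delta (exact for this loss; a first-order approximation in general). The function and variable names are illustrative only, and the sketch keeps the full d x d combiner for clarity, whereas the paper keeps the combiner tractable by maintaining a randomly projected low-dimensional version.

```python
import numpy as np

def local_sgd_with_combiner(w_start, X, y, lr):
    """Run sequential SGD on one thread's slice of examples, starting from a
    snapshot w_start of the shared model. Besides the local model, maintain a
    combiner matrix M so that the same SGD run started from w_start + delta
    would produce local_model + M @ delta. For the squared-loss update used
    here this relation is exact, because each step is affine in the model."""
    d = w_start.size
    w = w_start.copy()
    M = np.eye(d)  # combiner; the paper projects it to a few random dimensions
    for x_i, y_i in zip(X, y):
        # Squared-loss SGD step: w <- w - lr * (x.w - y) * x
        #                           = (I - lr * x x^T) w + lr * y * x
        w = w - lr * (x_i @ w - y_i) * x_i
        M = (np.eye(d) - lr * np.outer(x_i, x_i)) @ M
    return w, M

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n = 5, 200
    X = rng.normal(size=(n, d))
    y = X @ rng.normal(size=d)
    lr = 0.05

    w_snap = np.zeros(d)                      # snapshot the thread started from
    w_local, M = local_sgd_with_combiner(w_snap, X, y, lr)

    # Meanwhile the shared model advanced to w_global; combine without
    # re-processing this thread's examples:
    w_global = w_snap + rng.normal(scale=0.1, size=d)
    combined = w_local + M @ (w_global - w_snap)

    # Compare against actually rerunning SGD from w_global:
    w_check, _ = local_sgd_with_combiner(w_global, X, y, lr)
    print(np.max(np.abs(combined - w_check)))  # near machine precision
```

The check at the end illustrates the "semantics-preserving" claim in this restricted setting: the combined model matches what a sequential run over the same examples from the advanced global model would have produced.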

