IEEE International Parallel and Distributed Processing Symposium

Semantics-Preserving Parallelization of Stochastic Gradient Descent


Abstract

Stochastic gradient descent (SGD) is a well-known method for regression and classification tasks. However, it is an inherently sequential algorithm: at each step, the processing of the current example depends on the parameters learned from the previous examples. Prior approaches to parallelizing linear learners using SGD, such as Hogwild! and AllReduce, do not honor these dependencies across threads and thus can potentially suffer poor convergence rates and/or poor scalability. This paper proposes SymSGD, a parallel SGD algorithm that, to a first-order approximation, retains the sequential semantics of SGD. Each thread learns a local model in addition to a model combiner, which allows the local models to be combined to produce the same result as what a sequential SGD would have produced. This paper evaluates SymSGD's accuracy and performance on 6 datasets on a shared-memory machine, showing up to an 11x speedup over our heavily optimized sequential baseline on 16 cores and an average 2.2x speedup over Hogwild!.
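The model-combiner idea described in the abstract can be made concrete for the linear case. The sketch below is only an illustration under assumed details (a linear least-squares objective, a fixed learning rate, a single two-block split, and invented names such as sgd_block); it is not the paper's implementation. Each block records, alongside its locally learned model, a combiner matrix that captures how a change in the starting model would propagate through its updates; the second block's result is then adjusted with that matrix so that the combination reproduces what sequential SGD over the whole stream would have produced (exactly here, because the updates are linear; to a first-order approximation in general).

```python
import numpy as np

def sgd_block(w0, X, Y, lr):
    """Run sequential least-squares SGD on one block, starting from w0.

    Besides the locally learned model, accumulate the combiner matrix
    M = prod_k (I - lr * x_k x_k^T), which describes how a change in the
    starting model propagates through this block's updates (exact for a
    linear model; a first-order approximation otherwise).
    """
    w = w0.copy()
    M = np.eye(len(w0))
    for x, y in zip(X, Y):
        step = np.eye(len(w0)) - lr * np.outer(x, x)
        w = step @ w + lr * y * x   # SGD update for the loss 0.5*(y - w.x)^2
        M = step @ M                # fold this update into the combiner
    return w, M

rng = np.random.default_rng(0)
d, n, lr = 5, 200, 0.01
X = rng.normal(size=(n, d))
Y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Sequential reference over the whole stream.
w_seq, _ = sgd_block(np.zeros(d), X, Y, lr)

# "Parallel" run: each block works independently; the second block starts
# from an arbitrary local starting point w0b and is fixed up afterwards.
w_a, _ = sgd_block(np.zeros(d), X[:100], Y[:100], lr)
w0b = np.zeros(d)
w_b, M_b = sgd_block(w0b, X[100:], Y[100:], lr)

# Combine: adjust block B's local model for the model block A actually produced.
w_combined = w_b + M_b @ (w_a - w0b)

print(np.allclose(w_combined, w_seq))  # True: combination matches sequential SGD
```

Note that materializing the d x d combiner matrix as done here is only meant to make the semantics explicit for small d; it is not intended as an efficient implementation.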
