IEEE International Parallel and Distributed Processing Symposium

Efficient Gradient Boosted Decision Tree Training on GPUs

Abstract

In this paper, we present a novel parallel implementation for training Gradient Boosting Decision Trees (GBDTs) on Graphics Processing Units (GPUs). Thanks to the wide use of the open-source XGBoost library, GBDTs have become very popular in recent years and have won many awards in machine learning and data mining competitions. Although GPUs have demonstrated their success in accelerating many machine learning applications, developing a GPU-based GBDT algorithm poses a series of key challenges, including irregular memory accesses, many small sorting operations, and varying degrees of data parallelism during tree construction. To tackle these challenges on GPUs, we propose several novel techniques, including Run-length Encoding compression, dynamic thread/block workload allocation, and reuse of intermediate training results for efficient gradient computation. Our experimental results show that our algorithm, named GPU-GBDT, is often 10 to 20 times faster than the sequential version of XGBoost, and achieves 1.5 to 2 times speedup over 40-threaded XGBoost running on a relatively high-end workstation with 20 CPU cores. Moreover, GPU-GBDT outperforms its CPU counterpart by 2 to 3 times in terms of performance-price ratio.
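As a concrete illustration of the Run-length Encoding idea, the following is a minimal sketch (ours, not the paper's implementation) of compressing a sorted, low-cardinality feature column on the GPU with Thrust's reduce_by_key; the sample data and variable names are chosen for illustration only.

    #include <thrust/device_vector.h>
    #include <thrust/reduce.h>
    #include <thrust/iterator/constant_iterator.h>
    #include <cstdio>

    int main() {
        // A sorted feature column with long runs of repeated values,
        // as commonly arises after discretizing feature values into bins.
        int h_vals[] = {1, 1, 1, 3, 3, 7, 7, 7, 7, 9};
        thrust::device_vector<int> vals(h_vals, h_vals + 10);

        thrust::device_vector<int> unique_vals(vals.size());
        thrust::device_vector<int> run_lengths(vals.size());

        // reduce_by_key collapses each run of equal keys into a single
        // (value, run_length) pair -- exactly a run-length encoding.
        auto ends = thrust::reduce_by_key(
            vals.begin(), vals.end(),
            thrust::constant_iterator<int>(1),  // each element counts as 1
            unique_vals.begin(),
            run_lengths.begin());

        int n_runs = ends.first - unique_vals.begin();
        for (int i = 0; i < n_runs; ++i)
            printf("(value=%d, count=%d)\n",
                   (int)unique_vals[i], (int)run_lengths[i]);
        return 0;
    }

Since split finding scans each feature column repeatedly, operating on (value, count) runs instead of raw elements can reduce memory traffic on the GPU.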
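The gradient-reuse idea can be sketched in a similar spirit. Assuming a binary logistic objective (an assumption on our part; the abstract does not fix the loss function), the hypothetical kernel below keeps the running prediction cached on the device across boosting iterations, so each iteration only folds in the newest tree's output before recomputing the first- and second-order gradients, instead of re-evaluating the whole ensemble.

    #include <cuda_runtime.h>

    // Hypothetical kernel: names and signature are ours, not the paper's.
    // y_hat is the intermediate training result being reused: it persists
    // on the device between boosting iterations.
    __global__ void update_gradients(const float* label,
                                     const float* new_tree_pred,
                                     float* y_hat,   // cached running score
                                     float* grad,    // first-order gradient
                                     float* hess,    // second-order gradient
                                     int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        // Fold only the latest tree into the cached prediction.
        y_hat[i] += new_tree_pred[i];

        // Logistic-loss gradients: g = sigmoid(y_hat) - y,
        // h = sigmoid(y_hat) * (1 - sigmoid(y_hat)).
        float p = 1.0f / (1.0f + expf(-y_hat[i]));
        grad[i] = p - label[i];
        hess[i] = p * (1.0f - p);
    }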
