Home > Foreign Journals > Integration > FPGA and GPU-based acceleration of ML workloads on Amazon cloud - A case study using gradient boosted decision tree library
FPGA and GPU-based acceleration of ML workloads on Amazon cloud - A case study using gradient boosted decision tree library
Abstract

Cloud vendors such as Amazon (AWS) have started to offer FPGAs in addition to GPUs and CPUs in their on-demand computing services. In this work we explore the design-space trade-offs of implementing a state-of-the-art machine learning library for gradient-boosted decision trees (GBDT) on the Amazon cloud and compare the scalability, performance, cost and accuracy with the best known CPU and GPU implementations from the literature. Our evaluation indicates that, depending on the dataset, an FPGA-based implementation of the bottleneck computation kernels yields a speed-up anywhere from 3X to 10X over a GPU and 5X to 33X over a CPU. We show that a smaller bin size results in better performance on an FPGA, but even with a bin size of 16 and a fixed-point implementation, the degradation in accuracy on an FPGA is relatively small, around 1.3%-3.3% compared to a floating-point implementation with 256 bins on a CPU or GPU.
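The bin size discussed above refers to the feature quantization used by histogram-based GBDT libraries: each feature is bucketed into a small number of discrete bins, and the training bottleneck is accumulating per-bin gradient statistics. The sketch below illustrates this idea in plain NumPy; it is not the paper's implementation, and the function names and quantile-based bin edges are illustrative assumptions.

```python
import numpy as np

def quantize_features(X, n_bins=16):
    """Quantize each feature column of X into n_bins integer bins.
    Quantile-based bin edges are an illustrative choice; real GBDT
    libraries use their own binning heuristics."""
    binned = np.empty(X.shape, dtype=np.uint8)  # 16 bins fit easily in 8 bits
    for j in range(X.shape[1]):
        # n_bins - 1 interior edges split the column into n_bins buckets
        edges = np.quantile(X[:, j], np.linspace(0, 1, n_bins + 1)[1:-1])
        binned[:, j] = np.searchsorted(edges, X[:, j])
    return binned

def gradient_histogram(binned_col, gradients, n_bins=16):
    """Accumulate the per-bin gradient sums for one feature -- the kind of
    inner kernel that the paper offloads to the FPGA/GPU."""
    hist = np.zeros(n_bins)
    np.add.at(hist, binned_col, gradients)  # scatter-add gradients by bin id
    return hist
```

With 16 bins each histogram is tiny, which is what makes a fixed-point FPGA datapath attractive; the accuracy cost relative to 256 floating-point bins is the 1.3%-3.3% figure reported in the abstract.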