
Accelerating a Random Forest Classifier: Multi-Core, GP-GPU, or FPGA?


Abstract

Random forest classification is a well-known machine learning technique that generates classifiers in the form of an ensemble ("forest") of decision trees. The classification of an input sample is determined by the majority vote of the ensemble. Traditional random forest classifiers can be highly effective, but classification using a random forest is memory bound and not typically suitable for acceleration using FPGAs or GP-GPUs, because it requires traversing large, possibly irregular decision trees. Recent work at Lawrence Livermore National Laboratory has developed several variants of random forest classifiers, including the Compact Random Forest (CRF), that can generate decision trees more suitable for acceleration than traditional decision trees. Our paper compares and contrasts the effectiveness of FPGAs, GP-GPUs, and multi-core CPUs for accelerating classification using models generated by compact random forest machine learning classifiers. Taking advantage of training algorithms that can produce compact random forests composed of many small trees rather than fewer deep trees, we are able to regularize the forest such that the classification of any sample takes a deterministic amount of time. This optimization then allows us to execute the classifier in a pipelined or single-instruction, multiple-thread (SIMT) fashion. We show that FPGAs provide the highest-performance solution, but require a multi-chip / multi-board system to execute even modest-sized forests. GP-GPUs offer a more flexible solution with reasonably high performance that scales with forest size. Finally, multi-threading via OpenMP on a shared-memory system was the simplest solution and provided near-linear performance scaling with core count, but was still significantly slower than the GP-GPU and FPGA.
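The idea of a regularized forest with deterministic classification time can be illustrated with a minimal sketch. The abstract does not describe the actual data layout, so everything here is an assumption made for illustration: each tree is padded to a complete binary tree of the same fixed depth and stored as flat arrays, and the names `classify_tree` and `classify_forest` are hypothetical. Because every sample takes exactly `depth` comparison steps in every tree, the loop has no data-dependent trip count, which is the property that makes pipelined or SIMT execution straightforward.

```python
from collections import Counter


def classify_tree(features, thresholds, leaves, sample, depth):
    """Walk one complete binary tree in exactly `depth` steps.

    Assumed layout (not from the paper): internal nodes are stored in
    heap order, so the children of node i are 2*i + 1 and 2*i + 2.
    `features[i]` is the feature index tested at node i, `thresholds[i]`
    the split value, and `leaves` holds the 2**depth class labels.
    """
    node = 0
    for _ in range(depth):  # fixed trip count: same work for every sample
        go_right = sample[features[node]] > thresholds[node]
        node = 2 * node + 1 + int(go_right)
    # After `depth` steps, `node` lies in the leaf range; shift to index `leaves`.
    return leaves[node - (2 ** depth - 1)]


def classify_forest(forest, sample, depth):
    """Return the majority class over all trees in the forest."""
    votes = Counter(
        classify_tree(features, thresholds, leaves, sample, depth)
        for features, thresholds, leaves in forest
    )
    return votes.most_common(1)[0][0]
```

In this sketch each tree is independent of the others, so the per-tree calls could be distributed across CPU threads or GPU threads; the final vote is the only step that combines results.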
