International Conference on Algorithms and Architectures for Parallel Processing

Parallel Training GBRT Based on KMeans Histogram Approximation for Big Data


Abstract

Gradient Boosting Regression Tree (GBRT), one of the state-of-the-art ranking algorithms widely used in industry, faces challenges in the big data era. As dataset sizes grow rapidly, the iterative training process of GBRT becomes very time-consuming on large-scale data. In this paper, we aim to speed up the training of each tree in the GBRT framework. First, we propose a novel KMeans histogram building algorithm that has lower time complexity and is more efficient than the state-of-the-art histogram building method. Further, we put forward an approximation algorithm that combines kernel density estimation with the histogram technique to improve accuracy. We conduct a variety of experiments on both public Learning To Rank (LTR) benchmark datasets and large-scale real-world datasets from the Baidu search engine. The experimental results show that our proposed parallel training algorithm outperforms the state-of-the-art parallel GBRT algorithm, with a nearly 2x speedup and better accuracy. Our algorithm also achieves near-linear scalability.
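To make the abstract's core idea concrete, the sketch below illustrates histogram-based split finding with data-adaptive (KMeans-style) bins: feature values are clustered into k bins with 1-D Lloyd iterations, per-bin gradient/hessian sums are accumulated, and the bins are scanned for the best split. This is a hypothetical illustration of the general technique, not the paper's algorithm; all function names and parameters (`kmeans_bins`, `histogram_split_gain`, `lam`) are assumptions for this sketch.

```python
import numpy as np

def kmeans_bins(values, k=8, iters=10, seed=0):
    """Cluster 1-D feature values into k bins with Lloyd's algorithm.

    Unlike equal-width histograms, the bin centers adapt to the data
    distribution, so fewer bins can cover skewed features.
    """
    rng = np.random.default_rng(seed)
    centers = rng.choice(values, size=k, replace=False).astype(float)
    for _ in range(iters):
        # Assign each value to its nearest center, then recenter.
        assign = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
        for j in range(k):
            mask = assign == j
            if mask.any():
                centers[j] = values[mask].mean()
    centers.sort()  # ordered centers give ordered bin ids
    assign = np.abs(values[:, None] - centers[None, :]).argmin(axis=1)
    return centers, assign

def histogram_split_gain(values, gradients, hessians, k=8, lam=1.0):
    """Accumulate per-bin gradient/hessian sums and scan for the best
    split point, as in histogram-based GBRT tree training."""
    centers, bins = kmeans_bins(values, k)
    g, h = np.zeros(k), np.zeros(k)
    np.add.at(g, bins, gradients)
    np.add.at(h, bins, hessians)
    G, H = g.sum(), h.sum()
    best_gain, best_bin = 0.0, None
    gl = hl = 0.0
    for b in range(k - 1):  # candidate split between bin b and b + 1
        gl += g[b]; hl += h[b]
        gr, hr = G - gl, H - hl
        gain = gl**2 / (hl + lam) + gr**2 / (hr + lam) - G**2 / (H + lam)
        if gain > best_gain:
            best_gain, best_bin = gain, b
    return best_gain, best_bin, centers
```

Because split search runs over k bins instead of all distinct feature values, the per-bin sums are cheap to merge across workers, which is what makes histogram approaches attractive for parallel GBRT training.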
