An explicit split point procedure in model-based trees allowing for a quick fitting of GLM trees and GLM forests

Dutang Christophe; Guibert Quentin

Abstract

Classification and regression trees (CART) prove to be a true alternative to fully parametric models such as linear models (LM) and generalized linear models (GLM). Although CART suffer from biased variable selection, they are commonly applied to various topics and used for tree ensembles and random forests because of their simplicity and computation speed. Conditional inference tree and model-based tree algorithms, in which variable selection is handled via fluctuation tests, are known to give more accurate and interpretable results than CART, but require longer computation times. Using a closed-form maximum likelihood estimator for GLM, this paper proposes a split point procedure based on the explicit likelihood in order to save time when searching for the best split for a given splitting variable. A simulation study for non-Gaussian responses is performed to assess the computational gain when building GLM trees. We also propose a benchmark, on simulated and empirical datasets, of GLM trees against CART, conditional inference trees and LM trees in order to identify situations where GLM trees are efficient. This approach is extended to multiway split trees and log-transformed distributions. Making GLM trees feasible through a new split point procedure allows us to investigate the use of GLM in ensemble methods. We propose a numerical comparison of GLM forests against other random forest-type approaches. Our simulation analyses show cases where GLM forests are good challengers to random forests.
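
To illustrate why an explicit likelihood speeds up the split search, here is a minimal Python sketch. It is not the procedure of the paper: it assumes the simplest possible case, an intercept-only Poisson model in each child node, where the maximum likelihood estimate is just the sample mean. The function names `poisson_loglik_const` and `best_split` are hypothetical. Because the maximised log-likelihood depends on the data only through a count and a sum, every candidate split can be scored from running totals without refitting a model.

```python
import math


def poisson_loglik_const(sum_y, n):
    """Maximised Poisson log-likelihood of an intercept-only model.

    The additive term -sum(log(y_i!)) is dropped because it is the same
    for every candidate split. Assumption: one constant mean per node.
    """
    if n == 0 or sum_y == 0:
        return 0.0  # limit of s*log(s/n) - s as s -> 0
    mean = sum_y / n  # closed-form MLE of the Poisson mean
    return sum_y * math.log(mean) - sum_y


def best_split(x, y):
    """Scan all split points of x and return (split_point, log_likelihood).

    The score of a split is the sum of the maximised log-likelihoods of the
    left and right child nodes, updated with a running sum.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    n, total = len(ys), sum(ys)

    best_point, best_ll = None, -math.inf
    left_sum = 0
    for k in range(1, n):
        left_sum += ys[k - 1]
        if xs[k] == xs[k - 1]:
            continue  # no valid threshold between tied values
        ll = (poisson_loglik_const(left_sum, k)
              + poisson_loglik_const(total - left_sum, n - k))
        if ll > best_ll:
            best_ll = ll
            best_point = (xs[k - 1] + xs[k]) / 2  # midpoint threshold
    return best_point, best_ll


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6]
    y = [1, 0, 2, 9, 11, 10]
    print(best_split(x, y))
```

After sorting, the scan costs one pass over the data, whereas refitting a GLM numerically at each candidate split would require an iterative optimisation per candidate.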

Bibliographic record

Source
Statistics and Computing | 2022, issue 1 | pp. 6.1-6.22 | 22 pages
Authors
Dutang Christophe; Guibert Quentin
Author affiliations

Univ PSL, Univ Paris Dauphine, CNRS, CEREMADE, Pl Ml Lattre Tassigny, F-75016 Paris, France;

Univ PSL, Univ Paris Dauphine, CNRS, CEREMADE, Pl Ml Lattre Tassigny, F-75016 Paris, France|Prim Act, 42 Av Grande Armee, F-75017 Paris, France;

Original format: PDF
Text language: English
Keywords
GLM; Model-based recursive partitioning; GLM trees; Random forest; GLM forest