Journal: Statistics and Computing

An explicit split point procedure in model-based trees allowing for a quick fitting of GLM trees and GLM forests



Abstract

Classification and regression trees (CART) have proven to be a genuine alternative to fully parametric models such as linear models (LM) and generalized linear models (GLM). Although CART suffer from biased variable selection, they are widely applied and serve as building blocks for tree ensembles and random forests because of their simplicity and computational speed. Conditional inference trees and model-based tree algorithms, which handle variable selection through fluctuation tests, are known to give more accurate and interpretable results than CART, but at the cost of longer computation times. Using a closed-form maximum likelihood estimator for GLM, this paper proposes a split point procedure based on the explicit likelihood, which reduces the time needed to search for the best split of a given splitting variable. A simulation study with non-Gaussian responses assesses the computational gain when building GLM trees. We also benchmark GLM trees against CART, conditional inference trees and LM trees on simulated and empirical datasets to identify situations where GLM trees are efficient. The approach is extended to multiway split trees and log-transformed distributions. Because the new split point procedure makes GLM trees computationally feasible, we can investigate the use of GLM in ensemble methods, and we provide a numerical comparison of GLM forests against other random-forest-type approaches. Our simulation analyses show cases where GLM forests are strong challengers to random forests.
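The abstract does not spell out the estimator, so the following is only a minimal sketch of why an explicit likelihood speeds up split search, under a simplifying assumption that is not taken from the paper: each child node fits an intercept-only Poisson GLM (log link). For such a node the MLE of the mean is the node's sample mean, so the maximized log-likelihood is explicit. Up to terms that do not depend on the split (the total response and the sum of log y!), it equals S·log(S/n), where S is the node's response total and n its size. Cumulative sums then score every candidate cut point of a numeric splitting variable in one pass after sorting, with no iterative GLM refit per candidate. The function names `best_poisson_split` and `xlogy_ratio` are hypothetical.

```python
import numpy as np


def xlogy_ratio(s, n):
    """Return s * log(s / n) elementwise, using the convention 0 * log(0) = 0."""
    safe_s = np.where(s > 0, s, 1.0)  # avoid log(0); those entries are zeroed below
    return np.where(s > 0, s * np.log(safe_s / n), 0.0)


def best_poisson_split(x, y, min_node_size=1):
    """Hypothetical explicit split search for intercept-only Poisson GLM children.

    Each child's MLE mean is its sample mean, so the maximized log-likelihood of a
    split is, up to split-invariant constants, S_L*log(S_L/n_L) + S_R*log(S_R/n_R).
    Returns (threshold, gain) or None if no admissible split exists.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x, kind="stable")
    xs, ys = x[order], y[order]
    n = len(ys)

    s_left = np.cumsum(ys)[:-1]   # response total of the k smallest x values
    n_left = np.arange(1, n)      # k = 1, ..., n - 1
    s_right = ys.sum() - s_left
    n_right = n - n_left

    gain = xlogy_ratio(s_left, n_left) + xlogy_ratio(s_right, n_right)

    # Only cut between distinct x values and respect the minimum node size.
    valid = (xs[:-1] < xs[1:]) & (n_left >= min_node_size) & (n_right >= min_node_size)
    if not valid.any():
        return None
    gain = np.where(valid, gain, -np.inf)
    k = int(np.argmax(gain))
    threshold = 0.5 * (xs[k] + xs[k + 1])
    return threshold, float(gain[k])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, 500)
    y = rng.poisson(np.where(x < 0.3, 2.0, 6.0))
    print(best_poisson_split(x, y))
```

With this simulated step in the Poisson mean, the returned threshold should typically land close to 0.3. The design choice is the point of the sketch: the per-candidate cost is a constant-time closed-form expression rather than an iteratively reweighted least squares fit. GLMs with covariates inside the nodes, as in the paper, would need the corresponding closed-form estimator in place of the sample mean.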
