Journal: Data Mining and Knowledge Discovery

Discrete-time survival forests with Hellinger distance decision trees

Abstract

Random survival forests (RSF) are a powerful nonparametric method for building prediction models with a time-to-event outcome. RSF do not rely on the proportional hazards assumption and can be readily applied to both low- and higher-dimensional data. A remaining limitation of RSF, however, arises from the fact that the method is almost entirely focussed on continuously measured event times. This issue may become problematic in studies where time is measured on a discrete scale $t = 1, 2, \ldots$, referring to time intervals $[0,a_1), [a_1,a_2), \ldots$. In this situation, the application of methods designed for continuous time-to-event data may lead to biased estimators and inaccurate predictions if discreteness is ignored. To address this issue, we develop a RSF algorithm that is specifically designed for the analysis of (possibly right-censored) discrete event times. The algorithm is based on an ensemble of discrete-time survival trees that operate on transformed versions of the original time-to-event data using tree methods for binary classification. As the outcome variable in these trees is typically highly imbalanced, our algorithm implements a node splitting strategy based on Hellinger's distance, which is a skew-insensitive alternative to classical split criteria such as the Gini impurity. The new algorithm thus provides flexible nonparametric predictions of individual-specific discrete hazard and survival functions.
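The abstract does not spell out the data transformation, so the following is only a minimal sketch under the assumption that it is the standard person-period expansion used in discrete-time survival analysis: each subject contributes one binary row per interval at risk, with outcome 1 only in the interval where an event is observed. The function names `to_person_period` and `survival_from_hazards` are hypothetical and are not taken from the paper. The second helper illustrates how predicted discrete hazards $\lambda(s)$ translate into a survival function via $S(t) = \prod_{s \le t} (1 - \lambda(s))$.

```python
from typing import List, Sequence, Tuple


def to_person_period(
    times: Sequence[int], events: Sequence[int]
) -> List[Tuple[int, int, int]]:
    """Expand (time, event) pairs into binary person-period rows.

    times[i]  : last observed interval of subject i (1, 2, ...)
    events[i] : 1 if the event occurred in that interval, 0 if right-censored
    Returns rows of the form (subject_index, interval, y).
    """
    rows = []
    for i, (t_obs, delta) in enumerate(zip(times, events)):
        for t in range(1, t_obs + 1):
            # y is 1 only in the final interval of a subject with an observed event
            y = 1 if (t == t_obs and delta == 1) else 0
            rows.append((i, t, y))
    return rows


def survival_from_hazards(hazards: Sequence[float]) -> List[float]:
    """Convert discrete hazards lambda(1), lambda(2), ... into S(1), S(2), ..."""
    survival = []
    s = 1.0
    for h in hazards:
        s *= 1.0 - h  # probability of surviving one more interval
        survival.append(s)
    return survival


# Subject 0 has an event in interval 3; subject 1 is censored in interval 2.
rows = to_person_period([3, 2], [1, 0])
surv = survival_from_hazards([0.1, 0.2, 0.5])
```

Because most rows carry the outcome 0, the expanded data set is typically highly imbalanced, which is the situation the abstract cites as motivation for a skew-insensitive split criterion.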
Our numerical results suggest that node splitting by Hellinger's distance improves predictive performance when compared to the Gini impurity. Furthermore, discrete-time RSF improve prediction accuracy when compared to RSF approaches treating discrete event times as continuous in situations where the number of time intervals is small.
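To make the comparison of split criteria concrete, here is a small sketch of the two-class Hellinger distance for a binary split as it is commonly defined for Hellinger distance decision trees, next to the usual Gini impurity decrease. It is an illustration under stated assumptions, not the authors' implementation; the function names and the example counts are hypothetical.

```python
import math


def hellinger_split(pos_left: int, neg_left: int, pos_right: int, neg_right: int) -> float:
    """Hellinger distance between the class-conditional distributions over the two children."""
    n_pos = pos_left + pos_right
    n_neg = neg_left + neg_right
    if n_pos == 0 or n_neg == 0:
        return 0.0  # a pure parent node cannot be separated any further
    total = 0.0
    for p, n in ((pos_left, neg_left), (pos_right, neg_right)):
        # compare the share of positives and the share of negatives sent to this child
        total += (math.sqrt(p / n_pos) - math.sqrt(n / n_neg)) ** 2
    return math.sqrt(total)


def gini_decrease(pos_left: int, neg_left: int, pos_right: int, neg_right: int) -> float:
    """Decrease in Gini impurity achieved by the same split."""

    def gini(p: int, n: int) -> float:
        m = p + n
        if m == 0:
            return 0.0
        q = p / m
        return 2.0 * q * (1.0 - q)

    n_left = pos_left + neg_left
    n_right = pos_right + neg_right
    n_all = n_left + n_right
    parent = gini(pos_left + pos_right, neg_left + neg_right)
    children = (n_left / n_all) * gini(pos_left, neg_left) + (n_right / n_all) * gini(pos_right, neg_right)
    return parent - children


# Same split, but the second data set has ten times as many negatives.
h_a = hellinger_split(8, 100, 2, 900)
h_b = hellinger_split(8, 1000, 2, 9000)
g_a = gini_decrease(8, 100, 2, 900)
g_b = gini_decrease(8, 1000, 2, 9000)
```

In this sketch the Hellinger value depends only on the within-class proportions sent to each child, so scaling up the majority class leaves it unchanged, whereas the Gini decrease shrinks as the imbalance grows.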
