Published in: Journal of Statistical Computation and Simulation

High dimensional variable selection with clustered data: an application of random multivariate survival forests for detection of outlier medical device components

Abstract

In many medical studies, patients are nested or clustered within doctors. When there are many explanatory variables, variable selection with clustered data can be challenging. We propose a variable selection method based on random forests that handles clustered data through stratified binary splits. Our motivating example involves the detection of outlier orthopedic device components from a large pool of candidates, where each patient belongs to a surgeon. Simulations compare the performance of survival forests grown using the stratified logrank statistic with that of forests grown using the conventional and robust logrank statistics, and also evaluate a method that selects variables using a threshold value based on each variable's empirical null distribution. The stratified logrank test outperforms the conventional and robust methods when data are generated with cluster-specific effects and, when cluster sizes are sufficiently large, performs comparably to the splitting alternatives in the absence of cluster-specific effects. Thresholding was effective at distinguishing between important and unimportant variables.
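The abstract does not spell out the splitting rule. The sketch below assumes the textbook stratified log-rank statistic: for a candidate binary split, observed-minus-expected event counts and their hypergeometric variances are computed within each cluster (here, each surgeon) and then summed across clusters. The function name `stratified_logrank` and its arguments are hypothetical and are not taken from the paper.

```python
import numpy as np


def stratified_logrank(time, event, left, strata):
    """Standardized stratified log-rank statistic |Z| for one candidate split.

    time   : follow-up times
    event  : 1 if the event (e.g. component failure) was observed, 0 if censored
    left   : True if the observation is sent to the left daughter node
    strata : cluster label (e.g. a hypothetical surgeon ID); risk sets are
             formed only within each stratum
    Larger values indicate a stronger separation between the daughter nodes.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    left = np.asarray(left, dtype=bool)
    strata = np.asarray(strata)

    obs_minus_exp = 0.0
    variance = 0.0
    for s in np.unique(strata):
        in_s = strata == s
        t_s, e_s, g_s = time[in_s], event[in_s], left[in_s]
        # Distinct event times observed within this stratum only
        for t in np.unique(t_s[e_s == 1]):
            at_risk = t_s >= t
            n = at_risk.sum()                 # at risk in the stratum
            n1 = (at_risk & g_s).sum()        # at risk in the left node
            dies = (t_s == t) & (e_s == 1)
            d = dies.sum()                    # events at time t
            d1 = (dies & g_s).sum()           # events at time t, left node
            obs_minus_exp += d1 - d * n1 / n
            if n > 1:
                variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0
    return abs(obs_minus_exp) / np.sqrt(variance)


# Split confounded with surgeon: no within-stratum contrast, so the score is 0
print(stratified_logrank([1, 2, 3, 4], [1, 1, 1, 1],
                         [True, True, False, False], ["a", "a", "b", "b"]))
```

With a single stratum the function reduces to the ordinary log-rank statistic. Forming risk sets within each surgeon means that a split which merely separates one surgeon's patients from another's earns no credit, which is the behaviour the stratified splitting rule is meant to provide when cluster-specific effects are present.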