Journal: Australian & New Zealand Journal of Statistics

ON THE COVERAGE PROBABILITY OF CONFIDENCE INTERVALS IN REGRESSION AFTER VARIABLE SELECTION


Abstract

This paper considers a linear regression model with regression parameter vector β. The parameter of interest is θ = a^T β where a is specified. When, as a first step, a data-based variable selection (e.g. minimum Akaike information criterion) is used to select a model, it is common statistical practice to then carry out inference about θ, using the same data, based on the (false) assumption that the selected model had been provided a priori. The paper considers a confidence interval for θ with nominal coverage 1 - α constructed on this (false) assumption, and calls this the naive 1 - α confidence interval. The minimum coverage probability of this confidence interval can be calculated for simple variable selection procedures involving only a single variable. However, the kinds of variable selection procedures used in practice are typically much more complicated. For the real-life data presented in this paper, there are 20 variables each of which is to be either included or not, leading to 2^20 different models. The coverage probability at any given value of the parameters provides an upper bound on the minimum coverage probability of the naive confidence interval. This paper derives a new Monte Carlo simulation estimator of the coverage probability, which uses conditioning for variance reduction. For these real-life data, the gain in efficiency of this Monte Carlo simulation due to conditioning ranged from 2 to 6. The paper also presents a simple one-dimensional search strategy for parameter values at which the coverage probability is relatively small. For these real-life data, this search leads to parameter values for which the coverage probability of the naive 0.95 confidence interval is 0.79 for variable selection using the Akaike information criterion and 0.70 for variable selection using Bayes information criterion, showing that these confidence intervals are completely inadequate.
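The idea of a naive interval losing coverage can be illustrated with a small simulation. The sketch below is not the paper's conditioning-based estimator and does not use its 20-variable data. It is a plain brute-force Monte Carlo estimate for the simplest case the abstract mentions, a single candidate variable, under assumptions introduced here for illustration only: two correlated regressors x1 and x2, a known error standard deviation of 1, θ = β1 (so a = (1, 0)), and an AIC-type rule that keeps x2 when its squared z-statistic exceeds a penalty of 2. The function name `naive_coverage` and all of its parameters are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def naive_coverage(beta2, n=50, rho=0.8, alpha=0.05, penalty=2.0,
                   n_sim=20000, seed=0):
    """Brute-force Monte Carlo estimate of the coverage probability of the
    naive 1 - alpha interval for beta1 after deciding whether to keep x2.

    Illustrative assumptions: known sigma = 1, fixed design, normal errors.
    """
    rng = np.random.default_rng(seed)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # normal quantile (sigma known)
    sigma = 1.0
    beta1 = 1.0  # arbitrary true value of the parameter of interest

    # Fixed design with two correlated regressors.
    x1 = rng.standard_normal(n)
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.standard_normal(n)
    X = np.column_stack([x1, x2])
    XtX_inv = np.linalg.inv(X.T @ X)
    se_full = sigma * np.sqrt(np.diag(XtX_inv))  # std. errors, full model
    se_red = sigma / np.sqrt(x1 @ x1)            # std. error, model without x2

    # Simulate n_sim response vectors at the given parameter value.
    Y = X @ np.array([beta1, beta2]) + sigma * rng.standard_normal((n_sim, n))
    b_full = Y @ X @ XtX_inv          # least squares estimates, full model
    b_red = (Y @ x1) / (x1 @ x1)      # least squares estimate, x2 dropped

    # With known sigma, AIC keeps x2 when its squared z-statistic exceeds 2;
    # penalty = log(n) gives the BIC-type rule instead.
    keep_x2 = (b_full[:, 1] / se_full[1]) ** 2 > penalty

    # Naive interval: pretend the selected model was fixed in advance.
    est = np.where(keep_x2, b_full[:, 0], b_red)
    se = np.where(keep_x2, se_full[0], se_red)
    return float(np.mean(np.abs(est - beta1) <= z * se))
```

Evaluating `naive_coverage` over a grid of `beta2` values, for example `np.linspace(0, 1, 21)`, and taking the smallest result gives a crude analogue of searching for parameter values at which the coverage is small. Any single value is only an upper bound on the minimum coverage, as the abstract notes.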
