Journal: Natural Hazards and Earth System Sciences

Sample size matters: investigating the effect of sample size on a logistic regression susceptibility model for debris flows

Abstract

Predictive spatial modelling is an important task in natural hazard assessment and regionalisation of geomorphic processes or landforms. Logistic regression is a multivariate statistical approach frequently used in predictive modelling; it can be conducted stepwise in order to select from a number of candidate independent variables those that lead to the best model. In our case study on a debris flow susceptibility model, we investigate the sensitivity of model selection and quality to different sample sizes in light of the following problem: on the one hand, a sample has to be large enough to cover the variability of geofactors within the study area, and to yield stable and reproducible results; on the other hand, the sample must not be too large, because a large sample is likely to violate the assumption of independent observations due to spatial autocorrelation. Using stepwise model selection with 1000 random samples for a number of sample sizes between n = 50 and n = 5000, we investigate the inclusion and exclusion of geofactors and the diversity of the resulting models as a function of sample size; the multiplicity of different models is assessed using numerical indices borrowed from information theory and biodiversity research. Model diversity decreases with increasing sample size and reaches either a local minimum or a plateau; even larger sample sizes do not further reduce it, and they approach the upper limit of sample size given, in this study, by the autocorrelation range of the spatial data sets. In this way, an optimised sample size can be derived from an exploratory analysis. Model uncertainty due to sampling and model selection, and its predictive ability, are explored statistically and spatially through the example of 100 models estimated in one study area and validated in a neighbouring area: depending on the study area and on sample size, the predicted probabilities for debris flow release differed, on average, by 7 to 23 percentage points. 
In view of these results, we argue that researchers applying model selection should explore the behaviour of the model selection for different sample sizes, and that consensus models created from a number of random samples should be given preference over models relying on a single sample.
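
The diversity analysis described in the abstract can be illustrated with a small sketch. The abstract does not name the indices it uses, so the Shannon entropy and the Simpson index below are assumptions standing in for "indices borrowed from information theory and biodiversity research". The function names, the variable names (`slope`, `curvature`, `geology`) and the four example runs are hypothetical; the stepwise logistic regression itself is not reproduced here, only the bookkeeping applied to its results.

```python
import math
from collections import Counter


def model_diversity(selected_models):
    """Summarise how varied the selected models are across repeated samples.

    selected_models: one collection of variable names per random sample,
    i.e. the geofactors retained by model selection on that sample.
    """
    # Treat each distinct set of variables as one "species" of model.
    counts = Counter(frozenset(m) for m in selected_models)
    total = sum(counts.values())
    shares = [c / total for c in counts.values()]
    # Assumed indices: Shannon entropy (natural log) and Simpson index.
    shannon = -sum(p * math.log(p) for p in shares)
    simpson = 1.0 - sum(p * p for p in shares)
    return {
        "distinct_models": len(counts),
        "shannon": shannon,
        "simpson": simpson,
    }


def inclusion_frequency(selected_models):
    """Share of samples in which each variable was selected."""
    total = len(selected_models)
    counts = Counter(v for m in selected_models for v in set(m))
    return {v: c / total for v, c in counts.items()}


# Hypothetical outcome of model selection on four random samples.
runs = [
    {"slope", "curvature"},
    {"slope", "curvature"},
    {"slope", "geology"},
    {"slope", "curvature", "geology"},
]
diversity = model_diversity(runs)
frequency = inclusion_frequency(runs)
print(diversity)
print(frequency)
```

Repeating this for each sample size and plotting the indices against n would show where model diversity stops decreasing, which is the kind of exploratory analysis the abstract recommends.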