...
Ecological Modelling

Evaluating effectiveness of down-sampling for stratified designs and unbalanced prevalence in Random Forest models of tree species distributions in Nevada

Abstract

Random Forests is frequently used to model species distributions over large geographic areas. Complications arise when the data used to train the models have been collected in stratified designs that involve different sampling intensities per stratum. The modeling process is further complicated if some of the target species are relatively rare on the landscape, leading to an unbalanced number of presences and absences in the training data. We explored means to accommodate both unequal sampling intensity across strata and unbalanced species prevalence in Random Forest models of tree and shrub species distributions in the state of Nevada. For the unequal sampling intensity issue, we tested three modeling strategies: fitting models using all the data, down-sampling the intensified stratum, and building separate models for each stratum. We addressed unbalanced species prevalence by investigating the effects of down-sampling the more prevalent response (presence or absence), and by optimizing the cutoff thresholds for declaring a species present. When modeling species presence with stratified data collected at different sampling intensities per stratum, we found that neither down-sampling the intensified stratum nor fitting individual stratum models improved model performance. We also found that balancing the number of presences and absences in a training data set by down-sampling did not improve predictive models of species distributions, and did not eliminate the need to optimize thresholds. We then applied our final choice of model to the full raster layers for Nevada to produce statewide species distribution maps.
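The data-handling steps named in the abstract (down-sampling an intensified stratum, down-sampling the more prevalent response, and optimizing the presence cutoff) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: all function names are hypothetical, presences are coded 1 and absences 0, and Cohen's kappa is used as the threshold criterion purely as an example, since the abstract does not say which criterion was optimized. Fitting the Random Forest itself is left out; `probs` stands in for its predicted presence probabilities.

```python
import random


def downsample_stratum(strata, target, keep_fraction, seed=0):
    """Hypothetical helper: randomly keep only `keep_fraction` of the plots in
    the intensified stratum `target`; plots in other strata are all kept.
    Returns the retained row indices."""
    rng = random.Random(seed)
    in_target = [i for i, s in enumerate(strata) if s == target]
    others = [i for i, s in enumerate(strata) if s != target]
    n_keep = round(len(in_target) * keep_fraction)
    return sorted(others + rng.sample(in_target, n_keep))


def downsample_majority(labels, seed=0):
    """Hypothetical helper: balance presences (1) and absences (0) by randomly
    discarding observations of the more prevalent response.
    Returns the retained row indices."""
    rng = random.Random(seed)
    pres = [i for i, y in enumerate(labels) if y == 1]
    absn = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pres, absn) if len(pres) <= len(absn) else (absn, pres)
    return sorted(minority + rng.sample(majority, len(minority)))


def kappa(probs, labels, threshold):
    """Cohen's kappa of presence/absence predictions at a given cutoff
    (assumed criterion, for illustration only)."""
    n = len(labels)
    pred = [1 if p >= threshold else 0 for p in probs]
    agree = sum(1 for a, b in zip(pred, labels) if a == b) / n
    p_pred1, p_obs1 = sum(pred) / n, sum(labels) / n
    expected = p_pred1 * p_obs1 + (1 - p_pred1) * (1 - p_obs1)
    return 0.0 if expected == 1 else (agree - expected) / (1 - expected)


def optimize_threshold(probs, labels, steps=100):
    """Grid-search cutoffs in [0, 1]; return the first one maximizing kappa."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda t: kappa(probs, labels, t))


if __name__ == "__main__":
    labels = [1] * 10 + [0] * 90        # 10% prevalence, as for a rare species
    idx = downsample_majority(labels)
    print(len(idx))                     # 20: all 10 presences + 10 random absences

    probs = [0.9, 0.8, 0.35, 0.3, 0.2, 0.1]   # stand-in model probabilities
    obs = [1, 1, 1, 0, 0, 0]
    print(optimize_threshold(probs, obs))     # 0.31, first cutoff separating the classes
```

The threshold search is shown alongside the down-sampling helpers rather than as an alternative to them because, per the abstract, balancing the training data by down-sampling did not eliminate the need to optimize thresholds.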
