Journal: Fisheries Research

The predictive performances of random forest models with limited sample size and different species traits


Abstract

The random forest (RF) model is a powerful machine learning technique that ecologists and fisheries scientists increasingly use for species distribution modeling (SDM), given the various threats to marine habitats and biodiversity. However, the observations available for model training are often constrained by limited surveys and financial resources. Under these circumstances, identifying an appropriate sample size is important for successful predictions. In addition, species with different biological characteristics present different challenges for SDM, which need to be considered when evaluating model performance. We built and evaluated RF models for 21 marine demersal species using catch data and environmental variables collected during a bottom trawl survey in the coastal waters of the Shandong Peninsula, China. The predictive performance of the RF models was evaluated by cross-validation for eight sample sizes, with 10 to 80 sample sites used to train the model. The resulting predictive performance was then examined across a range of biological and behavioral traits. For most species, predictive performance improved substantially as the sample size increased from 10 to 30 sites, with less improvement evident for larger datasets. An ANOVA identified significant influences of migratory behavior, lifespan, body size, feeding mode, and prevalence on model predictability, whereas the effects of trophic level and taxon were not significant, nor were the interactions between sample size and species traits. Abundance distributions were better predicted for benthivores and for species with short migratory distances, short lifespans, and small body sizes. For each species trait, the variation in relative predictive performance among the trait's categorical groups was generally consistent across sample sizes and performance metrics. Our study may contribute to an improved understanding of successful SDM and provide guidance for applying RF models to predict the abundance distributions of fish species.
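
The evaluation design described in the abstract (train an RF on 10 to 80 sites, score it on held-out sites, compare across sample sizes) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the small pure-Python random forest, the RMSE metric, the repeated random hold-out split, the total of 100 sites, and the synthetic covariates (depth, temperature, salinity) are all assumptions made for the example, and every function name is hypothetical.

```python
import math
import random
import statistics


def sse(values):
    """Sum of squared deviations from the mean."""
    m = statistics.fmean(values)
    return sum((v - m) ** 2 for v in values)


def grow_tree(X, y, n_try, min_leaf, rng):
    """Grow one regression tree; returns a leaf mean or (feat, thr, left, right)."""
    if len(y) < 2 * min_leaf or min(y) == max(y):
        return statistics.fmean(y)
    best = None  # (split SSE, feature index, threshold)
    for feat in rng.sample(range(len(X[0])), n_try):  # random feature subset
        for thr in sorted({row[feat] for row in X}):
            left = [t for row, t in zip(X, y) if row[feat] <= thr]
            right = [t for row, t in zip(X, y) if row[feat] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue  # min_leaf >= 1 keeps both children non-empty
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, feat, thr)
    if best is None:
        return statistics.fmean(y)
    _, feat, thr = best
    mask = [row[feat] <= thr for row in X]
    left = grow_tree([r for r, m in zip(X, mask) if m],
                     [t for t, m in zip(y, mask) if m], n_try, min_leaf, rng)
    right = grow_tree([r for r, m in zip(X, mask) if not m],
                      [t for t, m in zip(y, mask) if not m], n_try, min_leaf, rng)
    return (feat, thr, left, right)


def predict_tree(node, row):
    """Walk a tree down to its leaf value."""
    while isinstance(node, tuple):
        feat, thr, left, right = node
        node = left if row[feat] <= thr else right
    return node


def fit_forest(X, y, n_trees=20, min_leaf=3, seed=0):
    """Fit bootstrap-aggregated trees; returns a function row -> prediction."""
    rng = random.Random(seed)
    n_try = max(1, len(X[0]) // 3)  # common regression heuristic, assumed here
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        trees.append(grow_tree([X[i] for i in idx], [y[i] for i in idx],
                               n_try, min_leaf, rng))
    return lambda row: statistics.fmean([predict_tree(t, row) for t in trees])


def evaluate_sample_sizes(X, y, sizes, repeats=3, seed=0, **forest_kwargs):
    """Mean held-out RMSE per training size (each size must be < len(X))."""
    rng = random.Random(seed)
    results = {}
    for n in sizes:
        scores = []
        for _ in range(repeats):
            order = rng.sample(range(len(X)), len(X))  # random permutation
            train, test = order[:n], order[n:]
            model = fit_forest([X[i] for i in train], [y[i] for i in train],
                               seed=rng.randrange(10**6), **forest_kwargs)
            errs = [(model(X[i]) - y[i]) ** 2 for i in test]
            scores.append(math.sqrt(statistics.fmean(errs)))
        results[n] = statistics.fmean(scores)
    return results


def synthetic_survey(n_sites=100, seed=1):
    """Made-up data: depth, temperature, salinity -> log-abundance (not real)."""
    rng = random.Random(seed)
    X = [[rng.uniform(5, 60), rng.uniform(8, 24), rng.uniform(28, 33)]
         for _ in range(n_sites)]
    y = [0.04 * d - 0.02 * (t - 16) ** 2 + rng.gauss(0, 0.3) for d, t, _ in X]
    return X, y


if __name__ == "__main__":
    X, y = synthetic_survey()
    for n, score in evaluate_sample_sizes(X, y, range(10, 90, 10), n_trees=15).items():
        print(f"train sites = {n:2d}  mean held-out RMSE = {score:.3f}")
```

In practice an established implementation (for example scikit-learn's `RandomForestRegressor`) would likely replace `fit_forest`; the loop in `evaluate_sample_sizes` is the part that mirrors the sample-size comparison, and the per-size scores it returns could then be grouped by species trait for an ANOVA-style comparison.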
