Journal: Applied Mathematical Modelling

Selecting appropriate machine learning methods for digital soil mapping


Abstract

Digital soil mapping (DSM) increasingly makes use of machine learning algorithms to identify relationships between soil properties and multiple covariates that can be detected across landscapes. Selecting the appropriate algorithm for model building is critical for optimizing results in the context of the available data. Over the past decade, many studies have tested different machine learning (ML) approaches on a variety of soil data sets. Here, we review the application of some of the most popular ML algorithms for digital soil mapping. Specifically, we compare the strengths and weaknesses of multiple linear regression (MLR), k-nearest neighbors (KNN), support vector regression (SVR), Cubist, random forest (RF), and artificial neural networks (ANN) for DSM. These algorithms were compared on the basis of five factors: (1) quantity of hyperparameters, (2) sample size, (3) covariate selection, (4) learning time, and (5) interpretability of the resulting model. If training time is a limitation, then algorithms that have fewer model parameters and hyperparameters should be considered, e.g., MLR, KNN, SVR, and Cubist. If the data set is large (thousands of samples) and computation time is not an issue, ANN would likely produce the best results. If the data set is small (<100 samples), then Cubist, KNN, RF, and SVR are likely to perform better than ANN and MLR. The uncertainty in predictions produced by Cubist, KNN, RF, and SVR may not decrease with large data sets. When interpretability of the resulting model is important to the user, Cubist, MLR, and RF are more appropriate algorithms as they do not function as "black boxes." There is no single correct approach to producing models for predicting the spatial distribution of soil properties. Nonetheless, some algorithms are more appropriate than others, considering the nature of the data and the purpose of the mapping activity.
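The selection guidance in the abstract can be read as a small set of filtering rules. The Python sketch below is one possible encoding of those rules and is not taken from the paper. The function name `suggest_algorithms`, the cutoff of 1000 samples for a "large" data set (the abstract says only "thousands of samples"), and the choice to combine the rules by set intersection are all assumptions made for illustration.

```python
# Algorithm groups as named in the abstract.
ALL_ALGORITHMS = ["MLR", "KNN", "SVR", "Cubist", "RF", "ANN"]
FEWER_PARAMETERS = {"MLR", "KNN", "SVR", "Cubist"}   # suggested when training time is limited
SMALL_SAMPLE = {"Cubist", "KNN", "RF", "SVR"}        # suggested for fewer than 100 samples
INTERPRETABLE = {"Cubist", "MLR", "RF"}              # described as not "black boxes"

SMALL_N = 100    # threshold stated in the abstract
LARGE_N = 1000   # assumption: the abstract says only "thousands of samples"


def suggest_algorithms(n_samples, time_limited=False, need_interpretability=False):
    """Return candidate algorithms, filtered by the abstract's rules.

    Combining the rules by intersection is an illustrative assumption;
    the abstract states each rule separately.
    """
    candidates = set(ALL_ALGORITHMS)
    if time_limited:
        candidates &= FEWER_PARAMETERS
    if n_samples < SMALL_N:
        candidates &= SMALL_SAMPLE
    if need_interpretability:
        candidates &= INTERPRETABLE

    # Keep a stable order that follows the abstract's listing.
    ordered = [name for name in ALL_ALGORITHMS if name in candidates]

    # Large data set and no time constraint: list ANN first if it survived the filters.
    if n_samples >= LARGE_N and not time_limited and "ANN" in ordered:
        ordered.remove("ANN")
        ordered.insert(0, "ANN")
    return ordered


if __name__ == "__main__":
    print(suggest_algorithms(50))
    print(suggest_algorithms(5000))
    print(suggest_algorithms(5000, time_limited=True))
    print(suggest_algorithms(50, need_interpretability=True))
```

An empty or very short result simply means the rules, as combined here, leave few options. As the abstract notes, no single approach is correct for every data set, so the output is a starting shortlist to be checked against validation results.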
