Ecological Informatics: an international journal on ecoinformatics and computational ecology

Bagging GLM: Improved generalized linear model for the analysis of zero-inflated data

Abstract

Species-occurrence data sets tend to contain a large proportion of zero values, i.e., absences, and are therefore zero-inflated. Statistical inference from such data sets is likely to be inefficient or to lead to incorrect conclusions unless the data are treated carefully. In this study, we propose a new modeling method that overcomes the problems caused by zero-inflated data sets by combining a regression model with a machine-learning technique: a generalized linear model (GLM), which is widely used in ecology, and bootstrap aggregation (bagging). Using both our new method and a traditional GLM, we built distribution models of Vincetoxicum pycnostelma (a vascular plant) and Ninox scutulata (an owl), both of which are endangered and have zero-inflated distribution patterns, and compared model performance. We also modeled four theoretical data sets containing different ratios of presence to absence values with both methods and compared model performance. For the distribution models, our new method performed well compared with the traditional GLM: after bagging, area under the curve (AUC) values were almost the same as with the traditional method, but sensitivity values were higher. Our new method also showed higher sensitivity than the traditional GLM when modeling a theoretical data set containing a large proportion of zero values. These results indicate that our new method has high predictive ability for presence data when analyzing zero-inflated data sets. Generally, predicting presences is more difficult than predicting absences. Our new modeling method has potential for advancing species distribution modeling.
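
The abstract does not spell out how the GLM and bagging were combined. The sketch below illustrates one plausible reading, assuming a binomial GLM with a logit link is refit on bootstrap resamples of the data and the predicted presence probabilities are averaged. The function names, the simulated covariate, the small damping term, and the 0.5 threshold are all illustrative assumptions and are not taken from the paper.

```python
import math
import random

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_logistic_glm(X, y, n_iter=15, damping=1e-6):
    """Binomial GLM (logit link) fitted by Newton-Raphson steps.
    `damping` (an assumption) keeps the Hessian invertible when a
    bootstrap sample happens to contain almost no presences."""
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        grad = [0.0] * p
        hess = [[damping if i == j else 0.0 for j in range(p)] for i in range(p)]
        for row, yi in zip(X, y):
            mu = sigmoid(sum(b * v for b, v in zip(beta, row)))
            w = mu * (1.0 - mu)
            for i in range(p):
                grad[i] += (yi - mu) * row[i]
                for j in range(p):
                    hess[i][j] += w * row[i] * row[j]
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    return beta

def bagging_glm(X, y, n_boot, rng):
    """Fit one GLM per bootstrap resample (rows drawn with replacement)."""
    n = len(y)
    models = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_logistic_glm([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_bagged(models, row):
    """Average the predicted presence probability over all bootstrap models."""
    probs = [sigmoid(sum(b * v for b, v in zip(beta, row))) for beta in models]
    return sum(probs) / len(probs)

def sensitivity(y_true, y_pred):
    """True-positive rate: share of observed presences predicted as presences."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if tp + fn else float("nan")

def simulate(n, intercept, slope, rng):
    """Hypothetical zero-inflated presence/absence data with one covariate."""
    X, y = [], []
    for _ in range(n):
        x = rng.uniform(-2.0, 2.0)
        X.append([1.0, x])  # leading 1.0 is the intercept column
        y.append(1 if rng.random() < sigmoid(intercept + slope * x) else 0)
    return X, y

rng = random.Random(42)
X, y = simulate(300, -3.0, 1.5, rng)
models = bagging_glm(X, y, n_boot=30, rng=rng)
p_low = predict_bagged(models, [1.0, -2.0])
p_high = predict_bagged(models, [1.0, 2.0])
pred = [1 if predict_bagged(models, row) >= 0.5 else 0 for row in X]
print("share of presences:", sum(y) / len(y))
print("bagged P(presence) at x=-2 and x=2:", p_low, p_high)
print("sensitivity at threshold 0.5:", sensitivity(y, pred))
```

The 0.5 cut-off is arbitrary here. When presences are rare, the choice of threshold strongly affects sensitivity, so in practice it would normally be tuned rather than fixed.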