Bagging GLM: Improved generalized linear model for the analysis of zero-inflated data

Osawa T.; Mitsuhashi H.; Uematsu Y.; Ushimaru A.

Abstract

Species-occurrence data sets tend to contain a large proportion of zero values, i.e., absence values (they are zero-inflated). Statistical inference from such data sets is likely to be inefficient or to lead to incorrect conclusions unless the data are treated carefully. In this study, we propose a new modeling method that combines a regression model with a machine-learning technique to overcome the problems caused by zero-inflated data sets. Specifically, we combined the generalized linear model (GLM), which is widely used in ecology, with bootstrap aggregation (bagging). We built distribution models of Vincetoxicum pycnostelma (a vascular plant) and Ninox scutulata (an owl), both of which are endangered and have zero-inflated distribution patterns, using both our new method and the traditional GLM, and compared model performance. We also modeled four theoretical data sets containing different ratios of presence to absence values with the new and traditional methods and again compared model performance. For the distribution models, our new method performed well relative to traditional GLMs: after bagging, area under the curve (AUC) values were almost the same as with the traditional method, but sensitivity values were higher. The new method also showed higher sensitivity than the traditional GLM when modeling a theoretical data set containing a large proportion of zero values. These results indicate that our new method has high predictive ability for presence data when analyzing zero-inflated data sets. Generally, predicting presence data is more difficult than predicting absence data. Our new modeling method has potential for advancing species distribution modeling.
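
The abstract does not say how the bootstrap samples are drawn or how the individual GLMs are combined, so the following Python sketch is only an illustration under stated assumptions, not the authors' implementation. It assumes a binomial GLM with a logit link fitted by Newton's method (IRLS), plain resampling of rows with replacement, and averaging of the predicted presence probabilities. The function names, the simulated data, the small ridge term, and the 0.5 threshold are all hypothetical choices made for the example.

```python
import numpy as np


def predict_proba(beta, X):
    """Predicted presence probability for a logit-link GLM."""
    Xd = np.column_stack([np.ones(len(X)), X])  # prepend intercept column
    return 1.0 / (1.0 + np.exp(-np.clip(Xd @ beta, -30, 30)))


def fit_logistic_glm(X, y, n_iter=25, ridge=1e-4):
    """Fit a binomial GLM (logit link) with Newton/IRLS updates.

    The tiny ridge term is an assumption added for numerical stability;
    it is not part of a textbook GLM fit.
    """
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-np.clip(Xd @ beta, -30, 30)))
        w = np.clip(mu * (1.0 - mu), 1e-9, None)  # IRLS weights
        hessian = Xd.T @ (Xd * w[:, None]) + ridge * np.eye(Xd.shape[1])
        grad = Xd.T @ (y - mu) - ridge * beta
        beta = beta + np.linalg.solve(hessian, grad)
    return beta


def bagging_glm_predict(X_train, y_train, X_new, n_bags=50, seed=0):
    """Average the predictions of GLMs fitted to bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = len(y_train)
    preds = np.zeros((n_bags, len(X_new)))
    for b in range(n_bags):
        idx = rng.integers(0, n, size=n)  # draw n rows with replacement
        beta = fit_logistic_glm(X_train[idx], y_train[idx])
        preds[b] = predict_proba(beta, X_new)
    return preds.mean(axis=0)


def sensitivity(y_true, prob, threshold=0.5):
    """Share of true presences predicted as present at the threshold."""
    present = y_true == 1
    if not present.any():
        return float("nan")
    return float((prob[present] >= threshold).mean())


# Simulated zero-inflated presence/absence data (hypothetical, not the
# species data used in the study).
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 2))
true_logit = -3.0 + 1.5 * X[:, 0]
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

beta_single = fit_logistic_glm(X, y)
p_single = predict_proba(beta_single, X)
p_bagged = bagging_glm_predict(X, y, X, n_bags=50, seed=1)

print("presence rate:", y.mean())
print("sensitivity, single GLM:", sensitivity(y, p_single))
print("sensitivity, bagged GLM:", sensitivity(y, p_bagged))
```

Nothing in this sketch guarantees the sensitivity gain reported in the abstract; whether bagging helps depends on the resampling scheme, the classification threshold, and the data, none of which are specified here.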


Bibliographic details

Source
Ecological informatics: an international journal on ecoinformatics and computational ecology | 2011, Issue 5 | 6 pages
Authors
Osawa T.; Mitsuhashi H.; Uematsu Y.; Ushimaru A.
Original format: PDF
Language: English
Chinese Library Classification: General biology
Keywords
Bootstrapping; Data mining; Machine learning; Ninox scutulata; Regression model; Species distribution model; Vincetoxicum pycnostelma


