
Logistic Methods for Resource Selection Functions and Presence-Only Species Distribution Models


Abstract

To better protect and conserve biodiversity, ecologists use machine learning and statistics to understand how species respond to their environment and to predict how they will respond to future climate change, habitat loss and other threats. A fundamental modeling task is to estimate the probability that a given species is present in (or uses) a site, conditional on environmental variables such as precipitation and temperature. For a limited number of species, survey data consisting of both presence and absence records are available, and can be used to fit a variety of conventional classification and regression models. For most species, however, the available data consist only of occurrence records: locations where the species has been observed. In two closely related but separate bodies of ecological literature, diverse special-purpose models have been developed that contrast occurrence data with a random sample of available environmental conditions. The most widespread statistical approaches involve either fitting an exponential model of species' conditional probability of presence, or fitting a naive logistic model in which the random sample of available conditions is treated as absence data; both approaches have well-known drawbacks, and do not necessarily produce valid probabilities. After summarizing existing methods, we overcome their drawbacks by introducing a new scaled binomial loss function for estimating an underlying logistic model of species presence/absence. Like the Expectation-Maximization approach of Ward et al. and the method of Steinberg and Cardell, our approach requires an estimate of population prevalence, Pr(y = 1), since prevalence is not identifiable from occurrence data alone. In contrast to the latter two methods, our loss function is straightforward to integrate into a variety of existing modeling frameworks such as generalized linear and additive models and boosted regression trees.
We also demonstrate that approaches by Lele and Keim and by Lancaster and Imbens that surmount the identifiability issue by making parametric data assumptions do not typically produce valid probability estimates.
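
The role of prevalence can be illustrated with a minimal sketch. It is not the scaled binomial loss introduced in the paper; it only works through the standard presence-background sampling argument, assuming occurrence records are drawn from sites where the species is present and background records are drawn at random from all available sites. Under those assumptions, the odds that a pooled record at conditions x is an occurrence equal r * Pr(y = 1 | x), with r = n_presence / (n_background * prevalence). The function and argument names below are hypothetical, chosen for illustration.

```python
def naive_use_probability(p_presence, n_presence, n_background, prevalence):
    """Probability that a pooled record at x is an occurrence record.

    Assumes (illustrative setup, not taken from the paper) that occurrences
    are sampled from present sites and background from all available sites.
    """
    r = n_presence / (n_background * prevalence)
    return r * p_presence / (1.0 + r * p_presence)


def presence_probability(q_naive, n_presence, n_background, prevalence):
    """Invert the relation above: recover Pr(y = 1 | x) from the naive output.

    The result depends on the assumed prevalence; a poor prevalence estimate
    can even push it above 1, so it is not guaranteed to be a valid probability.
    """
    odds = q_naive / (1.0 - q_naive)
    return odds * n_background * prevalence / n_presence


# Same naive output, two different assumed prevalences.
q = naive_use_probability(0.3, n_presence=1000, n_background=10000, prevalence=0.2)
p_if_prevalence_02 = presence_probability(q, 1000, 10000, prevalence=0.2)
p_if_prevalence_01 = presence_probability(q, 1000, 10000, prevalence=0.1)
print(p_if_prevalence_02, p_if_prevalence_01)
```

The same naive output maps to different presence probabilities depending on the prevalence supplied, which is one way to see why occurrence data alone cannot pin down Pr(y = 1).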
