Linking big models to big data: efficient ecosystem model calibration through Bayesian model emulation

Fer Istem; Kelly Ryan; Moorcroft Paul R.; Richardson Andrew D.; Cowdery Elizabeth M.; Dietze Michael C.

Abstract

Data-model integration plays a critical role in assessing and improving our capacity to predict ecosystem dynamics. Similarly, the ability to attach quantitative statements of uncertainty around model forecasts is crucial for model assessment and interpretation and for setting field research priorities. Bayesian methods provide a rigorous data assimilation framework for these applications, especially for problems with multiple data constraints. However, the Markov chain Monte Carlo (MCMC) techniques underlying most Bayesian calibration can be prohibitive for computationally demanding models and large datasets. We employ an alternative method, Bayesian model emulation of sufficient statistics, that can approximate the full joint posterior density, is more amenable to parallelization, and provides an estimate of parameter sensitivity. Analysis involved informative priors constructed from a meta-analysis of the primary literature and specification of both model and data uncertainties, and it introduced novel approaches to autocorrelation corrections on multiple data streams and emulating the sufficient statistics surface. We report the integration of this method within the ecological workflow management software Predictive Ecosystem Analyzer (PEcAn), and its application and validation with two process-based terrestrial ecosystem models: SIPNET and ED2. In a test against a synthetic dataset, the emulator was able to retrieve the true parameter values. A comparison of the emulator approach to standard brute-force MCMC involving multiple data constraints showed that the emulator method was able to constrain the faster and simpler SIPNET model's parameters with comparable performance to the brute-force approach but reduced computation time by more than two orders of magnitude. The emulator was then applied to calibration of the ED2 model, whose complexity precludes standard (brute-force) Bayesian data assimilation techniques. Both models are constrained after assimilation of the observational data with the emulator method, reducing the uncertainty around their predictions. Performance metrics showed increased agreement between model predictions and data. Our study furthers efforts toward reducing model uncertainties, showing that the emulator method makes it possible to efficiently calibrate complex models.
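
To make the idea of emulating a sufficient-statistics surface concrete, the following is a minimal sketch and not the PEcAn implementation. It assumes a toy one-parameter model (`run_model`, a hypothetical stand-in for an expensive simulator), a Gaussian likelihood with known standard deviation, a uniform prior on [0, 4], and a hand-picked squared-exponential kernel. The "expensive" model is run only at a handful of design points, the sum of squared errors is interpolated with the mean of a simple Gaussian process, and a Metropolis sampler then evaluates only that cheap interpolant. All names and settings here are illustrative assumptions.

```python
import math
import random

def run_model(theta, xs):
    # Hypothetical stand-in for an expensive simulator: a line through the origin.
    return [theta * x for x in xs]

def sse(theta, xs, ys):
    # Sum of squared errors: the sufficient statistic of a Gaussian likelihood.
    return sum((y - m) ** 2 for y, m in zip(ys, run_model(theta, xs)))

def kernel(a, b, length=0.75):
    # Squared-exponential covariance; the length scale is an assumed value.
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    # Solve A x = b by Gaussian elimination with partial pivoting.
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def fit_emulator(design, values, jitter=1e-10):
    # Gaussian-process mean predictor with a constant mean function.
    mean = sum(values) / len(values)
    K = [[kernel(a, b) + (jitter if i == j else 0.0)
          for j, b in enumerate(design)] for i, a in enumerate(design)]
    weights = solve(K, [v - mean for v in values])
    def predict(theta):
        return mean + sum(w * kernel(theta, d) for w, d in zip(weights, design))
    return predict

def metropolis(log_post, start, n_iter, step, rng):
    # Random-walk Metropolis sampler for a scalar parameter.
    theta, lp = start, log_post(start)
    chain = []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)
        lp_prop = log_post(prop)
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            theta, lp = prop, lp_prop
        chain.append(theta)
    return chain

rng = random.Random(0)
SIGMA = 0.5                                   # assumed known observation error
xs = [0.1 * i for i in range(1, 11)]
ys = [2.0 * x + rng.gauss(0.0, SIGMA) for x in xs]   # synthetic data, true theta = 2

# The only calls to the "expensive" model: nine design points across the prior.
design = [0.5 * i for i in range(9)]
emulator = fit_emulator(design, [sse(t, xs, ys) for t in design])

def log_post(theta):
    # Uniform prior on [0, 4]; the likelihood uses the emulated SSE only.
    if not 0.0 <= theta <= 4.0:
        return -math.inf
    return -emulator(theta) / (2.0 * SIGMA ** 2)

chain = metropolis(log_post, 2.0, 4000, 0.3, rng)
kept = chain[500:]                            # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(round(posterior_mean, 2))
```

In this sketch the model is evaluated nine times, while the sampler takes 4000 steps against the interpolant, and the nine design runs are independent of one another, so they could be run in parallel. A fuller treatment would also propagate the emulator's own predictive uncertainty, which this sketch ignores by using only the Gaussian-process mean.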

