Journal: Ecological informatics: an international journal on ecoinformatics and computational ecology

Predicting the future is hard and other lessons from a population time series data science competition


Abstract

Population forecasting, in which past dynamics are used to predict future states, has many real-world applications. While time series of animal abundance are often modeled in ways that aim to capture the underlying biological processes, doing so is neither necessary nor sufficient for making good predictions. Here we report on a data science competition focused on modeling time series of Antarctic penguin abundance. We describe the best performing submitted models, compare them to a Bayesian model previously developed by domain experts, and build an ensemble model that outperforms the individual component models in prediction accuracy. The top performing models varied tremendously in complexity, ranging from very simple forward extrapolations of average growth rate to ensembles of models integrating recently developed machine learning techniques. Despite the short time frame for the competition, four of the submitted models outperformed the model previously created by the team of domain experts. We discuss the structure of the best performing models and the components therein that might be useful for other ecological applications, the benefit of creating ensembles of models for ecological prediction, and the costs and benefits of including detailed domain expertise in ecological modeling. Additionally, we discuss the benefits of data science competitions, among which are increased visibility for challenging science questions, the generation of new techniques not yet adopted within the ecological community, and the ability to generate ensemble model forecasts that directly address model uncertainty.
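
To make the simplest end of that range concrete, the following is a minimal Python sketch of a forward extrapolation of average growth rate, plus an unweighted ensemble mean over several forecasts. It is an illustration only, not a reconstruction of any submitted model or of the authors' ensemble: the function names `forecast_mean_growth` and `ensemble_mean`, the use of log growth rates, and the equal weighting are all assumptions, and the sketch assumes strictly positive counts at evenly spaced time steps.

```python
import math


def forecast_mean_growth(counts, horizon):
    """Extrapolate a positive count series using its mean log growth rate.

    Hypothetical helper for illustration; assumes evenly spaced,
    strictly positive counts with at least two observations.
    """
    # Per-step log growth rates: log(N[t+1] / N[t])
    rates = [math.log(b / a) for a, b in zip(counts, counts[1:])]
    mean_rate = sum(rates) / len(rates)
    last = counts[-1]
    # Project the last observed count forward h steps at the mean rate
    return [last * math.exp(mean_rate * h) for h in range(1, horizon + 1)]


def ensemble_mean(forecasts):
    """Average several equal-length forecasts step by step (equal weights)."""
    return [sum(step) / len(step) for step in zip(*forecasts)]
```

For example, `forecast_mean_growth([100, 110, 121], 2)` projects a series growing by 10% per step two steps ahead, and `ensemble_mean` can then combine that projection with forecasts from other models of the same horizon. Equal weighting is only one possible choice; weights based on past predictive accuracy are another.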

