Journal: Information Sciences: An International Journal

Evaluation of statistical and machine learning models for time series prediction: Identifying the state-of-the-art and the best conditions for the use of each model


Abstract

The choice of the most promising algorithm to model and predict a particular phenomenon is one of the most prominent activities in temporal data forecasting. Forecasting (or prediction), like other data mining tasks, uses empirical evidence to select the most suitable model for the problem at hand, since no single modeling method can be considered the best. However, according to our systematic literature review of the last decade, few scientific publications rigorously expose the benefits and limitations of the most popular algorithms for time series prediction. At the same time, there is a limited performance record of these models when applied to complex and highly nonlinear data. In this paper, we present one of the most extensive, impartial and comprehensible experimental evaluations ever done in the time series prediction field. Using 95 datasets, we evaluate eleven predictors, seven parametric and four non-parametric, employing two multi-step-ahead projection strategies and four performance evaluation measures. We report many lessons learned and recommendations concerning the advantages, drawbacks, and the best conditions for the use of each model. The results show that SARIMA is the only statistical method able to outperform the following machine learning algorithms, although without a statistically significant difference: ANN, SVM, and kNN-TSPI. However, such forecasting accuracy comes at the expense of a larger number of parameters. The evaluated datasets, as well as detailed results for different indexes such as MSE, Theil's U coefficient, POCID, and a recently proposed multi-criteria performance measure, are available online in our repository. This repository is another contribution of this paper, since other researchers can replicate our results and evaluate their methods more rigorously. The findings of this study will impact further research on this topic since they provide a broad insight into model selection, parameter setting, evaluation measures, and experimenta
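The abstract names MSE, Theil's U coefficient, and POCID but does not define them. The sketch below shows one common formulation of each. These are assumptions rather than the paper's confirmed definitions: Theil's U is taken here as the ratio of the model's squared error to that of a naive forecast that repeats the previous observation, and POCID as the percentage of steps in which the predicted and actual series move in the same direction. The function names and the `last_observed` parameter are hypothetical.

```python
from typing import Sequence


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error between actual and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def theils_u(actual: Sequence[float], predicted: Sequence[float],
             last_observed: float) -> float:
    """Assumed formulation: squared error of the model divided by the
    squared error of a naive forecast that repeats the previous actual
    value. Values below 1 mean the model beats the naive forecast."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # The naive forecast for step t is the actual value at step t - 1;
    # for the first step it is the last value seen before the horizon.
    previous = [last_observed] + list(actual[:-1])
    model_error = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    naive_error = sum((a - b) ** 2 for a, b in zip(actual, previous))
    if naive_error == 0:
        raise ValueError("naive forecast error is zero; U is undefined")
    return model_error / naive_error


def pocid(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Assumed formulation: percentage of consecutive steps where the
    predicted change has the same sign as the actual change."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    hits = sum(
        1
        for i in range(1, len(actual))
        if (predicted[i] - predicted[i - 1]) * (actual[i] - actual[i - 1]) > 0
    )
    return 100.0 * hits / (len(actual) - 1)
```

Lower is better for MSE and Theil's U, while higher is better for POCID, which is one reason a combined multi-criteria measure can be useful when ranking predictors.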
