Journal: Open Journal of Statistics

Using Boosted Regression Trees and Remotely Sensed Data to Drive Decision-Making


Abstract

Challenges in Big Data analysis arise from the way the data are recorded, maintained, processed and stored. We demonstrate that a hierarchical, multivariate, statistical machine learning algorithm, namely Boosted Regression Tree (BRT), can address Big Data challenges to drive decision-making. The challenge in this study is a lack of interoperability: the data, a collection of GIS shapefiles, remotely sensed imagery, and aggregated and interpolated spatio-temporal information, are stored in monolithic hardware components. For the modelling process, it was necessary to create one common input file. Merging the data sources produced a structured but noisy input file containing inconsistencies and redundancies. Here, it is shown that BRT can process different data granularities, heterogeneous data and missingness. In particular, BRT has the advantage of dealing with missing data by default, by allowing a split on whether or not a value is missing as well as on what the value is. Most importantly, BRT offers a wide range of possibilities for interpreting results, and variable selection is performed automatically by considering how frequently a variable is used to define a split in the tree. A comparison with two similar regression models (Random Forests and the Least Absolute Shrinkage and Selection Operator, LASSO) shows that BRT outperforms these in this instance. BRT can also be a starting point for sophisticated hierarchical modelling in real-world scenarios. For example, a single or ensemble BRT approach could be tested alongside existing models in order to improve results for a wide range of data-driven decisions and applications.
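
The two mechanisms named in the abstract, splitting on missingness and ranking variables by how often they define a split, can be illustrated with a small sketch. This is not the authors' implementation or data: it is a pure-Python toy that boosts one-split trees (stumps) under squared-error loss on synthetic data, and every name in it (`fit_stump`, `boost`, `split_counts`, the three synthetic features) is hypothetical. Missing values are encoded as `None` and routed to whichever side of a split reduces the error more, which also allows a pure "missing versus observed" split.

```python
import random
from collections import Counter

def goes_left(value, threshold, missing_left):
    """Route a value; a missing value (None) follows the learned default side."""
    return missing_left if value is None else value <= threshold

def sse(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def fit_stump(X, residuals):
    """Return the one-split tree that minimises squared error on the residuals."""
    best = None
    for j in range(len(X[0])):
        thresholds = sorted({row[j] for row in X if row[j] is not None})
        for t in thresholds:
            # Try sending missing values left and right. With t at the largest
            # observed value and missing_left=False, the split separates
            # observed rows from missing rows.
            for missing_left in (True, False):
                left = [r for row, r in zip(X, residuals)
                        if goes_left(row[j], t, missing_left)]
                right = [r for row, r in zip(X, residuals)
                         if not goes_left(row[j], t, missing_left)]
                if not left or not right:
                    continue
                loss = sse(left) + sse(right)
                if best is None or loss < best[0]:
                    best = (loss, j, t, missing_left,
                            sum(left) / len(left), sum(right) / len(right))
    return best[1:]

def predict_stump(stump, row):
    j, t, missing_left, left_mean, right_mean = stump
    return left_mean if goes_left(row[j], t, missing_left) else right_mean

def mse(y, pred):
    return sum((a - b) ** 2 for a, b in zip(y, pred)) / len(y)

def boost(X, y, rounds=20, learning_rate=0.3):
    """Fit stumps sequentially, each one on the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        pred = [p + learning_rate * predict_stump(stump, row)
                for p, row in zip(pred, X)]
    return base, stumps, pred

# Synthetic data (an assumption for illustration): y depends only on feature 0,
# and about 10% of feature 0 is hidden after y has been generated.
random.seed(0)
X, y = [], []
for _ in range(200):
    x0, x1, x2 = (random.randint(0, 9) for _ in range(3))
    y.append(3 * x0 + random.gauss(0, 0.5))
    X.append([None if random.random() < 0.1 else x0, x1, x2])

base, stumps, pred = boost(X, y)
baseline_mse = mse(y, [base] * len(y))
final_mse = mse(y, pred)

# Variable importance as split frequency: how often each feature was chosen.
split_counts = Counter(stump[0] for stump in stumps)
print("splits per feature:", dict(split_counts))
print("training MSE: baseline", round(baseline_mse, 2), "boosted", round(final_mse, 2))
```

In practice a maintained library would be used instead of hand-written stumps, with deeper trees and a held-out set for the model comparison. The sketch only makes the missing-value routing and the split-frequency ranking concrete.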
