Journal: Machine Learning

Optimizing regression models for data streams with missing values


Abstract

Automated data acquisition systems, such as wireless sensor networks, surveillance systems, or any system that records data in operating logs, are becoming increasingly common, and provide opportunities for making data-driven decisions in real or near-real time. In these systems, data is generated continuously, resulting in a data stream, and predictive models need to be built and updated online with the incoming data. In addition, the predictive models need to output predictions continuously and without delay. Automated data acquisition systems are prone to occasional failures, so missing values may often occur. Nevertheless, predictions need to be made continuously. Hence, predictive models need mechanisms for dealing with missing data so that the loss in accuracy due to occasionally missing values is minimal. In this paper, we theoretically analyze the effects of missing values on the accuracy of linear predictive models. We derive the optimal least squares solution that minimizes the expected mean squared error given an expected rate of missing values. Based on this theoretically optimal solution, we propose a recursive algorithm for producing and updating linear regression online, without accessing historical data. Our experimental evaluation on eight benchmark datasets and a case study in environmental monitoring with streaming data validate the theoretical results and confirm the effectiveness of the proposed strategy.
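
The abstract does not state the formulas, so the following is only an illustrative sketch of the idea, not the paper's algorithm. It assumes that inputs are centered, that a missing input is replaced by 0 (its mean) at prediction time, and that each input goes missing independently with the same probability p. Under those assumptions, minimizing the expected squared error leads to the system ((1 - p) A + p diag(A)) beta = b, where A accumulates x x^T and b accumulates x y. Both can be updated one example at a time, so no historical data needs to be stored. The names StreamingRobustRegression, update, coefficients, predict and solve are hypothetical.

```python
def solve(M, rhs):
    """Solve M x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system; not enough data yet")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


class StreamingRobustRegression:
    """Online linear regression hedged against a missing-value rate p.

    Assumptions of this sketch (not taken from the abstract): centered
    inputs, missing inputs replaced by 0, independent missingness with
    the same rate p for every input.
    """

    def __init__(self, n_features, missing_rate):
        self.d = n_features
        self.p = missing_rate
        # Running sufficient statistics: A sums x x^T, b sums x * y.
        self.A = [[0.0] * n_features for _ in range(n_features)]
        self.b = [0.0] * n_features

    def update(self, x, y):
        """Fold one complete training example into the running sums."""
        for i in range(self.d):
            self.b[i] += x[i] * y
            for j in range(self.d):
                self.A[i][j] += x[i] * x[j]

    def coefficients(self):
        """Solve ((1 - p) A + p diag(A)) beta = b for beta."""
        p, d = self.p, self.d
        M = [[(1.0 - p) * self.A[i][j] + (p * self.A[i][j] if i == j else 0.0)
              for j in range(d)] for i in range(d)]
        return solve(M, self.b)

    def predict(self, x):
        """Predict from an input in which missing values are None."""
        beta = self.coefficients()
        return sum(w * (0.0 if v is None else v) for w, v in zip(beta, x))
```

With p = 0 the sketch reduces to ordinary least squares, and with p = 1 each coefficient becomes a separate single-input regression, so p interpolates between the two. Re-solving the system on every call keeps the sketch short. A production version would more likely update the solution incrementally.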
