Multivariate time series data in practical applications, such as health care, geoscience, and biology, are characterized by a variety of missing values. In time series prediction and related tasks, it has been noted that missing values and their missing patterns are often correlated with the target labels, a phenomenon known as informative missingness. There is very limited work on exploiting the missing patterns for effective imputation and improved prediction performance. In this paper, we develop novel deep learning models, namely GRU-D, as one of the early attempts. GRU-D is based on the Gated Recurrent Unit (GRU), a state-of-the-art recurrent neural network. It takes two representations of missing patterns, i.e., masking and time interval, and incorporates them into a deep model architecture so that it not only captures the long-term temporal dependencies in time series, but also utilizes the missing patterns to achieve better prediction results. Experiments on time series classification tasks with real-world clinical datasets (MIMIC-III, PhysioNet) and synthetic datasets demonstrate that our models achieve state-of-the-art performance and provide useful insights for better understanding and utilization of missing values in time series analysis.
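To make the two representations of missing patterns concrete, the following is a minimal sketch of how a masking matrix and a time-interval matrix might be computed from a series whose missing entries are marked with NaN. The abstract does not give formulas, so the definitions here are assumptions: the mask is 1 where a value is observed and 0 otherwise, and the time interval is the time elapsed since the variable was last observed, set to 0 at the first step. The function name `missing_pattern_features` and the input layout are hypothetical, chosen for illustration only.

```python
import numpy as np


def missing_pattern_features(values, timestamps):
    """Compute a masking matrix and a time-interval matrix (illustrative sketch).

    values:     (T, D) array, NaN marks a missing entry (assumed convention).
    timestamps: (T,) array of observation times, non-decreasing.
    Returns (mask, delta), both of shape (T, D).
    """
    values = np.asarray(values, dtype=float)
    timestamps = np.asarray(timestamps, dtype=float)

    # mask[t, d] = 1.0 if variable d is observed at step t, else 0.0
    mask = (~np.isnan(values)).astype(float)

    # delta[t, d] = time since variable d was last observed (0 at the first step)
    delta = np.zeros_like(values)
    for t in range(1, values.shape[0]):
        gap = timestamps[t] - timestamps[t - 1]
        # If the previous step was observed, the interval restarts at `gap`;
        # otherwise the previous interval is carried forward and extended.
        delta[t] = gap + (1.0 - mask[t - 1]) * delta[t - 1]
    return mask, delta


# Toy series: 4 time steps, 2 variables, irregular sampling times.
values = [[1.0, np.nan],
          [np.nan, 2.0],
          [np.nan, np.nan],
          [4.0, 5.0]]
timestamps = [0.0, 1.0, 3.0, 4.0]
mask, delta = missing_pattern_features(values, timestamps)
```

In this toy series the first variable is observed at times 0 and 4 only, so its interval grows to 4 by the last step. A model could take `mask` and `delta` as extra inputs alongside the values; how GRU-D itself consumes them is not specified in this abstract.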