IEEE International Conference on Data Mining

Generating Synthetic Time Series to Augment Sparse Datasets

Abstract

In machine learning, data augmentation is the process of creating synthetic examples to augment the dataset used to learn a model. One motivation for data augmentation is to reduce the variance of a classifier, thereby reducing error. In this paper, we propose new data augmentation techniques specifically designed for time series classification, where the space in which the series are embedded is induced by Dynamic Time Warping (DTW). The main idea of our approach is to average a set of time series and use the average time series as a new synthetic example. The proposed methods rely on an extension of DTW Barycentric Averaging (DBA), the averaging technique specifically developed for DTW. We extend DBA to calculate a weighted average of time series under DTW, so that instead of each time series contributing equally to the final average, some can contribute more than others. This extension allows us to generate an infinite number of new examples from any set of given time series. To this end, we propose three methods for choosing the weights associated with the time series of the dataset. We carry out experiments on the 85 datasets of the UCR archive and demonstrate that, using a 1-NN DTW classifier, our method is particularly useful when the number of available examples is limited (e.g. 2 to 6 examples per class). Furthermore, we show that augmenting full datasets is beneficial in most cases: we observed an increase in accuracy on 56 datasets, no effect on 7, and a slight decrease on only 22.
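The weighted-averaging idea in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not specify: 1-D series, a squared-difference local cost, initialization from the highest-weight series, a fixed number of refinement iterations, and weights drawn at random from a Dirichlet distribution. The names `dtw_path` and `weighted_dba` are hypothetical, and the random weighting is a stand-in only; it is not one of the three weighting methods the authors propose.

```python
import numpy as np


def dtw_path(a, b):
    """Return the DTW alignment path between 1-D series a and b.

    Uses a squared-difference local cost (an assumption of this sketch).
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])
    # Backtrack from the last cell to recover the aligned index pairs.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = int(np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]


def weighted_dba(series, weights, n_iter=10):
    """Sketch of a weighted DTW barycenter: each series pulls on the average
    in proportion to its weight."""
    series = [np.asarray(s, dtype=float) for s in series]
    weights = np.asarray(weights, dtype=float)
    if len(series) != len(weights):
        raise ValueError("need exactly one weight per series")
    if np.any(weights < 0) or weights.sum() <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    # Assumption: start from the series that carries the largest weight.
    avg = series[int(np.argmax(weights))].copy()
    for _ in range(n_iter):
        num = np.zeros_like(avg)
        den = np.zeros_like(avg)
        for s, w in zip(series, weights):
            # Every index of avg is aligned with at least one point of s.
            for i, j in dtw_path(avg, s):
                num[i] += w * s[j]
                den[i] += w
        avg = num / den  # weighted mean of the points aligned to each index
    return avg


# Usage: build one synthetic example from three examples of the same class.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 40)
class_examples = [np.sin(t), np.sin(t + 0.4), np.sin(t - 0.3)]
weights = rng.dirichlet(np.ones(len(class_examples)))  # illustrative weights only
synthetic = weighted_dba(class_examples, weights)
print(synthetic.shape)
```

Drawing a fresh weight vector on each call yields a different synthetic series each time, which is the sense in which weighted averaging can generate arbitrarily many new examples from a fixed set of series.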