2011 IEEE Symposium on Computational Intelligence and Data Mining

Computational intelligence methods for processing misaligned, unevenly sampled time series containing missing data



Abstract

One consequence of the increasing amount of data stored during acquisition processes is that sampled time series are more likely to be collected in a misaligned, uneven fashion and/or to be partly lost or unavailable (missing data). Because these problems severely affect data mining techniques, this work proposes methods to (a) align misaligned, unevenly sampled data; (b) distinguish values that are absent because of low sampling frequencies from those that are absent because of missingness mechanisms; and (c) classify segments of missing data as recoverable or non-recoverable by using statistical and fuzzy modeling approaches. These methods were evaluated on randomly simulated test datasets containing different amounts of missing data. Results show that: (1) using the most frequently sampled variable as a template, combined with cubic interpolation, made it possible to unshift misaligned, uneven data without significant errors; (2) values absent because of low sampling frequencies can be successfully distinguished from truly missing values by using 95% confidence intervals relative to the mean sampling time; and (3) fuzzy modeling gave better classification results for recoverable segments, while the statistical approach performed better at classifying non-recoverable segments. The performance of all three proposed methods decreased as the amount of missing data in the test datasets increased.
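The abstract gives no implementation details, so the following is only a minimal Python sketch of how findings (1) and (2) might fit together, not the authors' code. It assumes that each variable is a list of `(time, value)` pairs, that "cubic interpolation" means a natural cubic spline, and that the "95% confidence interval relative to the mean sampling time" can be read as mean ± 1.96·s of a variable's inter-sample intervals. Under that reading, a gap longer than the upper bound is treated as truly missing rather than as an effect of low sampling frequency. The function names `natural_cubic_spline`, `sampling_interval_bounds` and `align_to_template` are hypothetical.

```python
import bisect
import statistics


def natural_cubic_spline(xs, ys):
    """Return f(x) evaluating the natural cubic spline through (xs, ys).

    Standard tridiagonal construction with zero second derivative at both
    ends; xs must be strictly increasing with at least two knots.
    """
    n = len(xs) - 1
    if n < 1:
        raise ValueError("need at least two samples")
    h = [xs[i + 1] - xs[i] for i in range(n)]
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3 * (ys[i + 1] - ys[i]) / h[i] - 3 * (ys[i] - ys[i - 1]) / h[i - 1]
    # Forward sweep of the tridiagonal solve.
    l, mu, z = [1.0] * (n + 1), [0.0] * (n + 1), [0.0] * (n + 1)
    for i in range(1, n):
        l[i] = 2 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / l[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / l[i]
    # Back substitution for the per-segment coefficients.
    b, c, d = [0.0] * n, [0.0] * (n + 1), [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2 * c[j]) / 3
        d[j] = (c[j + 1] - c[j]) / (3 * h[j])

    def f(x):
        # Segment j satisfies xs[j] <= x < xs[j + 1] (clamped at the ends).
        j = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return f


def sampling_interval_bounds(times, z=1.96):
    """Assumed reading of the 95% interval: mean +/- z * stdev of the
    inter-sample intervals (needs at least three samples)."""
    gaps = [t1 - t0 for t0, t1 in zip(times, times[1:])]
    m, s = statistics.mean(gaps), statistics.stdev(gaps)
    return m - z * s, m + z * s


def align_to_template(series, z=1.96):
    """series: dict name -> list of (time, value) sorted by time.

    The most frequently sampled variable's timestamps form the common grid.
    Returns (template_name, grid, aligned); aligned[name][k] is the value at
    grid[k], or None outside the observed range or inside a gap longer than
    the upper interval bound (treated as truly missing).
    """
    template = max(series, key=lambda k: len(series[k]))
    grid = [t for t, _ in series[template]]
    aligned = {template: [v for _, v in series[template]]}
    for name, samples in series.items():
        if name == template:
            continue
        ts = [t for t, _ in samples]
        vs = [v for _, v in samples]
        _, upper = sampling_interval_bounds(ts, z)
        spline = natural_cubic_spline(ts, vs)  # fitted over all samples
        column = []
        for t in grid:
            if t < ts[0] or t > ts[-1]:
                column.append(None)  # no extrapolation beyond observed data
                continue
            j = bisect.bisect_right(ts, t) - 1
            if ts[j] == t:
                column.append(vs[j])  # observed exactly at this time
            elif ts[j + 1] - ts[j] > upper:
                column.append(None)  # abnormally long gap: truly missing
            else:
                column.append(spline(t))  # absent only due to low sampling rate
        aligned[name] = column
    return template, grid, aligned
```

In this design, a template time that falls inside a normal sampling gap is filled by interpolation. A template time inside an abnormally long gap is left as `None`, so a later step such as method (c) could decide whether it is recoverable (that step is not sketched here). Fitting one spline across all of a variable's samples, including across long gaps, is a simplification. Fitting each contiguous run separately would be an equally plausible choice.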
