...
Journal: Data Mining and Knowledge Discovery

Optimizing dynamic time warping's window width for time series data mining applications



Abstract

Dynamic Time Warping (DTW) is a highly competitive distance measure for most time series data mining problems. Obtaining the best performance from DTW requires setting its only parameter, the maximum amount of warping (w). In the supervised case with ample data, w is typically set by cross-validation in the training stage. However, this method is likely to yield suboptimal results for small training sets. For the unsupervised case, learning via cross-validation is not possible because we do not have access to labeled data. Many practitioners have thus resorted to assuming that "the larger the better", and they use the largest value of w permitted by the computational resources. However, as we will show, in most circumstances, this is a naïve approach that produces inferior clusterings. Moreover, the best warping window width is generally non-transferable between the two tasks, i.e., for a single dataset, practitioners cannot simply apply the best w learned for classification to clustering or vice versa. In addition, we will demonstrate that the appropriate amount of warping depends not only on the data structure, but also on the dataset size. Thus, even if a practitioner knows the best setting for a given dataset, they will likely be at a loss if they apply that setting to a larger version of that data. All these issues seem largely unknown or at least unappreciated in the community. In this work, we demonstrate the importance of setting DTW's warping window width correctly, and we also propose novel methods to learn this parameter in both supervised and unsupervised settings. The algorithms we propose to learn w can produce significant improvements in classification accuracy and clustering quality. We demonstrate the correctness of our novel observations and the utility of our ideas by testing them on more than one hundred publicly available datasets. Our forceful results allow us to make a perhaps unexpected claim; an underappreciated "low hanging frui
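To make the role of w concrete, the sketch below shows DTW restricted to a warping window and the conventional supervised baseline the abstract mentions: choosing w by leave-one-out cross-validation of a 1-nearest-neighbour classifier. It is a minimal illustration under stated assumptions, not the authors' proposed learning methods. Assumptions: the window is implemented as a Sakoe-Chiba band whose half-width w is measured in samples (papers often express w as a percentage of series length instead); the pointwise cost is the squared difference with a square root taken at the end; and the function names `dtw_distance`, `loocv_error`, and `select_window` are hypothetical.

```python
import math
from typing import Hashable, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float], w: int) -> float:
    """DTW distance constrained to a band of half-width w (in samples).

    Cells (i, j) with |i - j| > w are never filled, so for equal-length
    series w = 0 gives the Euclidean distance and a large w allows
    unconstrained warping. (Band shape and cost are assumptions.)
    """
    n, m = len(a), len(b)
    # The band must be at least |n - m| wide, otherwise no path reaches (n, m).
    w = max(w, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only visit columns inside the window around the diagonal.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # step from the cell above
                                 cost[i][j - 1],      # step from the cell to the left
                                 cost[i - 1][j - 1])  # diagonal step
    return math.sqrt(cost[n][m])


def loocv_error(series: Sequence[Sequence[float]],
                labels: Sequence[Hashable], w: int) -> float:
    """Leave-one-out error rate of a 1-NN classifier that uses DTW with window w."""
    errors = 0
    for q in range(len(series)):
        best_dist, best_label = math.inf, None
        for c in range(len(series)):
            if c == q:
                continue  # hold out the query series itself
            dist = dtw_distance(series[q], series[c], w)
            if dist < best_dist:
                best_dist, best_label = dist, labels[c]
        if best_label != labels[q]:
            errors += 1
    return errors / len(series)


def select_window(series: Sequence[Sequence[float]],
                  labels: Sequence[Hashable],
                  candidate_ws: Sequence[int]) -> int:
    """Return the candidate w with the lowest LOOCV error (smallest w on ties)."""
    return min(sorted(candidate_ws), key=lambda w: loocv_error(series, labels, w))
```

Breaking ties toward the smallest w is a design choice here, not something the abstract prescribes. Because a wider band only adds admissible paths, `dtw_distance` can never increase as w grows. As the abstract argues, though, a smaller distance does not imply better classification or clustering, and with a small training set the LOOCV estimate that `select_window` relies on can be noisy.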
