Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Jul 23-26, 2002, Edmonton

On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration


Abstract

In the last decade there has been an explosion of interest in mining time series data. Literally hundreds of papers have introduced new algorithms to index, classify, cluster and segment time series. In this work we make the following claim. Much of this work has very little utility because the contributions made (speed in the case of indexing, accuracy in the case of classification and clustering, model accuracy in the case of segmentation) offer an amount of "improvement" that would have been completely dwarfed by the variance that would have been observed by testing on many real-world datasets, or the variance that would have been observed by changing minor (unstated) implementation details. To illustrate our point, we have undertaken the most exhaustive set of time series experiments ever attempted, re-implementing the contributions of more than two dozen papers and testing them on 50 highly diverse real-world datasets. Our empirical results strongly support our assertion and suggest the need for a set of time series benchmarks and more careful empirical evaluation in the data mining community.
