Time Series Epenthesis: Clustering Time Series Streams Requires Ignoring Some Data

Published in: 11th IEEE International Conference on Data Mining

Abstract

Given the pervasiveness of time series data in all human endeavors, and the ubiquity of clustering as a data mining application, it is somewhat surprising that the problem of time series clustering from a single stream remains largely unsolved. Most work on time series clustering considers the clustering of individual time series, e.g., gene expression profiles, individual heartbeats or individual gait cycles. The few attempts at clustering time series streams have been shown to be objectively incorrect in some cases, and in other cases to work only on the most contrived datasets, and only after carefully adjusting a large set of parameters. In this work, we make two fundamental contributions. First, we show that the problem definition currently used for clustering time series streams is inherently flawed, and that a new definition is necessary. Second, we show that the Minimum Description Length (MDL) framework offers an efficient, effective and essentially parameter-free method for time series clustering. We show that our method produces objectively correct results on a wide variety of datasets from medicine, zoology and industrial process analyses.
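The abstract does not spell out how MDL is applied. As a rough illustration only, the sketch below assumes a common MDL setup: values are discretized to a small cardinality, and the cost of a sequence is its length times its empirical entropy in bits. Under that assumption, a subsequence is worth grouping with a candidate pattern (the "hypothesis") when encoding it relative to the pattern takes fewer bits than encoding it alone. The names `quantize`, `description_length` and `bits_saved` are hypothetical, and this is not presented as the authors' algorithm.

```python
import math
from collections import Counter


def quantize(series, cardinality=16):
    """Min-max scale real values to integers in [0, cardinality - 1].

    Assumed discretization step; in practice the whole stream would be
    quantized once so that subsequences share the same scale.
    """
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0] * len(series)
    scale = (cardinality - 1) / (hi - lo)
    return [round((x - lo) * scale) for x in series]


def description_length(symbols):
    """Bits to encode the symbols with an empirical-entropy code: n * H."""
    n = len(symbols)
    counts = Counter(symbols)
    entropy = sum((c / n) * math.log2(n / c) for c in counts.values())
    return n * entropy


def bits_saved(subsequence, hypothesis):
    """DL(A) - DL(A | H), approximating DL(A | H) by the DL of the residual A - H."""
    residual = [a - h for a, h in zip(subsequence, hypothesis)]
    return description_length(subsequence) - description_length(residual)


if __name__ == "__main__":
    a = [0, 3, 7, 3, 0, 3, 7, 3]
    print(bits_saved(a, a))                         # pattern matches: 12.0 bits saved
    print(bits_saved(a, [7, 0, 3, 0, 7, 0, 3, 0]))  # unrelated pattern: 0.0 bits saved
```

A hypothesis that captures the subsequence's shape drives the residual toward a constant and saves bits; an unrelated hypothesis saves nothing. A criterion of this kind needs no distance threshold, which is one plausible reading of why MDL can be "essentially parameter-free", and it also gives a natural reason to leave unexplained data unclustered.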