A Lazy Time Series Classification Algorithm Based on Shapelets

王志海; 张伟; 原继东; 刘海洋

Abstract

In recent years, time series classification has attracted growing attention, and shapelet-based classification has proved to be an effective approach. However, extracting the best shapelet requires building a candidate shapelet set that contains many redundant elements, and the shapelets obtained are generally discriminative only in an average sense. At the same time, ordinary models tend to ignore the local features of the instance to be classified. We therefore propose a lazy classification model driven by the salient local features of the instance to be classified. The model builds a separate, data-driven lazy shapelet classifier for each such instance, progressively narrowing the search space of time series relevant to its classification, so that the shapelets obtained directly reflect the salient local features of that instance. Experimental results show that the proposed model achieves higher accuracy and stronger interpretability.

Interpretable models, which reveal the characteristics of the data and explain how a classifier reaches its prediction, have become increasingly prevalent in recent years. Massive time series data are available in many fields, such as weather forecasting, medical monitoring, and anomaly detection, and time series classification is an important research area within time series data mining. Unlike traditional attribute-vector data, a time series has no explicit attributes. Even with sophisticated feature selection techniques, the dimensionality of the potential feature space remains beyond the acceptable range, which makes it challenging to learn a classification model that is both accurate and strongly interpretable. Because the shapelet is a new primitive that can be used to construct interpretable models, shapelet-based time series classification has recently attracted considerable interest. Shapelet-based classification is a typical shape-based approach: shapelets give insight into the local discriminative features of a time series. According to how the shapelets are used, shapelet-based models fall into two categories. The first uses the top-k shapelets to transform the original dataset into a much smaller yet more discriminative feature set, so that traditional classification algorithms can be applied to the resulting low-dimensional dataset. The second uses the selected shapelets to build the classification model directly.

These global shapelet-based models have several obvious shortcomings. First, while extracting the best shapelet, a global model must create a candidate shapelet set that contains a large number of redundant elements. Because of redundant instances and intra-class variation, the extracted shapelets are good for the training instances only in an average sense, so the resulting shapelet-based model may be neither suitable nor efficient for the test cases. Second, the shapelets obtained may come from different instances or may be approximate solutions, so they cannot indicate the local characteristics of the test case exactly. Third, because the class value of the local features of the test case is unknown, the characteristics of the test case are always ignored.

To solve these problems, a data-driven local model based on shapelets is proposed for each test case. The model takes local similarity, rather than global similarity, as the basis for classification. The local features of the test case are evaluated directly to find the most discriminative shapelet, and that shapelet is then used to progressively reduce the search space of class attribute values. Because the shapelets come from the test example itself, they directly reflect the salient local features of the test case and can answer the question of why the model assigns a certain class value to the instance. In addition, instances are selected during shapelet evaluation to reduce the impact of redundant instances and intra-class variation. The lazy classification model presented in this paper is compared with two shapelet decision tree models, 1NN models based on different distance functions, and C4.5 models based on different top-k shapelet transformation algorithms. Experimental results show that the proposed model has higher accuracy and stronger interpretability.
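The lazy procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`subseq_dist`, `best_split`, `lazy_classify`), the use of information gain as the quality measure, the midpoint threshold, and the majority-vote fallback are all assumptions made for illustration, and the instance selection step mentioned in the abstract is omitted. Candidates are taken only from the test series, and each chosen shapelet keeps only the training series that lie on the test series' side of the split.

```python
import math
from collections import Counter

def subseq_dist(shapelet, series):
    """Minimum Euclidean distance from shapelet to any equal-length window of series."""
    m = len(shapelet)
    best = math.inf
    for i in range(len(series) - m + 1):
        d = sum((a - b) ** 2 for a, b in zip(shapelet, series[i:i + m]))
        best = min(best, d)
    return math.sqrt(best)

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(candidate, train):
    """Return (information gain, distance threshold) of the best split for one candidate."""
    dists = sorted(((subseq_dist(candidate, s), y) for s, y in train),
                   key=lambda p: p[0])
    labels = [y for _, y in dists]
    base = entropy(labels)
    best_gain, best_thr = 0.0, None
    for k in range(1, len(dists)):
        if dists[k][0] == dists[k - 1][0]:
            continue  # no threshold can separate equal distances
        left, right = labels[:k], labels[k:]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
        if base - weighted > best_gain:
            best_gain = base - weighted
            best_thr = (dists[k - 1][0] + dists[k][0]) / 2
    return best_gain, best_thr

def lazy_classify(query, train, min_len=2, max_len=None):
    """Classify query using shapelets drawn only from query itself.

    train is a list of (series, label) pairs. Returns (label, shapelets used).
    """
    max_len = max_len or len(query)
    used = []
    while len({y for _, y in train}) > 1:
        best_gain, shapelet, thr = 0.0, None, None
        for m in range(min_len, max_len + 1):
            for i in range(len(query) - m + 1):
                cand = query[i:i + m]
                gain, t = best_split(cand, train)
                if t is not None and gain > best_gain:
                    best_gain, shapelet, thr = gain, cand, t
        if shapelet is None:
            break  # no candidate separates the remaining classes
        # The query contains the shapelet (distance 0), so it falls on the near side.
        train = [(s, y) for s, y in train if subseq_dist(shapelet, s) <= thr]
        used.append(shapelet)
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return label, used

# Toy data (hypothetical): two series with a peak, two without.
toy_train = [
    ([0, 0, 1, 5, 1, 0, 0, 0], "bump"),
    ([0, 0, 0, 0, 1, 5, 1, 0], "bump"),
    ([0, 0, 0, 0, 0, 0, 0, 0], "flat"),
    ([0, 1, 0, 0, 1, 0, 0, 0], "flat"),
]
toy_query = [0, 1, 5, 1, 0, 0, 0, 0]
toy_label, toy_shapelets = lazy_classify(toy_query, toy_train)
```

The returned shapelet list doubles as an explanation: every entry is a subsequence of the query, so it shows which local features of the test case drove the decision. Because the exhaustive search is repeated for every test case, a practical version would need pruning or a restricted length range.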
