
Optimizing shapelets quality measure for imbalanced time series classification

Abstract

Time series classification is considered one of the most challenging problems in data mining and is used in a broad range of fields. A biased class distribution makes classifying minority time series objects even harder. A common approach is to extract or select representative features that retain the structure of a time series object. However, when the data distribution is imbalanced, traditional features cannot represent time series effectively, especially in a multi-class setting. In this paper, shapelets - a primitive time series mining technique - are applied to extract the most representative subsequences. In particular, we verify that IG (Information Gain) is unsuitable as a shapelet quality measure for imbalanced data sets. We therefore propose two quality measures for shapelets, for imbalanced binary and multi-class problems respectively. From the extracted shapelet features, we select the diversified top-k shapelets under the new quality measures to represent the top-k best features, and implement this procedure on a map-reduce framework. Lastly, two oversampling methods based on shapelet features are proposed to re-balance binary and multi-class time series data sets. We validated our methods on benchmark data sets by comparing them with canonical classifiers and state-of-the-art time series algorithms. The results show that the proposed algorithms are more competitive than the compared methods, with statistical significance.
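For readers unfamiliar with IG as a shapelet quality measure, the following minimal sketch may help. It is an illustration under stated assumptions, not the paper's method: the function names, the toy distances, and the threshold are all hypothetical, and the measures the paper proposes are not reproduced here. It scores a candidate shapelet by the information gain of a single distance-threshold split, and shows one possible reason IG can be awkward under imbalance: even a perfect split cannot score higher than the entropy of the class distribution, which shrinks as the classes become more skewed.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(distances, labels, threshold):
    """IG of splitting series by their distance to a candidate shapelet.

    distances[i] is the (hypothetical) distance from series i to the shapelet;
    series with distance <= threshold go left, the rest go right.
    """
    left = [y for d, y in zip(distances, labels) if d <= threshold]
    right = [y for d, y in zip(distances, labels) if d > threshold]
    n = len(labels)
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))


# Toy data (assumed, not from the paper): the shapelet separates the
# classes perfectly in both cases; only the class ratio differs.
ig_balanced = information_gain(
    [0.1] * 50 + [0.9] * 50,
    ["minority"] * 50 + ["majority"] * 50,
    threshold=0.5,
)
ig_imbalanced = information_gain(
    [0.1] * 5 + [0.9] * 95,
    ["minority"] * 5 + ["majority"] * 95,
    threshold=0.5,
)
print(ig_balanced, ig_imbalanced)
```

In this toy setting the perfectly separating shapelet scores 1 bit on the balanced set but only roughly 0.29 bits on the 5:95 set, so IG values compress toward zero as imbalance grows. Whether this is the specific weakness the authors verify is not stated in the abstract.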
