IEEE International Conference on Data Engineering

Neighbor Profile: Bagging Nearest Neighbors for Unsupervised Time Series Mining

Abstract

Unsupervised time series mining has attracted great interest from both academia and industry. As the two most basic data mining tasks, the discovery of frequent and rare subsequences has been studied extensively in the literature. Specifically, frequent/rare subsequences, also known as motifs/discords, are defined as those with the smallest/largest 1-nearest neighbor distance. However, discord fails to identify a rare subsequence when that subsequence occurs more than once in the time series, a limitation widely known as the twin freak problem. This problem is just the "tip of the iceberg" of the limitations that stem from definitions based on the 1-nearest neighbor distance. In this work, we provide, for the first time, a clear theoretical analysis of motif/discord as a 1-nearest neighbor based nonparametric density estimation of subsequences. In particular, we focus on matrix profile, a recently proposed mining framework that unifies the discovery of motifs and discords under the same computing model. We then identify three inherent issues, namely low-quality density estimation, gravity defiant behavior, and lack of a reusable model, which degrade the performance of matrix profile in both efficiency and subsequence quality. To overcome these issues, we propose Neighbor Profile, which robustly models subsequence density by bagging nearest neighbors for the discovery of frequent/rare subsequences. Specifically, we draw multiple subsamples and average the density estimates from the subsamples using adjusted nearest neighbor distances, which not only makes the estimation more robust but also yields a reusable model for efficient learning. We check the sanity of Neighbor Profile on synthetic data and further evaluate it on real-world datasets. The experimental results demonstrate that Neighbor Profile correctly models subsequences of different densities and significantly outperforms matrix profile on the real-world arrhythmia dataset. The results also show that Neighbor Profile is efficient on massive datasets.
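The contrast the abstract draws, between a single 1-nearest-neighbor distance (the quantity a matrix profile stores) and nearest-neighbor distances averaged over random subsamples, can be illustrated on a toy twin-freak series. This is a minimal sketch under stated assumptions, not the paper's method: it uses plain rather than z-normalized Euclidean distance, assumes an exclusion zone of half a window for trivial matches, and the helper names (`naive_matrix_profile`, `bagged_nn_score`) and parameters (`n_bags`, `sample_size`, `excl`) are hypothetical. The bagged score averages raw nearest-neighbor distances; it does not reproduce the paper's adjusted distances or its density estimator.

```python
import math
import random


def sliding_windows(ts, m):
    """Return every length-m subsequence of ts as a tuple."""
    return [tuple(ts[i:i + m]) for i in range(len(ts) - m + 1)]


def euclid(a, b):
    """Plain (not z-normalized) Euclidean distance between equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def naive_matrix_profile(ts, m, excl=None):
    """Brute-force 1-NN distance of each subsequence, ignoring trivial matches.

    Smallest values point at motif candidates, largest at discord candidates.
    """
    excl = m // 2 if excl is None else excl  # assumed exclusion zone
    subs = sliding_windows(ts, m)
    n = len(subs)
    return [min(euclid(subs[i], subs[j]) for j in range(n) if abs(i - j) >= excl)
            for i in range(n)]


def bagged_nn_score(ts, m, n_bags=100, sample_size=16, excl=None, seed=0):
    """Hypothetical bagged 1-NN score: for each random subsample, take each
    subsequence's distance to its nearest non-trivial neighbour in that subsample,
    then average over subsamples. Higher score = lower estimated density (rarer).
    """
    excl = m // 2 if excl is None else excl  # assumed exclusion zone
    rng = random.Random(seed)
    subs = sliding_windows(ts, m)
    n = len(subs)
    totals, counts = [0.0] * n, [0] * n
    for _ in range(n_bags):
        sample = rng.sample(range(n), sample_size)
        for i in range(n):
            dists = [euclid(subs[i], subs[j]) for j in sample if abs(i - j) >= excl]
            if dists:  # skip bags where every sampled index is a trivial match
                totals[i] += min(dists)
                counts[i] += 1
    return [t / c if c else math.inf for t, c in zip(totals, counts)]


# Toy twin-freak series: an exactly periodic sine (period 20) with the same spike twice.
period, m = 20, 20
base = [math.sin(2 * math.pi * k / period) for k in range(period)]
ts = base * 10                      # 200 points
for start in (60, 140):             # identical spikes, 4 periods apart
    ts[start:start + 5] = [3.0] * 5

mp = naive_matrix_profile(ts, m)
score = bagged_nn_score(ts, m)
print(max(mp))   # 0.0: every window, spikes included, has an exact twin elsewhere
print(max(range(len(score)), key=score.__getitem__))  # start of the "rarest" window
```

On this series the 1-NN distance is zero for every window, so the repeated spike looks exactly as dense as the normal cycles. Because a subsample of 16 windows usually omits the spike's twin, its averaged distance stays high, and the top-scoring window should fall on one of the two spikes.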