In this paper, we present a new technique, called Stream Projected Outlier deTector (SPOT), to deal with the problem of outlier detection in high-dimensional data streams. SPOT is unique in several respects. First, SPOT employs a novel window-based time model and decaying cell summaries to capture statistics from the data stream. Second, SPOT constructs a Sparse Subspace Template (SST), a set of top sparse subspaces obtained by unsupervised and/or supervised learning processes, to detect projected outliers effectively. A Multi-Objective Genetic Algorithm (MOGA) is employed as an effective search method in unsupervised learning to find outlying subspaces from training data. Finally, the SST can carry out online self-evolution to cope with the dynamics of data streams. This paper details the motivation for and technical challenges of detecting outliers in high-dimensional data streams, presents an overview of SPOT, and outlines the plans for a system demonstration of SPOT.
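To make the idea of decaying cell summaries checked against a template of sparse subspaces more concrete, here is a minimal sketch. It assumes attributes normalized to [0, 1], an equi-width grid with `bins` intervals per dimension, exponential decay with rate `decay`, and a fixed list of subspaces standing in for the SST. The class `DecayingCellSummary`, its parameters, and the relative-density test (cell weight divided by the weight a cell would hold if the decayed total were spread uniformly) are hypothetical illustrations, not SPOT's actual summaries or outlier measures. MOGA-based subspace search and SST self-evolution are not shown.

```python
import math
from typing import Dict, Sequence, Tuple

Subspace = Tuple[int, ...]  # indices of the dimensions forming a subspace
Cell = Tuple[int, ...]      # grid coordinates of a cell inside a subspace


class DecayingCellSummary:
    """Hypothetical grid summary whose cell weights fade exponentially over time."""

    def __init__(self, template: Sequence[Subspace], bins: int = 10, decay: float = 0.01):
        self.template = list(template)  # stand-in for the Sparse Subspace Template (assumption)
        self.bins = bins                # equi-width intervals per dimension (assumption)
        self.decay = decay              # exponential decay rate (assumption)
        # Per subspace: cell -> (decayed weight, time of last update); decay is applied lazily.
        self.cells: Dict[Subspace, Dict[Cell, Tuple[float, float]]] = {s: {} for s in self.template}
        self.total = 0.0                # decayed weight of all points seen so far
        self.total_time = 0.0           # time at which self.total was last refreshed

    def _fade(self, weight: float, since: float, now: float) -> float:
        # Older contributions count less: weight * exp(-decay * elapsed time).
        return weight * math.exp(-self.decay * (now - since))

    def _cell(self, point: Sequence[float], subspace: Subspace) -> Cell:
        # Map each projected coordinate in [0, 1] to a grid interval index.
        return tuple(min(int(point[d] * self.bins), self.bins - 1) for d in subspace)

    def relative_density(self, point: Sequence[float], subspace: Subspace, now: float) -> float:
        weight, since = self.cells[subspace].get(self._cell(point, subspace), (0.0, now))
        total = self._fade(self.total, self.total_time, now)
        if total == 0.0:
            return float("inf")  # nothing seen yet, so nothing can be called sparse
        expected = total / self.bins ** len(subspace)  # weight per cell under a uniform spread
        return self._fade(weight, since, now) / expected

    def is_outlier(self, point: Sequence[float], now: float, threshold: float = 0.1) -> bool:
        # Flag the point if it falls into a sparse cell of any subspace in the template.
        return any(self.relative_density(point, s, now) < threshold for s in self.template)

    def update(self, point: Sequence[float], now: float) -> None:
        # Absorb one stream point: decay the old weights, then add a unit weight.
        self.total = self._fade(self.total, self.total_time, now) + 1.0
        self.total_time = now
        for s in self.template:
            c = self._cell(point, s)
            weight, since = self.cells[s].get(c, (0.0, now))
            self.cells[s][c] = (self._fade(weight, since, now) + 1.0, now)
```

For example, after feeding 500 points clustered near (0.5, 0.5, 0.5) with `template=[(0, 1), (2,)]`, a point near (0.95, 0.05, 0.5) lands in an empty cell of subspace (0, 1) and is flagged, even though its value in dimension 2 is unremarkable. This is the sense in which a projected outlier is visible only in some subspaces. Lazy decay (storing the last update time per cell) avoids touching every cell on every arrival.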