Scalable distance-based outlier detection over high-volume data streams

Abstract

The discovery of distance-based outliers from huge volumes of streaming data is critical for modern applications ranging from credit card fraud detection to moving object monitoring. In this work, we propose the first general framework to handle the three major classes of distance-based outliers in streaming environments, including the traditional distance-threshold-based and the nearest-neighbor-based definitions. Our LEAP framework encompasses two general optimization principles applicable across all three outlier types. First, our “minimal probing” principle uses a lightweight probing operation to gather minimal yet sufficient evidence for outlier detection. This principle overturns the state-of-the-art methodology, which requires routinely conducting expensive complete neighborhood searches to identify outliers. Second, our “lifespan-aware prioritization” principle leverages the temporal relationships among stream data points to prioritize the processing order among them during the probing process. Guided by these two principles, we design an outlier detection strategy which is proven to be optimal in the CPU costs needed to determine the outlier status of any data point during its entire life. Our comprehensive experimental studies, using both synthetic and real streaming data, demonstrate that our methods are 3 orders of magnitude faster than state-of-the-art methods for a rich diversity of scenarios tested, yet scale to high-dimensional streaming data.
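
The two principles named in the abstract can be illustrated with a small sketch. This is not the LEAP algorithm as published. It is a simplified reading for the distance-threshold definition only, assuming the usual formulation in which a point is an outlier when fewer than `k` points in the current window lie within distance `R` of it. The function names `probe` and `is_outlier`, the parameters `R` and `k`, and the list-based window (oldest point first) are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def probe(p_idx, window, R, k):
    """Collect at most k neighbor indices for window[p_idx].

    Hypothetical helper: `window` is a list of points ordered from
    oldest to newest; R and k are assumed threshold parameters.
    """
    p = window[p_idx]
    evidence = []
    # Lifespan-aware order: the newest points expire last, so
    # neighbors found among them stay valid as evidence longest.
    for j in range(len(window) - 1, -1, -1):
        if j == p_idx:
            continue
        if dist(p, window[j]) <= R:
            evidence.append(j)
            # Minimal probing: k neighbors already prove that p is
            # an inlier, so the rest of the window is not scanned.
            if len(evidence) == k:
                break
    return evidence


def is_outlier(p_idx, window, R, k):
    """A point is an outlier if fewer than k neighbors were found."""
    return len(probe(p_idx, window, R, k)) < k


window = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (0.0, 0.1), (0.2, 0.2)]
print(is_outlier(0, window, R=0.5, k=2))  # point in a dense cluster
print(is_outlier(2, window, R=0.5, k=2))  # isolated point
```

In this sketch the scan for the first point stops after two comparisons, whereas a complete neighborhood search would visit every point in the window.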

Bibliographic record

Source
IEEE International Conference on Data Engineering | 2014 | pp. 76-87 | 12 pages
Authors
Cao Lei; Yang Di; Wang Qingyang; Yu Yanwei
Original format: PDF

Similar literature


1. Scalable KDE-based top-n local outlier detection over large-scale data streams [J] . Liu Fang, Yu Yanwei, Song Peng, Knowledge-Based Systems . 2020 , Sep. 27 issue

2. Efficient and flexible algorithms for monitoring distance-based outliers over data streams [J] . Kontaki Maria, Gounaris Anastasios, Papadopoulos Apostolos N., Information Systems . 2016 , Jan. issue

3. Distance-based outlier queries in data streams: the novel task and algorithms [J] . Angiulli F, Fassetti F, Data Mining and Knowledge Discovery . 2010 , No. 2

4. Scalable distance-based outlier detection over high-volume data streams [C] . Cao Lei, Yang Di, Wang Qingyang, IEEE international conference on data engineering . 2014

5. Towards outlier detection for high-dimensional data streams using projected outlier analysis strategy [D] . Zhang, Ji. 2009

6. Designing a Streaming Algorithm for Outlier Detection in Data Mining—An Incremental Approach [O] . Kangqing Yu, Wei Shi, Nicola Santoro 2020

7. Data Editing Techniques to Allow the Application of Distance-Based Outlier Detection to Streams [O] . Vit Niennattrakul, Eamonn Keogh, Chotirat Ann Ratanamahatana 2011

