Efficient and flexible algorithms for monitoring distance-based outliers over data streams

Kontaki Maria; Gounaris Anastasios; Papadopoulos Apostolos N.; Tsichlas Kostas; Manolopoulos Yannis

Abstract

Anomaly detection is an important data mining task that aims to discover elements (known as outliers) that deviate significantly from the expected case. More specifically, given a set of objects, the problem is to return the suspicious objects that deviate significantly from typical behavior. As with clustering, applying different criteria leads to different definitions of an outlier. In this work, we focus on distance-based outliers: an object x is an outlier if fewer than k objects lie at distance at most R from x. The problem poses significant challenges in a stream-based environment, where data arrive continuously and outliers must be detected on the fly. A few research works study the problem of continuous outlier detection. However, none of these proposals meets the requirements of modern stream-based applications, for three reasons: (i) they incur a significant storage overhead, (ii) their efficiency is limited, and (iii) they lack flexibility, in that they assume a single configuration of the k and R parameters. In this work, we propose new algorithms for continuous outlier monitoring in data streams, based on sliding windows. Our techniques reduce the required storage overhead, are more efficient than previously proposed techniques, and offer significant flexibility with regard to the input parameters. Experiments on real-life and synthetic data sets verify our theoretical study. (C) 2015 Elsevier Ltd. All rights reserved.
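The distance-based definition in the abstract can be illustrated with a naive baseline. The sketch below is not the algorithms proposed in the paper; it simply re-evaluates every object in a count-based sliding window each time a new object arrives, which costs a quadratic number of distance computations per slide. The names `outliers_in_window` and `monitor`, the window size `W`, the use of Euclidean distance, and the choice not to count an object as its own neighbor are all assumptions made for illustration.

```python
from collections import deque
from math import dist  # Euclidean distance, Python 3.8+


def outliers_in_window(window, k, R):
    """Return the objects with fewer than k neighbors within distance R.

    Assumption: an object is not counted as its own neighbor.
    """
    points = list(window)
    result = []
    for i, x in enumerate(points):
        # Count the other window objects lying at distance at most R from x.
        neighbors = sum(
            1 for j, y in enumerate(points) if j != i and dist(x, y) <= R
        )
        if neighbors < k:
            result.append(x)
    return result


def monitor(stream, W, k, R):
    """Yield the current outliers after each arrival (count-based window)."""
    window = deque(maxlen=W)  # the oldest object expires automatically
    for point in stream:
        window.append(point)
        yield outliers_in_window(window, k, R)


# Hypothetical one-dimensional stream with a single isolated value.
stream = [(0.0,), (0.1,), (0.2,), (5.0,), (0.15,)]
for step, outliers in enumerate(monitor(stream, W=4, k=2, R=0.5), start=1):
    print(step, outliers)
```

With these illustrative values, the isolated object (5.0,) is reported as soon as it enters the window. The early steps also report objects as outliers simply because the window does not yet hold k other objects. Because this baseline recomputes everything on each arrival and handles a single (k, R) pair, it exhibits exactly the limited efficiency and flexibility that the abstract attributes to earlier approaches.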

Bibliographic record

Source
Information Systems, 2016, No. 1, pp. 37-53 (17 pages)
Authors
Kontaki Maria; Gounaris Anastasios; Papadopoulos Apostolos N.; Tsichlas Kostas; Manolopoulos Yannis
Affiliations

Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki 54124, Greece (listed for each of the five authors)

Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
Original format: PDF
Language: English
Keywords
Stream data mining; Outlier detection
Date added to database: 2022-08-18 02:47:47

Similar literature

1. Distance-based outlier queries in data streams: the novel task and algorithms [J]. Angiulli F, Fassetti F. Data Mining and Knowledge Discovery, 2010, No. 2.
2. Efficient Approaches for Continuous Monitoring Attribute Outlier Over Data Streams [J]. Hui Cao, Gang Chen. Advanced Science Letters, 2012.
3. A Review of Local Outlier Factor Algorithms for Outlier Detection in Big Data Streams [J]. Alghushairy Omar, Alsini Raed, Soule Terence. Big Data and Cognitive Computing, 2021, No. 1.
4. An efficient FPGA implementation of Mahalanobis distance-based outlier detection for streaming data [C]. Yuto Arai, Shinichi Wakabayashi, Shinobu Nagayama. International Conference on Field-Programmable Technology, 2016.
5. A Genetic Algorithm-Based Local Outlier Factor for Efficient Big Data Stream Processing [D]. Alghushairy, Omar S. 2021.
6. Designing a Streaming Algorithm for Outlier Detection in Data Mining: An Incremental Approach [O]. Kangqing Yu, Wei Shi, Nicola Santoro. 2020.
7. Continuous Monitoring of Distance-Based Outliers over Data Streams [O]. Maria Kontaki, Anastasios Gounaris, Apostolos N. Papadopoulos. 2012.
