Efficient and flexible algorithms for monitoring distance-based outliers over data streams



Abstract

Anomaly detection is considered an important data mining task, aiming at the discovery of elements (known as outliers) that deviate significantly from the expected case. More specifically, given a set of objects, the problem is to return the suspicious objects that deviate significantly from the typical behavior. As in the case of clustering, the application of different criteria leads to different definitions of an outlier. In this work, we focus on distance-based outliers: an object x is an outlier if fewer than k objects lie at distance at most R from x. The problem poses significant challenges in a stream-based environment, where data arrive continuously and outliers must be detected on the fly. A few research works have studied the problem of continuous outlier detection. However, none of these proposals meets the requirements of modern stream-based applications, for the following reasons: (i) they demand a significant storage overhead, (ii) their efficiency is limited, and (iii) they lack flexibility, in the sense that they assume a single configuration of the k and R parameters. In this work, we propose new algorithms for continuous outlier monitoring in data streams, based on sliding windows. Our techniques reduce the required storage overhead, are more efficient than previously proposed techniques, and offer significant flexibility with regard to the input parameters. Experiments performed on real-life and synthetic data sets verify our theoretical study. © 2015 Elsevier Ltd. All rights reserved.
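The distance-based definition in the abstract (x is an outlier when fewer than k objects lie within distance R of it) can be illustrated with a naive baseline over a sliding window. This is only a sketch for intuition and is not one of the algorithms proposed in the paper: it recomputes all pairwise distances on every arrival, which is exactly the cost that dedicated streaming algorithms try to avoid. The function names `window_outliers` and `monitor`, the count-based window, the Euclidean metric, and the choice not to count x as its own neighbor are all assumptions made for this example.

```python
from collections import deque
from math import dist  # Euclidean distance, Python 3.8+


def window_outliers(window, k, R):
    """Return the points with fewer than k neighbors within distance R.

    Assumption: a point is not counted as its own neighbor.
    """
    points = list(window)
    outliers = []
    for i, x in enumerate(points):
        neighbors = sum(
            1 for j, y in enumerate(points) if i != j and dist(x, y) <= R
        )
        if neighbors < k:
            outliers.append(x)
    return outliers


def monitor(stream, window_size, k, R):
    """Yield the current outliers after each arrival.

    Assumption: a count-based sliding window holding the most recent
    `window_size` points; the oldest point is evicted automatically.
    """
    window = deque(maxlen=window_size)
    for point in stream:
        window.append(point)
        yield window_outliers(window, k, R)


# Hypothetical toy stream: a tight cluster near the origin plus one far point.
stream = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (0.1, 0.1)]
for step, outliers in enumerate(monitor(stream, window_size=4, k=2, R=0.5), 1):
    print(step, outliers)
```

In this toy run the first two points are reported as outliers only because the window does not yet hold k neighbors for them; once the cluster has three members, only the far point (5.0, 5.0) is reported. Because k and R are plain arguments, the same window can be re-evaluated under several parameter settings, although this naive version pays the full quadratic cost each time.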

Bibliographic record

  • Source
    Information Systems | 2016, Issue 1 | pp. 37-53 | 17 pages
  • Author affiliations

    Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki 54124, Greece;

    Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki 54124, Greece;

    Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki 54124, Greece;

    Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki 54124, Greece;

    Aristotle Univ Thessaloniki, Dept Informat, Thessaloniki 54124, Greece;

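  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification
  • Keywords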

    Stream data mining; Outlier detection;

  • Date added to database: 2022-08-18 02:47:47
