DBSCOUT: A Density-based Method for Scalable Outlier Detection in Very Large Datasets

Abstract

Recent technological advancements have made it possible to generate and collect huge amounts of data every day. This data is used for purposes that may affect us on an unprecedented scale. Understanding the data, including detecting its outliers, is a critical step before using it. Outlier detection has been well studied in the literature, but existing approaches fail to scale to such very large settings. In this paper, we propose DBSCOUT, an efficient exact algorithm for outlier detection with linear complexity that can run in parallel across multiple independent machines, making it suitable for settings with billions of tuples. Beyond the theoretical analysis, our experimental results confirm an orders-of-magnitude improvement over existing work, demonstrating the efficiency, scalability, and effectiveness of our approach.

Bibliographic record

Source
International Conference on Data Engineering, 2021, pp. 37-48 (12 pages)
Authors
Matteo Corain; Paolo Garza; Abolfazl Asudeh
Original format
PDF
Keywords
Three-dimensional displays; Scalability; Conferences; Data engineering; Complexity theory; Proposals; Time complexity
