A Near-linear Time Approximation Algorithm for Angle-based Outlier Detection in High-dimensional Data

Ninh Pham; Rasmus Pagh

Abstract

Outlier mining in d-dimensional point sets is a fundamental and well-studied data mining task with a wide variety of applications. Most such applications arise in high-dimensional domains. A bottleneck of existing approaches is that implicit or explicit assessments based on concepts of distance or nearest neighbor deteriorate in high-dimensional data. Following up on the work of Kriegel et al. (KDD '08), we investigate the use of the angle-based outlier factor in mining high-dimensional outliers. While their algorithm runs in cubic time (with a quadratic-time heuristic), we propose a novel random projection-based technique that is able to estimate the angle-based outlier factor for all data points in time near-linear in the size of the data. Our approach is also suitable for parallel environments, where it can achieve a parallel speedup. We introduce a theoretical analysis of the quality of approximation to guarantee the reliability of our estimation algorithm. Empirical experiments on synthetic and real-world data sets demonstrate that our approach is efficient and scalable to very large high-dimensional data sets.
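
For orientation, the cubic-time baseline that the abstract contrasts against can be sketched roughly as follows. This is a minimal brute-force sketch, assuming the score of a point is the plain variance of the angles between the difference vectors from that point to every pair of other points, with a low variance suggesting an outlier. The abstract does not state the formula, so this definition, the weighting used by Kriegel et al., and the paper's random-projection estimator are not reproduced here. The names `angle` and `angle_variance_scores` are hypothetical.

```python
import math
import random
from itertools import combinations
from statistics import pvariance


def angle(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point rounding.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def angle_variance_scores(points):
    """Brute-force angle variance for every point (assumed score definition).

    Every point is compared against all pairs of other points,
    so the total work is cubic in the number of points.
    """
    scores = []
    for i, p in enumerate(points):
        # Difference vectors from p to every other point.
        diffs = [[qk - pk for qk, pk in zip(q, p)]
                 for j, q in enumerate(points) if j != i]
        # Skip exact duplicates of p, which have no direction.
        diffs = [d for d in diffs if any(d)]
        angles = [angle(u, v) for u, v in combinations(diffs, 2)]
        scores.append(pvariance(angles) if angles else float("nan"))
    return scores


# Illustrative data (made up): a small Gaussian cluster plus one distant point.
random.seed(0)
cluster = [[random.gauss(0, 1) for _ in range(3)] for _ in range(20)]
data = cluster + [[25.0, 25.0, 25.0]]
scores = angle_variance_scores(data)
lowest = min(range(len(data)), key=scores.__getitem__)
print("index with the lowest angle variance:", lowest)
```

Seen from a distant point, all other points lie in roughly the same direction, so its angles vary little; a point inside the cluster sees neighbors in many directions. The triple loop hidden in `combinations` is what makes this baseline impractical for large data sets and what a near-linear estimator is meant to avoid.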

Bibliographic details

Source: SIGKDD explorations, 2012, issue cdarom, 9 pages
Authors: Ninh Pham; Rasmus Pagh
Original format: PDF
Text language: English
Chinese Library Classification: TP274.2
Keywords: Outlier detection; High-dimensional; Angle-based; Random projection; AMS Sketch

Similar literature

1. A Near-linear Time Approximation Algorithm for Angle-based Outlier Detection in High-dimensional Data [J]. Ninh Pham, Rasmus Pagh. SIGKDD explorations. 2012, issue CDaROM
2. An angle-based subspace anomaly detection approach to high-dimensional data: With an application to industrial fault detection [J]. Zhang Liangwei, Lin Jing, Karim Ramin. Reliability Engineering & System Safety. 2015, issue octa
3. The Outlier Interval Detection Algorithms on Astronautical Time Series Data [J]. Wei Hu, Junpeng Bao. Mathematical Problems in Engineering. 2013, issue pta2
4. A Near-linear Time Approximation Algorithm for Angle-based Outlier Detection in High-dimensional Data [C]. Ninh Pham, Rasmus Pagh. ACM SIGKDD international conference on knowledge discovery and data mining. 2012
5. Towards outlier detection for high-dimensional data streams using projected outlier analysis strategy [D]. Zhang, Ji. 2009
6. PCA leverage: outlier detection for high-dimensional functional magnetic resonance imaging data [O]. Amanda F. Mejia, Mary Beth Nebel, Ani Eloyan
7. Angle-based outlier detection in high-dimensional data [O]. Hans-Peter Kriegel, Matthias Schubert, Arthur Zimek. 2008
