Journal: SIGKDD Explorations

A Near-linear Time Approximation Algorithm for Angle-based Outlier Detection in High-dimensional Data

Abstract

Outlier mining in d-dimensional point sets is a fundamental and well-studied data mining task with a wide variety of applications, most of which arise in high-dimensional domains. A bottleneck of existing approaches is that implicit or explicit assessments based on distance or nearest-neighbor concepts deteriorate in high-dimensional data. Following up on the work of Kriegel et al. (KDD '08), we investigate the use of the angle-based outlier factor for mining high-dimensional outliers. While their algorithm runs in cubic time (with a quadratic-time heuristic), we propose a novel random projection-based technique that estimates the angle-based outlier factor for all data points in time near-linear in the size of the data. Our approach is also suitable for parallel environments, where it achieves a parallel speedup. We present a theoretical analysis of the approximation quality to guarantee the reliability of our estimation algorithm. Empirical experiments on synthetic and real-world data sets demonstrate that our approach is efficient and scalable to very large high-dimensional data sets.
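To make the cubic-time baseline concrete, the following is a minimal brute-force sketch, not the random projection method proposed in the paper. It assumes a simplified, unweighted score: for each point, the variance of the angles between the difference vectors to every pair of other points. The distance weighting used in the original angle-based outlier factor is omitted. The function name `angle_variance_scores` and the sample data are hypothetical and chosen only for illustration. A point far from the rest of the data sees all other points within a narrow cone, so its angle variance is small.

```python
import math
from itertools import combinations


def angle_variance_scores(points):
    """Brute-force O(n^3) variance of angles per point.

    Assumption: unweighted variance of angles, a simplification of the
    distance-weighted angle-based outlier factor. A lower score suggests
    a more outlying point.
    """
    scores = []
    for i, p in enumerate(points):
        others = [q for j, q in enumerate(points) if j != i]
        angles = []
        for a, b in combinations(others, 2):
            # Difference vectors from p to the two other points.
            u = [x - y for x, y in zip(a, p)]
            v = [x - y for x, y in zip(b, p)]
            norm_u = math.sqrt(sum(x * x for x in u))
            norm_v = math.sqrt(sum(x * x for x in v))
            if norm_u == 0.0 or norm_v == 0.0:
                continue  # skip exact duplicates of p
            cos = sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)
            cos = max(-1.0, min(1.0, cos))  # guard against rounding error
            angles.append(math.acos(cos))
        if not angles:
            scores.append(0.0)
            continue
        mean = sum(angles) / len(angles)
        scores.append(sum((t - mean) ** 2 for t in angles) / len(angles))
    return scores


# Hypothetical data: a 3x3 grid cluster plus one distant point.
cluster = [(float(x), float(y)) for x in range(3) for y in range(3)]
data = cluster + [(20.0, 20.0)]
scores = angle_variance_scores(data)
most_outlying = scores.index(min(scores))
```

The triple loop over a point and all pairs of other points is what makes this baseline cubic in the number of points, which is the cost the abstract's near-linear estimator is meant to avoid.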