Fast and Scalable Outlier Detection with Approximate Nearest Neighbor Ensembles

Abstract

Popular outlier detection methods require pairwise comparisons of objects to compute nearest neighbors. This inherently quadratic problem does not scale to large data sets, so multidimensional outlier detection for big data remains an open challenge. Existing approximate neighbor search methods are designed to preserve distances as well as possible. In this article, we present a highly scalable approach to computing the nearest neighbors of objects that instead focuses on preserving neighborhoods well, using an ensemble of space-filling curves. We show that the method has near-linear complexity, can be distributed across compute clusters, and preserves neighborhoods (but not distances) better than established methods such as locality-sensitive hashing and projection-indexed nearest neighbors. Furthermore, we demonstrate that, by preserving neighborhoods, the quality of outlier detection based on local density estimates is not only well retained but sometimes even improved, an effect that can be explained by relating our method to outlier detection ensembles. At the same time, the outlier detection process is accelerated by two orders of magnitude.
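
The abstract describes the technique only at a high level. The following is a minimal sketch under our own assumptions, not the authors' implementation. It uses only randomly shifted Z-order (Morton) curves with a random dimension order per curve, takes a fixed window of curve neighbors on each side of a point as candidates, and refines those candidates with exact distances. It then scores points with LOF (Local Outlier Factor), used here as one example of an outlier score based on local density estimates. All function and parameter names (`z_order_key`, `approx_knn`, `exact_knn`, `lof`, `num_curves`, `window`, `bits`) are hypothetical, and the defaults are illustrative rather than values taken from the paper.

```python
import math
import random


def z_order_key(cell, bits):
    """Interleave the bits of integer grid coordinates into one Morton (Z-order) key."""
    key = 0
    for bit in range(bits - 1, -1, -1):
        for coord in cell:
            key = (key << 1) | ((coord >> bit) & 1)
    return key


def exact_knn(data, k):
    """Quadratic baseline: compare every point with every other point."""
    n = len(data)
    return [
        sorted((j for j in range(n) if j != i),
               key=lambda j: math.dist(data[i], data[j]))[:k]
        for i in range(n)
    ]


def approx_knn(data, k, num_curves=8, window=None, bits=16, seed=0):
    """Hypothetical sketch: kNN candidates from an ensemble of shifted Z-order curves."""
    rng = random.Random(seed)
    n, dims = len(data), len(data[0])
    window = window or k  # curve neighbors taken on each side of a point
    lo = [min(p[d] for p in data) for d in range(dims)]
    span = [(max(p[d] for p in data) - lo[d]) or 1.0 for d in range(dims)]
    unit = [[(p[d] - lo[d]) / span[d] for d in range(dims)] for p in data]  # rescale to [0, 1]
    cells = 2 ** bits
    candidates = [set() for _ in range(n)]
    for _ in range(num_curves):
        # Each ensemble member gets its own random shift and dimension order.
        shift = [rng.random() for _ in range(dims)]
        perm = rng.sample(range(dims), dims)
        keys = [
            z_order_key([min(int((u[d] + shift[d]) / 2 * cells), cells - 1) for d in perm], bits)
            for u in unit
        ]
        order = sorted(range(n), key=keys.__getitem__)
        # Points that are close along this curve become neighbor candidates.
        for pos, i in enumerate(order):
            for j in order[max(0, pos - window):pos + window + 1]:
                if j != i:
                    candidates[i].add(j)
    # Refinement: exact distances, but only to each point's candidates.
    return [
        sorted(candidates[i], key=lambda j: math.dist(data[i], data[j]))[:k]
        for i in range(n)
    ]


def lof(data, knn):
    """Local Outlier Factor on the supplied neighbor lists (each sorted by distance)."""
    # k-distance: distance to the last (farthest) entry of each neighbor list.
    kdist = [math.dist(data[i], data[nb[-1]]) for i, nb in enumerate(knn)]
    lrd = []  # local reachability density
    for i, nb in enumerate(knn):
        reach = sum(max(kdist[j], math.dist(data[i], data[j])) for j in nb)
        lrd.append(len(nb) / max(reach, 1e-12))  # guard against duplicate points
    return [sum(lrd[j] for j in nb) / (len(nb) * lrd[i]) for i, nb in enumerate(knn)]


rng = random.Random(42)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)] + [(10.0, 10.0)]
approx = approx_knn(points, k=10)
exact = exact_knn(points, k=10)
recall = sum(len(set(a) & set(e)) for a, e in zip(approx, exact)) / (10 * len(points))
scores = lof(points, approx)
print(f"neighbor recall: {recall:.3f}")
print("highest LOF index:", max(range(len(points)), key=scores.__getitem__))
```

Each curve costs one sort plus about 2 × `window` candidate pairs per point, so candidate generation avoids the all-pairs comparison that `exact_knn` performs. The printed recall compares the approximate neighbor lists with the brute-force ones, so it measures how well neighborhoods are preserved rather than how well distances are preserved.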

Bibliographic details

Source: International Conference on Database Systems for Advanced Applications, 2015, 18 pages
Authors: Erich Schubert; Arthur Zimek; Hans-Peter Kriegel
Original format: PDF
Chinese Library Classification: TP311.13

Similar documents

1. A dynamic ensemble outlier detection model based on an adaptive k-nearest neighbor rule [J]. Wang Biao, Mao Zhizhong. Information Fusion, 2020, issue 1
2. A Novel Approach to Outlier Detection using Modified Grey Wolf Optimization and k-Nearest Neighbors Algorithm [J]. Reema Aswani, S. P. Ghrera, Satish Chandra. Indian Journal of Science and Technology, 2016, issue 44
3. Reverse Nearest Neighbors in Unsupervised Distance-Based Outlier Detection [J]. Radovanovic Milos, Nanopoulos Alexandros, Ivanovic Mirjana. IEEE Transactions on Knowledge and Data Engineering, 2015, issue 5
4. Fast and Scalable Outlier Detection with Approximate Nearest Neighbor Ensembles [C]. Erich Schubert, Arthur Zimek, Hans-Peter Kriegel. International Conference on Database Systems for Advanced Applications; International Workshop on Semantic Computing and Personalization; International Workshop on Big Data Management and Service, 2015
5. Fast Locality Sensitive Hashing Algorithm for Approximate Nearest Neighbor Search: A Practical Data Mining Approach [D]. Buaba, Ruben, 2012
6. Fast open modification spectral library searching through approximate nearest neighbor indexing [O]. Wout Bittremieux, Pieter Meysman, William Stafford Noble
7. Efficient Outlier Detection for High Dimensional Data using Improved Monarch Butterfly Optimization and Mutual Nearest Neighbors Algorithm: IMBO-MNN [O]. M Batchanaboyina, Nagaraju Devarakonda, 2020