Fast and Scalable Outlier Detection with Approximate Nearest Neighbor Ensembles

Abstract

Popular outlier detection methods require the pairwise comparison of objects to compute the nearest neighbors. This inherently quadratic problem is not scalable to large data sets, making multidimensional outlier detection for big data still an open challenge. Existing approximate neighbor search methods are designed to preserve distances as well as possible. In this article, we present a highly scalable approach to compute the nearest neighbors of objects that instead focuses on preserving neighborhoods well using an ensemble of space-filling curves. We show that the method has near-linear complexity, can be distributed to clusters for computation, and preserves neighborhoods (but not distances) better than established methods such as locality-sensitive hashing and projection indexed nearest neighbors. Furthermore, we demonstrate that, by preserving neighborhoods, the quality of outlier detection based on local density estimates is not only well retained but sometimes even improved, an effect that can be explained by relating our method to outlier detection ensembles. At the same time, the outlier detection process is accelerated by two orders of magnitude.
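
The idea of collecting neighbor candidates from an ensemble of space-filling curves can be illustrated with a small sketch. This is not the authors' implementation: it assumes a single curve type (Z-order, also called Morton order), random per-curve shifts as the only source of diversity, and a final exact-distance refinement of the pooled candidates. The function names `morton_keys` and `approx_knn` and the parameters `n_curves`, `window`, and `bits` are hypothetical and chosen only for this example.

```python
import numpy as np


def morton_keys(grid, bits):
    """Interleave the bits of integer grid coordinates into one Z-order key per row."""
    keys = np.zeros(len(grid), dtype=np.int64)
    for b in range(bits - 1, -1, -1):          # most significant bit first
        for j in range(grid.shape[1]):         # cycle through the dimensions
            keys = (keys << 1) | ((grid[:, j] >> b) & 1)
    return keys


def approx_knn(X, k, n_curves=8, window=None, bits=10, seed=0):
    """Approximate k nearest neighbors from an ensemble of shifted Z-order curves.

    Illustrative assumptions: Z-order curves only, random shifts only,
    and exact Euclidean refinement of the pooled candidates.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    window = k if window is None else window
    assert bits * d <= 62, "keys must fit into a signed 64-bit integer"
    assert window >= k and n > k, "each window must hold at least k other points"

    # Rescale every attribute to [0, 1]; constant attributes map to 0.
    lo, hi = X.min(axis=0), X.max(axis=0)
    unit = (X - lo) / np.where(hi > lo, hi - lo, 1.0)

    candidates = [set() for _ in range(n)]
    for _ in range(n_curves):
        shift = rng.random(d)                  # a different random shift per curve
        grid = np.floor((unit + shift) / 2.0 * (2 ** bits - 1)).astype(np.int64)
        order = np.argsort(morton_keys(grid, bits), kind="stable")
        # Points adjacent in curve order become neighbor candidates.
        for pos, i in enumerate(order):
            start, stop = max(0, pos - window), min(n, pos + window + 1)
            candidates[i].update(order[start:stop].tolist())

    # Refine: keep the k candidates with the smallest true distance.
    result = np.empty((n, k), dtype=np.int64)
    for i in range(n):
        cand = np.fromiter(candidates[i] - {i}, dtype=np.int64)
        dist = np.linalg.norm(X[cand] - X[i], axis=1)
        result[i] = cand[np.argsort(dist)[:k]]
    return result
```

Calling `approx_knn(X, k=5)` on an `(n, d)` float array returns an `(n, 5)` array of neighbor indices. In this sketch each curve costs one sort plus a linear scan, and the per-curve work is independent, which mirrors the near-linear and distributable character described in the abstract. The resulting neighbor lists could then feed any neighborhood-based outlier score.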

Bibliographic details

Source: International conference on database systems for advanced applications; International workshop on Semantic computing and personalization; International workshop on big data management and service, 2015, pp. 19-36 (18 pages)
Authors: Erich Schubert; Arthur Zimek; Hans-Peter Kriegel
Original format: PDF

Similar literature

1. A dynamic ensemble outlier detection model based on an adaptive k-nearest neighbor rule [J]. Wang Biao, Mao Zhizhong. Information Fusion. 2020, No. 1
2. A Novel Approach to Outlier Detection using Modified Grey Wolf Optimization and k-Nearest Neighbors Algorithm [J]. Reema Aswani, S. P. Ghrera, Satish Chandra. Indian Journal of Science and Technology. 2016, No. 44
3. Reverse Nearest Neighbors in Unsupervised Distance-Based Outlier Detection [J]. Radovanovic Milos, Nanopoulos Alexandros, Ivanovic Mirjana. Knowledge and Data Engineering, IEEE Transactions on. 2015, No. 5
4. Fast and Scalable Outlier Detection with Approximate Nearest Neighbor Ensembles [C]. Erich Schubert, Arthur Zimek, Hans-Peter Kriegel. International Conference on Database Systems for Advanced Applications. 2015
5. Fast Locality Sensitive Hashing Algorithm for Approximate Nearest Neighbor Search: A Practical Data Mining Approach [D]. Buaba, Ruben. 2012
6. Fast open modification spectral library searching through approximate nearest neighbor indexing [O]. Wout Bittremieux, Pieter Meysman, William Stafford Noble
7. Efficient Outlier Detection for High Dimensional Data using Improved Monarch Butterfly Optimization and Mutual Nearest Neighbors Algorithm: IMBO-MNN [O]. M Batchanaboyina, Nagaraju Devarakonda. 2020
