Conference: International Conference on Database Systems for Advanced Applications

Fast and Scalable Outlier Detection with Approximate Nearest Neighbor Ensembles

Abstract

Popular outlier detection methods require the pairwise comparison of objects to compute the nearest neighbors. This inherently quadratic problem is not scalable to large data sets, making multidimensional outlier detection for big data still an open challenge. Existing approximate neighbor search methods are designed to preserve distances as well as possible. In this article, we present a highly scalable approach to compute the nearest neighbors of objects that instead focuses on preserving neighborhoods well using an ensemble of space-filling curves. We show that the method has near-linear complexity, can be distributed to clusters for computation, and preserves neighborhoods (but not distances) better than established methods such as locality-sensitive hashing and projection-indexed nearest neighbors. Furthermore, we demonstrate that, by preserving neighborhoods, the quality of outlier detection based on local density estimates is not only well retained but sometimes even improved, an effect that can be explained by relating our method to outlier detection ensembles. At the same time, the outlier detection process is accelerated by two orders of magnitude.
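
The general idea of finding neighbor candidates with an ensemble of space-filling curves can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the curve types, the number of curves, or how candidates are combined, so the Z-order (Morton) curve, the random per-curve shifts, the fixed candidate window, and all names below (`morton_keys`, `approx_knn`, `n_curves`, `window`, `bits`) are assumptions made purely for illustration. Each curve sorts the points once (O(n log n)), candidates are taken from a small window around each point in every ordering, and exact distances are computed only for those candidates.

```python
import numpy as np


def morton_keys(cells, bits):
    """Interleave the bits of integer grid cells (n x d) into one Z-order key per row."""
    n, d = cells.shape
    keys = np.zeros(n, dtype=np.uint64)
    for b in range(bits):
        for j in range(d):
            bit = (cells[:, j] >> np.uint64(b)) & np.uint64(1)
            keys |= bit << np.uint64(b * d + j)
    return keys


def approx_knn(X, k, n_curves=8, window=None, bits=10, seed=0):
    """Approximate k nearest neighbors from an ensemble of shifted Z-order curves.

    Illustrative assumptions (not taken from the source): Z-order curves only,
    one random shift per curve, `window` candidates on each side of a point.
    """
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    assert n > k, "need more points than requested neighbors"
    assert d * bits <= 64, "interleaved key must fit into 64 bits"
    window = window or k
    rng = np.random.default_rng(seed)

    # Scale every dimension to [0, 1] so that shifts are comparable across dimensions.
    span = X.max(axis=0) - X.min(axis=0)
    span[span == 0] = 1.0
    unit = (X - X.min(axis=0)) / span

    candidates = [set() for _ in range(n)]
    for _ in range(n_curves):
        # A random shift moves the grid so each curve splits different neighborhoods.
        shift = rng.random(d)
        cells = np.floor((unit + shift) / 2.0 * (1 << bits)).astype(np.uint64)
        order = np.argsort(morton_keys(cells, bits), kind="stable")
        # Points adjacent on the curve become neighbor candidates.
        for pos, i in enumerate(order):
            lo, hi = max(0, pos - window), min(n, pos + window + 1)
            candidates[i].update(order[lo:hi].tolist())

    # Refine: exact distances are computed only within each small candidate set.
    neighbors = np.empty((n, k), dtype=np.int64)
    for i in range(n):
        candidates[i].discard(i)
        cand = np.fromiter(candidates[i], dtype=np.int64)
        dist = np.linalg.norm(X[cand] - X[i], axis=1)
        neighbors[i] = cand[np.argsort(dist)[:k]]
    return neighbors
```

Calling `approx_knn(X, k=5)` on an `(n, d)` array returns an `(n, 5)` array of neighbor indices, sorted by distance within each row. These indices could then be passed to a density-based outlier score in place of an exact neighbor search. In this sketch, more curves or a wider window would be expected to improve recall at the cost of more distance computations.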
