【24h】

Fast and Scalable Outlier Detection with Approximate Nearest Neighbor Ensembles

机译:具有近似最近邻集合的快速可扩展的异常值检测

获取原文

摘要

Popular outlier detection methods require the pairwise comparison of objects to compute the nearest neighbors. This inherently quadratic problem is not scalable to large data sets, making multidimensional outlier detection for big data still an open challenge. Existing approximate neighbor search methods are designed to preserve distances as well as possible. In this article, we present a highly scalable approach to compute the nearest neighbors of objects that instead focuses on preserving neighborhoods well using an ensemble of space-filling curves. We show that the method has near-linear complexity, can be distributed to clusters for computation, and preserves neighborhoods-but not distances-better than established methods such as locality sensitive hashing and projection indexed nearest neighbors. Furthermore, we demonstrate that, by preserving neighborhoods, the quality of outlier detection based on local density estimates is not only well retained but sometimes even improved, an effect that can be explained by relating our method to outlier detection ensembles. At the same time, the outlier detection process is accelerated by two orders of magnitude.
机译:流行的离群值检测方法要求对象的成对比较以计算最近的邻居。这个固有的二次问题无法扩展到大数据集,这使得对大数据的多维离群值检测仍然是一个挑战。现有的近似邻居搜索方法被设计为尽可能地保持距离。在本文中,我们提出了一种高度可扩展的方法来计算对象的最近邻居,而该方法着重于使用一组空间填充曲线很好地保存邻域。我们表明,该方法具有近乎线性的复杂度,可以分布到群集中进行计算,并且比邻域敏感散列和投影索引的最近邻域等已建立的方法更好地保留了邻域,但没有保留距离。此外,我们证明,通过保护邻域,不仅可以很好地保留基于局部密度估计的异常值检测质量,而且有时甚至可以提高质量,这种效果可以通过将我们的方法与异常值检测集合相关联来解释。同时,离群值检测过程加快了两个数量级。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号