KNN Based Outlier Detection Algorithm in Large Dataset

Abstract

An outlier is an object that differs markedly from the rest of the dataset on some measure. Finding such exceptions has received much attention in the data mining field. In this paper, we propose a KNN-based outlier detection algorithm that consists of two phases. First, it partitions the dataset into several clusters; then, within each cluster, it computes the Kth nearest neighborhood of each object to find outliers. In addition, our algorithm uses a pruning scheme that avoids repeated passes over the entire dataset and unnecessary computations. Experimental results on both synthetic and real-life datasets show that our algorithm is efficient for outlier detection in large datasets.
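The two-phase idea can be illustrated with a minimal Python sketch. The abstract does not specify the clustering method, the outlier score, or the pruning rule, so everything below is an assumption made for illustration: phase 1 assigns points to the nearest of some given centers, phase 2 scores each point by the distance to its k-th nearest neighbor inside its own cluster, and the pruning step is the common "stop scanning once the point can no longer beat the weakest of the current top n" cutoff. The names `assign_to_clusters` and `top_outliers` are hypothetical and do not come from the paper.

```python
import heapq
import math


def assign_to_clusters(points, centers):
    """Phase 1 (assumed): put each point index into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for idx, p in enumerate(points):
        nearest = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        clusters[nearest].append(idx)
    return clusters


def top_outliers(points, clusters, k, n):
    """Phase 2 (assumed): keep the n points with the largest distance to
    their k-th nearest neighbor, searching only inside each point's cluster."""
    top = []  # min-heap of (score, index); top[0] is the weakest outlier kept so far
    for members in clusters:
        if len(members) <= k:
            continue  # too few points for a k-th neighbor; skipped in this sketch
        for i in members:
            # Scores are non-negative, so -1.0 disables pruning until top is full.
            cutoff = top[0][0] if len(top) == n else -1.0
            nearest = []  # negated distances: -nearest[0] is the largest of the k smallest
            pruned = False
            for j in members:
                if j == i:
                    continue
                d = math.dist(points[i], points[j])
                if len(nearest) < k:
                    heapq.heappush(nearest, -d)
                elif d < -nearest[0]:
                    heapq.heapreplace(nearest, -d)
                # Once k neighbors are known, -nearest[0] is an upper bound on the
                # final score; if it cannot exceed the cutoff, stop scanning.
                if len(nearest) == k and -nearest[0] <= cutoff:
                    pruned = True
                    break
            if pruned:
                continue
            score = -nearest[0]
            if len(top) < n:
                heapq.heappush(top, (score, i))
            else:
                heapq.heapreplace(top, (score, i))
    return sorted(top, reverse=True)


# Toy data: two tight groups plus one stray point at index 8.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 0)]
centers = [(0.5, 0.5), (10.5, 10.5)]  # assumed to be given; not from the paper
clusters = assign_to_clusters(points, centers)
result = top_outliers(points, clusters, k=2, n=1)
print(result)  # the single reported outlier is the stray point, index 8
```

Restricting the neighbor search to one cluster is what keeps the cost down, but it also means a point's true k-th neighbor may be missed when it lies in another cluster, and clusters with at most k members are simply skipped here. A real implementation would need a policy for both cases.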
