Conference proceedings: Advances in Knowledge Discovery and Data Mining

Cell-Based Outlier Detection Algorithm: A Fast Outlier Detection Algorithm for Large Datasets



Abstract

Finding outliers is an important task in many KDD applications. We developed a cell-based outlier detection algorithm (CEBOD) to detect outliers in large datasets. The algorithm is based on the local outlier factor (LOF); the major difference is that CEBOD filters the initial dataset and so avoids heavy computation on the majority of the data. Our experiment shows that CEBOD is more efficient than LOF and can find outliers in large datasets quickly and accurately. A large dataset is loaded into memory in blocks, and each data point is placed into the appropriate cell based on its values. The number of points a cell holds represents that cell's density. Points that lie in high-density cells and are not near enough to affect the local outlier factor calculation are filtered out, and the densities of these cells are recorded so that the next block of data can be added to them. The final calculation is performed only on the points in low-density cells. In this way, CEBOD can handle a dataset that cannot be loaded into memory all at once, and it improves efficiency by eliminating many unnecessary computations. The time complexity of CEBOD is O(N).
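
The filtering step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `cell_width` and `density_threshold`, the uniform grid, and the rule that a dense cell is kept only when it touches a sparse cell are all hypothetical choices made here to stand in for details the abstract does not give. The final LOF computation on the surviving points is left out.

```python
from collections import Counter
from itertools import product


def cell_of(point, cell_width):
    """Map a point to the integer coordinates of its grid cell."""
    return tuple(int(x // cell_width) for x in point)


def count_cell_densities(blocks, cell_width):
    """Accumulate per-cell point counts, reading one block at a time."""
    density = Counter()
    for block in blocks:
        for point in block:
            density[cell_of(point, cell_width)] += 1
    return density


def neighbourhood(cell):
    """Yield the cell itself and every grid cell adjacent to it."""
    for offset in product((-1, 0, 1), repeat=len(cell)):
        yield tuple(c + o for c, o in zip(cell, offset))


def candidate_points(blocks, cell_width, density_threshold):
    """Return the points that survive filtering.

    Assumption: a point is kept if its cell is sparse (count below
    density_threshold) or adjacent to a sparse cell, since such points
    may still be needed as neighbours in a later LOF computation.
    """
    density = count_cell_densities(blocks, cell_width)
    sparse = {c for c, n in density.items() if n < density_threshold}
    keep = set()
    for cell in sparse:
        keep.update(neighbourhood(cell))
    return [
        point
        for block in blocks
        for point in block
        if cell_of(point, cell_width) in keep
    ]


# Two small in-memory blocks standing in for chunks read from disk.
blocks = [
    [(0.1, 0.1), (0.2, 0.2), (0.3, 0.3), (4.5, 4.5)],
    [(0.4, 0.4), (4.6, 4.6), (4.7, 4.7), (5.5, 5.5)],
]
candidates = candidate_points(blocks, cell_width=1.0, density_threshold=2)
```

In this toy input, the four points in cell (0, 0) are dropped because that cell is dense and far from any sparse cell, while the lone point in cell (5, 5) and the three points in the adjacent dense cell (4, 4) are retained for the final calculation. Both passes touch each point a constant number of times, which is consistent with the linear cost of the filtering stage, though the sketch makes no claim about the cost of the subsequent LOF step.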
