Cell-Based Outlier Detection Algorithm: A Fast Outlier Detection Algorithm for Large Datasets

Abstract

Finding outliers is an important task in many KDD applications. We developed a cell-based outlier detection algorithm (CEBOD) to detect outliers in large datasets. The algorithm is based on LOF; the major difference is that CEBOD avoids expensive computation on the majority of the dataset by filtering the initial dataset. Our experiments show that CEBOD is more efficient than LOF and can find outliers in large datasets quickly and accurately. A large dataset is loaded into memory block by block, and the data points are placed into the appropriate cells based on their values. Each cell holds a certain number of data points, which represents the cell's density. Data points that lie in high-density cells and have no neighborhood relationship with the local outlier factor calculation are filtered out. We record the densities of these cells so that the next block of data can be added to them. The final calculation is performed only on the data in low-density cells. In this way, we can handle a large dataset that cannot be loaded into memory at once, and we improve the algorithm's efficiency by eliminating many unnecessary computations. The time complexity of CEBOD is O(N).
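
The block-wise cell filtering described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `filter_low_density` and the parameters `cell_width` and `max_count` are hypothetical, the grid is assumed to be uniform, and the sketch ignores the neighborhood check the abstract mentions (it drops every point in a dense cell, even one that borders a sparse cell). It only shows how per-cell counts can be carried from one block to the next so that the full dataset never has to sit in memory.

```python
from collections import defaultdict

import numpy as np


def filter_low_density(blocks, cell_width, max_count):
    """Stream blocks of points and keep only those in low-density cells.

    blocks     : iterable of (n, d) arrays, one block in memory at a time
    cell_width : side length of each grid cell (assumed parameter)
    max_count  : a cell holding more points than this is treated as dense
                 (assumed parameter)
    """
    counts = defaultdict(int)    # cell -> number of points seen so far
    buffers = defaultdict(list)  # cell -> points kept while the cell is sparse

    for block in blocks:
        block = np.asarray(block, dtype=float)
        # Integer grid coordinates of the cell that contains each point.
        cells = np.floor(block / cell_width).astype(int)
        for point, cell in zip(block, map(tuple, cells)):
            counts[cell] += 1
            if counts[cell] > max_count:
                # Dense cell: discard its buffered points and keep only the
                # count, so later blocks still see the cell as dense.
                buffers.pop(cell, None)
            else:
                buffers[cell].append(point)

    # Points that remain are candidates for the expensive LOF-style scoring.
    return np.array([p for pts in buffers.values() for p in pts])


# Usage: one tight cluster inside a single cell plus two isolated points.
rng = np.random.default_rng(0)
cluster = rng.uniform(0.1, 0.9, size=(1000, 2))
isolated = np.array([[5.0, 5.0], [-4.0, 3.0]])
blocks = np.array_split(np.vstack([cluster, isolated]), 4)
candidates = filter_low_density(blocks, cell_width=1.0, max_count=10)
print(candidates)  # only the two isolated points remain
```

Each point is touched once and each cell lookup is a hash-table operation, so the filtering pass is linear in the number of points, which is consistent with the O(N) claim above. A density-based score such as LOF would then be computed only for the surviving candidates.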

Bibliographic details

Source: Advances in Knowledge Discovery and Data Mining | 2008 | pp. 1042-1048 | 7 pages
Conference location: Osaka (JP)
Authors: You Wan; Fuling Bian
Original format: PDF
Language: English
Chinese Library Classification: TP311.13
Keywords: outlier detection; cell density filtering; large datasets
