An Efficient Outlier Mining Algorithm for Large Dataset

Abstract

Because outliers often carry useful information, outlier detection has become an active topic in data mining. This paper proposes an efficient outlier mining algorithm based on k-nearest neighbors (KNN). The algorithm finds outliers more accurately by defining a correlation matrix that accounts for both the importance of attributes and the correlation between them. It also uses an R-tree data structure and a pruning scheme to drastically reduce computation time. Experimental results show that the algorithm is more efficient than the traditional KNN algorithm, so it offers an effective solution for outlier mining in large datasets.
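The abstract does not give the definition of the correlation matrix, the outlier score, or the pruning rule, so the following is only a minimal sketch under stated assumptions and not the authors' method. It assumes that the matrix `m` is supplied by the caller and used as a quadratic-form weight on attribute differences, that a point is scored by the distance to its k-th nearest neighbor, and that a plain linear scan stands in for the R-tree index. The names `weighted_distance` and `top_n_outliers` are hypothetical.

```python
import heapq
import math


def weighted_distance(x, y, m):
    """sqrt((x - y)^T M (x - y)); M is assumed symmetric positive semidefinite."""
    d = [a - b for a, b in zip(x, y)]
    total = 0.0
    for i, di in enumerate(d):
        for j, dj in enumerate(d):
            total += di * m[i][j] * dj
    return math.sqrt(max(total, 0.0))


def top_n_outliers(points, m, k, n):
    """Return (score, index) for the n points farthest from their k-th nearest neighbor.

    Pruning (assumed rule): a candidate's running k-th-neighbor distance can only
    shrink as more points are scanned, so once it is no larger than the weakest
    score in the current top n, the candidate is abandoned early.
    """
    top = []  # min-heap of (score, index); top[0] is the weakest outlier kept
    for i, p in enumerate(points):
        cutoff = top[0][0] if len(top) == n else -1.0
        nearest = []  # negated distances: a max-heap of the k smallest seen so far
        pruned = False
        for j, q in enumerate(points):
            if i == j:
                continue
            d = weighted_distance(p, q, m)
            if len(nearest) < k:
                heapq.heappush(nearest, -d)
            elif d < -nearest[0]:
                heapq.heapreplace(nearest, -d)
            if len(nearest) == k and -nearest[0] <= cutoff:
                pruned = True
                break
        if pruned or len(nearest) < k:
            continue
        score = -nearest[0]
        if len(top) < n:
            heapq.heappush(top, (score, i))
        else:
            heapq.heapreplace(top, (score, i))
    return sorted(top, reverse=True)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    identity = [[1, 0], [0, 1]]  # equal attribute weights, no correlation
    print(top_n_outliers(data, identity, k=2, n=1))
```

With the identity matrix the distance reduces to the ordinary Euclidean distance; off-diagonal entries in `m` would let correlated attributes influence each other, and larger diagonal entries would give an attribute more weight. A spatial index such as an R-tree would replace the inner linear scan with a nearest-neighbor query.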

Bibliographic details

Source
《Information Management, Innovation Management and Industrial Engineering, ICIII, 2008 International Conference on》, pp. 199-202 (4 pages)
Authors
Yang Peng; Huang Biao
Original format: PDF
CLC classification: Industrial technology

Similar literature

1. An Efficient Model by Applying Genetic Algorithms for Outlier Detection in Classifying Medical Datasets [J]. T Santhanam, M.S. Padmavathi. Australian Journal of Basic and Applied Sciences. 2015, Issue 2015
2. An Efficient Algorithm for Distributed Outlier Detection in Large Multi-Dimensional Datasets [J]. Xi-Te Wang, De-Rong Shen, Mei Bai, Journal of Computer Science and Technology (English edition). 2015, Issue 006
3. Thresholds based outlier detection approach for mining class outliers: An empirical case study on software measurement datasets [J]. Oral Alan, Cagatay Catal. Expert Systems with Application. 2011, Issue 4
4. An Efficient Outlier Mining Algorithm for Large Dataset [C]. Yang Peng, Huang Biao. International Conference on Information Management, Innovation Management and Industrial Engineering. 2008
5. Efficient Layouts and Algorithms for Managing Versioned Datasets [D]. Bhattacherjee, Souvik. 2018
6. Empirical study of seven data mining algorithms on different characteristics of datasets for biomedical classification applications [O]. Yiyan Zhang, Yi Xin, Qin Li, 2017
7. Efficient Algorithms for Mining Outliers from Large Data Sets [O]. Sridhar Ramaswamy, Rajeev Rastogi, Kyuseok Shim, 100
8. Faster Parallel Algorithm and Efficient Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets [R]. Madduri, K., Ediger, D., Jiang, K., 2008
