An algorithm for mining outliers in categorical data through ranking



Abstract

The rapid growth in the field of data mining has led to the development of various methods for outlier detection. Though detection of outliers has been well explored in the context of numerical data, methods for categorical data are still evolving. In this paper, we propose a two-phase algorithm for detecting outliers in categorical data based on a novel definition of outliers. In the first phase, the algorithm explores a clustering of the given data; the second, ranking phase determines the set of most likely outliers. The proposed algorithm is expected to perform better because it can identify different types of outliers, employing two independent ranking schemes based on the attribute value frequencies and the inherent clustering structure in the given data. Unlike some existing methods, the computational complexity of this algorithm is not affected by the number of outliers to be detected. The efficacy of this algorithm is demonstrated through experiments on various public domain categorical data sets.
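
The abstract gives only the outline of the method (a clustering phase, then two independent rankings by attribute value frequency and by cluster structure), not its formulas. The following Python sketch illustrates that outline under stated assumptions and is not the authors' algorithm. The k-modes clustering step, both scoring formulas, the union of the two top lists, and all names (`frequency_scores`, `k_modes`, `cluster_scores`, `likely_outliers`, `k`, `top`) are hypothetical choices made for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Number of attributes on which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def frequency_scores(rows):
    """Assumed score: mean relative frequency of a record's attribute values.
    Lower scores mean rarer values, hence more outlier-like records."""
    n, m = len(rows), len(rows[0])
    counts = [Counter(col) for col in zip(*rows)]
    return [sum(counts[j][row[j]] for j in range(m)) / (n * m) for row in rows]


def assign(rows, modes):
    """Label each record with the index of its nearest mode (ties: lowest index)."""
    return [min(range(len(modes)), key=lambda c: hamming(row, modes[c]))
            for row in rows]


def k_modes(rows, k, iterations=10):
    """Small deterministic k-modes (a stand-in for the paper's clustering phase).
    Seeds are the first k distinct records."""
    modes = []
    for row in rows:
        if row not in modes:
            modes.append(row)
        if len(modes) == k:
            break
    for _ in range(iterations):
        labels = assign(rows, modes)
        for c in range(len(modes)):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                # New mode: most common value in each attribute column.
                modes[c] = tuple(Counter(col).most_common(1)[0][0]
                                 for col in zip(*members))
    return assign(rows, modes), modes


def cluster_scores(rows, labels, modes):
    """Assumed score: relative cluster size times closeness to the cluster mode.
    Records in small clusters or far from their mode score low."""
    n, m = len(rows), len(rows[0])
    sizes = Counter(labels)
    return [(sizes[lab] / n) * (1 - hamming(row, modes[lab]) / m)
            for row, lab in zip(rows, labels)]


def likely_outliers(rows, k=2, top=2):
    """Rank all records once under each scheme, then merge the two top lists."""
    rows = [tuple(r) for r in rows]
    labels, modes = k_modes(rows, k)
    freq = frequency_scores(rows)
    clus = cluster_scores(rows, labels, modes)
    by_freq = sorted(range(len(rows)), key=lambda i: freq[i])[:top]
    by_clus = sorted(range(len(rows)), key=lambda i: clus[i])[:top]
    return sorted(set(by_freq) | set(by_clus))


rows = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("blue", "large", "square"),
    ("blue", "large", "square"),
    ("blue", "large", "square"),
    ("green", "tiny", "star"),
]
print(likely_outliers(rows, k=2, top=2))
```

In this sketch both rankings are computed over every record before `top` is applied, so the requested number of outliers only changes the final slice. That loosely mirrors the abstract's claim that the cost does not depend on how many outliers are sought.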


Bibliographic details

Source
2012 12th International Conference on Hybrid Intelligent Systems | 2012 | pp. 247-252 | 6 pages
Conference location: Pune (IN)
Authors
Suri N N R Ranga; Murty M Narasimha; Athithan G
Author affiliation

Centre for AI and Robotics, Bangalore, India
Original format: PDF
Language: English
Chinese Library Classification (CLC): Artificial intelligence theory
Keywords
Categorical data; Data clustering; Data mining; Outlier detection


Similar literature


1. A ranking-based algorithm for detection of outliers in categorical data [J]. N.N.R. Ranga Suri, M. Narasimha Murty, G. Athithan. International Journal of Hybrid Intelligent Systems. 2014, Issue 1

2. A hybrid algorithm for mining local outliers in categorical data [J]. Liu Meiling, Huang Mingxuan, Tang Weidong. International journal of wireless and mobile computing. 2017, Issue 1

3. Mining multidimensional contextual outliers from categorical relational data [J]. Tang Guanting, Pei Jian, Bailey James. Intelligent data analysis. 2015, Issue 5

4. An algorithm for mining outliers in categorical data through ranking [C]. Suri N N R Ranga, Murty M Narasimha, Athithan G. International Conference on Hybrid Intelligent Systems. 2012

5. Improved variable and value ranking techniques for mining categorical data [D]. Wang, Huanjing. 2005

6. Designing a Streaming Algorithm for Outlier Detection in Data Mining: An Incremental Approach [O]. Kangqing Yu, Wei Shi, Nicola Santoro. 2020

7. Mining Multidimensional Contextual Outliers from Categorical Relational Data [O]. Guanting Tang, Jian Pei, James Bailey. 2016
