Conference: 2012 12th International Conference on Hybrid Intelligent Systems

An algorithm for mining outliers in categorical data through ranking

Abstract

The rapid growth of data mining has led to the development of various methods for outlier detection. Although outlier detection has been well explored for numerical data, methods for categorical data are still evolving. In this paper, we propose a two-phase algorithm for detecting outliers in categorical data based on a novel definition of outliers. In the first phase, the algorithm explores a clustering of the given data; the second, ranking phase determines the set of most likely outliers. The proposed algorithm is expected to perform better because it can identify different types of outliers, employing two independent ranking schemes based on the attribute value frequencies and the inherent clustering structure in the given data. Unlike some existing methods, the computational complexity of this algorithm is not affected by the number of outliers to be detected. The efficacy of the algorithm is demonstrated through experiments on various public-domain categorical data sets.
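
To make the idea of frequency-based ranking concrete, here is a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the scoring formula or the clustering step, so the score used here (the mean relative frequency of a record's attribute values) and the names `frequency_scores` and `rank_outliers` are assumptions chosen for illustration only. Records whose values are rare across attributes receive low scores and are ranked first.

```python
from collections import Counter


def frequency_scores(records):
    """Return one score per record: the mean relative frequency of its values.

    Assumed scoring rule for illustration; the paper's own formula is not
    stated in the abstract.
    """
    n = len(records)
    m = len(records[0])
    # Count how often each value occurs in each attribute (column).
    counts = [Counter(row[j] for row in records) for j in range(m)]
    return [sum(counts[j][row[j]] for j in range(m)) / (m * n) for row in records]


def rank_outliers(records, k):
    """Return the indices of the k lowest-scoring (rarest-valued) records."""
    scores = frequency_scores(records)
    # The full ranking is computed once; k only selects a prefix of it.
    order = sorted(range(len(records)), key=lambda i: scores[i])
    return order[:k]


# Hypothetical toy data: three categorical attributes per record.
data = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("red", "small", "round"),
    ("blue", "large", "square"),
]

print(rank_outliers(data, 2))  # [4, 2]
```

In this sketch the full ranking is built before `k` is applied, so the cost of scoring and sorting does not depend on how many outliers are requested. That is only loosely analogous to the complexity property claimed in the abstract.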
