A relative patterns discovery for enhancing outlier detection in categorical data

Hao-Ting Pai; Fan Wu; Pei-Yun S. (Sabrina) Hsueh

Abstract

Outlier (also known as anomaly) detection is widely applied in areas such as disease diagnosis, credit evaluation, and cybercrime investigation. Recently, several studies based on frequent itemset mining (FIM) have been proposed to detect outliers in categorical data. For efficiency, these FIM-based studies pruned (ignored) the majority of the data by imposing a threshold, restricting the pattern length, or both, and then evaluated observations using only the limited information that remained. Despite its high efficiency, such pruning suffers from distortion: accuracy drops to a low level of discernment, and in certain cases the judgment is even reversed. In this paper, we introduce the concept of relative patterns discovery from a new perspective on association analysis. To explore the relative patterns efficiently, we devise a hash-index-based intersecting approach (called the HA). Based on the knowledge of relative patterns, we propose an unsupervised approach (called the UA) to evaluate which observations are anomalous. Instead of relying on limited information, our method can differentiate the features of observations without the problem of distortion. An empirical investigation on eight real-world datasets from the UCI Machine Learning Repository demonstrates that our method generally outperforms previous studies in both accuracy and efficiency. We also demonstrate that our method executes efficiently, especially on high-dimensional data. Furthermore, our method can represent a natural panorama of the data, which is appropriate in controlled experiments for discovering more decisive factors in outlier detection.
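To make the pruning-based FIM baseline described in the abstract concrete, here is a minimal Python sketch of a generic frequent-itemset outlier score. It is not the paper's HA or UA, whose details the abstract does not give. The function names (`frequent_itemsets`, `fim_scores`), the toy data, and the scoring rule (summed support of the frequent patterns a row contains, divided by the number of frequent patterns) are assumptions chosen for illustration only.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(rows, min_support, max_len):
    """Return {itemset: support} for itemsets with at most max_len items
    whose support (fraction of rows containing them) is >= min_support."""
    n = len(rows)
    counts = Counter()
    for row in rows:
        # Items are (column index, value) pairs, so equal values in
        # different columns are treated as different items.
        items = list(enumerate(row))
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    # Pruning step: drop every itemset below the support threshold.
    return {s: c / n for s, c in counts.items() if c / n >= min_support}


def fim_scores(rows, min_support=0.5, max_len=2):
    """Score each row by the frequent patterns it contains.
    Lower scores suggest more anomalous rows (illustrative rule only)."""
    patterns = frequent_itemsets(rows, min_support, max_len)
    if not patterns:
        return [0.0] * len(rows)
    scores = []
    for row in rows:
        items = set(enumerate(row))
        covered = sum(sup for s, sup in patterns.items() if items.issuperset(s))
        scores.append(covered / len(patterns))
    return scores


# Hypothetical toy data: three categorical attributes per observation.
rows = [
    ("red", "round", "small"),
    ("red", "round", "small"),
    ("red", "round", "large"),
    ("red", "round", "small"),
    ("blue", "square", "large"),
]
scores = fim_scores(rows, min_support=0.5, max_len=2)
most_anomalous = scores.index(min(scores))
```

In this sketch, any row that contains no frequent pattern receives a score of exactly 0, so such rows cannot be ranked against one another. This may be one form of the reduced discernment that the abstract attributes to pruning.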

Bibliographic details

Source: Decision Support Systems, 2014, No. 11, pp. 90-99 (10 pages)
Authors: Hao-Ting Pai; Fan Wu; Pei-Yun S. (Sabrina) Hsueh
Author affiliations:

National Chung Cheng Univ., Dept. of Information Management, No. 168, Sec. 1, University Rd., Minhsiung, Chiayi 62102, Taiwan;

National Chung Cheng Univ., Dept. of Information Management, No. 168, Sec. 1, University Rd., Minhsiung, Chiayi 62102, Taiwan;

IBM Thomas J. Watson Research Center, 17 Skyline Drive, Hawthorne, NY 10532, United States

Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language of text: English
Keywords: Association analysis; Categorical data; Frequent itemsets mining; Outlier detection; Unsupervised method

Similar literature

1. FAST-ODT: A Lightweight Outlier Detection Scheme for Categorical Data Sets [J]. Du Hongwei, Ye Qiang, Sun Zhipeng. Network Science and Engineering, IEEE Transactions on. 2021, No. 1
2. Outlier detection using PCA mix based T² control chart for continuous and categorical data [J]. Ahsan Muhammad, Mashuri Muhammad, Kuswanto Heri. Communications in Statistics. 2021, No. 5a6
3. A Neural Probabilistic outlier detection method for categorical data [J]. Cheng Li, Wang Yijie, Ma Xingkong. Neurocomputing. 2019, No. Nova6
4. Enhancing Effectiveness of Outlier Detections for Low Density Patterns [C]. Jian Tang, Zhixiang Chen, Ada Wai-chee Fu. Advances in Knowledge Discovery and Data Mining. 2002
5. Relational discovery in sequentially-connected data streams: Efficient algorithms for lossless pattern discovery and change detection [D]. Coble, Jeffrey Allen. 2005
6. A knowledge discovery methodology from EEG data for cyclic alternating pattern detection [O]. Fátima Machado, Francisco Sales, Clara Santos. 2018
7. A Semi-Supervised Approach to the Detection and Characterization of Outliers in Categorical Data [O]. Ienco, Dino; Pensa, Ruggero; Meo, Rosa. 2016
