Journal: Decision Support Systems

A relative patterns discovery for enhancing outlier detection in categorical data

Abstract

Outlier (also known as anomaly) detection is widely applied in many areas, such as diagnosing diseases, evaluating credit, and investigating cybercrime. Recently, several studies based on frequent itemset mining (FIM) have been proposed to detect outliers in categorical data. For efficiency, these FIM-based studies prune (ignore) the majority of the data by imposing a threshold, restricting the pattern length, or both, and then evaluate observations using only the limited information that remains. Despite its high efficiency, such pruning suffers from distortion: accuracy drops to a low level of discernment, and in certain cases the method even reaches the opposite judgment. In this paper, we introduce the concept of relative patterns discovery from a new perspective on association analysis. To explore relative patterns efficiently, we devise a hash-index-based intersecting approach (called HA). Based on the knowledge of relative patterns, we propose an unsupervised approach (called UA) to evaluate which observations are anomalous. Rather than relying on limited information, our method differentiates the features of observations without the problem of distortion. An empirical investigation on eight real-world datasets from the UCI Machine Learning Repository shows that our method generally outperforms previous studies in both accuracy and efficiency. We also show that our method executes efficiently, especially on high-dimensional data. Furthermore, our method can represent a natural panorama of the data, which makes it suitable for controlled experiments aimed at discovering more decisive factors in outlier detection.
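The abstract does not describe how HA or UA work internally, so the sketch below is not the authors' method. It is a minimal illustration, assuming a generic FPOF-style score (the frequent pattern outlier factor, a known FIM-based baseline for categorical data), of two ideas the abstract refers to: a hash index from each (column, value) item to the set of row ids containing it, with pattern support obtained by intersecting those sets, and the threshold/length pruning the abstract criticizes. The function names `build_index`, `support`, `fpof_scores` and the parameters `min_sup` and `max_len` are hypothetical.

```python
from itertools import combinations


def build_index(rows):
    """Hypothetical hash index: (column, value) item -> set of row ids containing it."""
    index = {}
    for rid, row in enumerate(rows):
        for col, val in enumerate(row):
            index.setdefault((col, val), set()).add(rid)
    return index


def support(index, itemset, n_rows):
    """Relative support of an itemset, computed by intersecting its items' row-id sets."""
    rids = set.intersection(*(index[item] for item in itemset))
    return len(rids) / n_rows


def fpof_scores(rows, min_sup, max_len):
    """FPOF-style outlier scores (an assumed baseline; lower = more outlying).

    Only patterns with support >= min_sup and length <= max_len are kept,
    emulating the threshold/length pruning used by FIM-based detectors.
    """
    n = len(rows)
    index = build_index(rows)
    frequent = set()   # every pattern that survived pruning anywhere in the data
    row_totals = []    # per row: sum of supports of its surviving sub-patterns
    for row in rows:
        items = list(enumerate(row))  # (column, value) pairs in column order
        total = 0.0
        for k in range(1, max_len + 1):
            for itemset in combinations(items, k):
                s = support(index, itemset, n)
                if s >= min_sup:
                    total += s
                    frequent.add(itemset)
        row_totals.append(total)
    if not frequent:
        return [0.0] * n
    # Normalize by the total number of surviving patterns, as in FPOF.
    return [t / len(frequent) for t in row_totals]


if __name__ == "__main__":
    # Toy categorical data: each row is a tuple of attribute values.
    rows = [("a", "x")] * 3 + [("a", "y"), ("b", "z")]
    print(fpof_scores(rows, min_sup=0.5, max_len=2))
    print(fpof_scores(rows, min_sup=0.7, max_len=2))
```

With `min_sup=0.5`, the three `("a", "x")` rows score highest, `("a", "y")` scores lower, and `("b", "z")` scores 0. Raising `min_sup` to 0.7 prunes every pattern except `{a}`, so `("a", "y")` ties with the common rows and only `("b", "z")` stays distinguishable. This is a toy instance of the kind of lost discernment the abstract attributes to pruning. Note that enumerating every sub-pattern is exponential in the number of attributes, so this brute-force loop only suits small examples.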