Conference: IEEE Joint International Information Technology and Artificial Intelligence Conference

Outlier Detection Method based on Improved Two-step Clustering Algorithm and Synthetic Hypothesis Testing



Abstract

To detect outliers in mixed-attribute datasets in an unsupervised setting, this paper proposes an outlier detection method based on an improved two-step clustering algorithm and synthetic hypothesis testing. First, a two-step clustering algorithm based on the clustering feature tree is adapted to mixed-attribute datasets. In the clustering stage, the RA-Clust algorithm pre-selects the cluster centers to improve performance, and the original complex data distribution is simplified into a superposition of the data distributions within the sub-clusters. Synthetic hypothesis testing is then applied, and outliers are identified according to how the data vary within the sub-clusters of the sample data. Experimental results show that the algorithm achieves better clustering performance, and that synthetic hypothesis testing accurately detects most outliers and maintains an outlier detection efficiency above 90% across different data sizes.
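The cluster-then-test idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the paper's method: plain 1-D k-means stands in for the improved two-step clustering (no clustering feature tree, no RA-Clust seeding, numeric data only), and a per-cluster robust z-test based on the median and MAD stands in for synthetic hypothesis testing. The function names `kmeans_1d` and `flag_outliers` and the significance level `alpha` are hypothetical.

```python
from statistics import NormalDist, median


def kmeans_1d(values, k, iters=50):
    """Stand-in clustering step (assumption): plain 1-D k-means."""
    ordered = sorted(values)
    n = len(ordered)
    # Seed centers at evenly spaced quantiles so the result is deterministic.
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def flag_outliers(values, labels, alpha=0.01):
    """Stand-in testing step (assumption): within each cluster, reject
    H0 'the point belongs to this cluster' when its robust z-score
    exceeds the two-sided normal critical value."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    flags = [False] * len(values)
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        members = [values[i] for i in idx]
        med = median(members)
        mad = median(abs(v - med) for v in members)
        scale = 1.4826 * mad  # MAD rescaled to estimate sigma under normality
        if scale == 0:
            continue  # degenerate cluster: no spread to test against
        for i in idx:
            flags[i] = abs(values[i] - med) / scale > z_crit
    return flags


# Two tight groups near 0 and 10, plus one stray point at 4.0.
values = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, 4.0, 10.1, 9.8, 10.0, 10.3, 9.9, 10.2]
labels = kmeans_1d(values, k=2)
flags = flag_outliers(values, labels)
outlier_indices = [i for i, f in enumerate(flags) if f]
```

Testing each point against its own sub-cluster, rather than against the whole dataset, is what lets a value such as 4.0 stand out even though it lies well inside the global range of the data.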
