Conference: IEEE International Conference on Advanced Information Networking and Applications

Comparison between Classifier's Accuracies Based on Different Outlier Methods Generated by Frequent and Infrequent Categorical Data


Abstract

Outlier analysis is an essential task in data science: it exposes inconsistencies in data, supports building good classifiers, and informs decision making. Finding outliers in categorical data is difficult. This work compares the accuracies of classifiers built after applying different outlier analysis methods, which derive outliers from frequent and infrequent itemsets in categorical data. When modeling a classifier for categorical data, highly frequent records are the most useful, while infrequent records are obstacles. The experiments use the Bank and Nursery datasets from the UCI ML Repository to compare the available methods with the proposed method. For normally distributed OFI, the number of outliers to eliminate need not be given as input, because the method determines it automatically. However, a threshold value must still be supplied to generate the infrequent itemsets for NOFI.
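The abstract does not define how OFI or NOFI scores are computed, so the following Python sketch only illustrates the general idea under stated assumptions. It scores each categorical record by how many infrequent itemsets it contains, given a minimum-support threshold, and then flags outliers with a standard-deviation cutoff so that the number of outliers need not be supplied. The function names, the `min_support` and `z` parameters, the itemset length limit, and the cutoff rule are all hypothetical choices for illustration, not the paper's method.

```python
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


def infrequent_itemset_scores(records, min_support=0.2, max_len=2):
    """Score each record by the number of infrequent itemsets it contains.

    records: list of dicts mapping attribute name -> categorical value.
    min_support: hypothetical threshold; itemsets whose relative frequency
        falls below it are treated as infrequent.
    max_len: largest itemset size considered (assumption, keeps it cheap).
    """
    n = len(records)
    counts = Counter()
    # Count every itemset of size 1..max_len across all records.
    for rec in records:
        items = sorted(rec.items())
        for k in range(1, max_len + 1):
            counts.update(combinations(items, k))

    scores = []
    for rec in records:
        items = sorted(rec.items())
        score = 0
        for k in range(1, max_len + 1):
            for itemset in combinations(items, k):
                if counts[itemset] / n < min_support:
                    score += 1
        scores.append(score)
    return scores


def flag_outliers(scores, z=2.0):
    """Flag records whose score lies more than z standard deviations
    above the mean (an assumed cutoff, not taken from the paper)."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        return [False] * len(scores)
    return [(s - mu) / sigma > z for s in scores]
```

Records flagged this way would be removed before training a classifier, and the resulting accuracy compared against training on the full data. The support threshold still has to be chosen by the user, which mirrors the abstract's remark that a threshold is required to generate the infrequent itemsets.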
