
Discretization as the enabling technique for the Naive Bayes and semi-Naive Bayes-based classification



Abstract

Current classification problems that concern data sets of large and increasing size require scalable classification algorithms. In this study, we concentrate on several scalable, linear-complexity classifiers that include one of the top 10 voted data mining methods, Naive Bayes (NB), and several recently proposed semi-NB classifiers. These algorithms perform front-end discretization of the continuous features since, by design, they work only with nominal or discrete features. We address the lack of studies that investigate the benefits and drawbacks of discretization in the context of the subsequent classification. Our comprehensive empirical study considers 12 discretizers (two unsupervised and 10 supervised), seven classifiers (two classical NB and five semi-NB), and 16 data sets. We investigate the scalability of the discretizers and show that the fastest supervised discretizers, fast class-attribute interdependency maximization (FCAIM), class-attribute interdependency maximization (CAIM), and information entropy maximization (IEM), provide discretization schemes with the highest overall quality. We show that discretization improves the classification accuracy when compared against the two classical methods, NB and Flexible Naive Bayes (FNB), executed on the raw data. The choice of the discretization algorithm impacts the significance of the improvements. The MODL, FCAIM, and CAIM methods provide statistically significant improvements, while the IEM, class-attribute contingency coefficient (CACC), and Khiops discretizers provide moderate improvements. The most accurate classification models are generated by the averaged one-dependence estimators with subsumption resolution (AODEsr) classifier, followed by AODE and hidden Naive Bayes (HNB). AODEsr run on data discretized with MODL, FCAIM, and CAIM provides statistically significantly better accuracies than both classical NB methods. The worst results are obtained with the NB, FNB, and lazy Bayesian rules (LBR) classifiers. We show that although the time to build the discretization scheme can be longer than the time to train the classifier, completing the entire process (discretizing the data, computing the classifier, and predicting the test instances) is often faster than NB-based classification of the continuous instances. This is because the time to classify test instances is an important factor that is positively influenced by discretization. The biggest positive influence, on both the accuracy and the classification time, is associated with the MODL, FCAIM, and CAIM algorithms.
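To make the front-end-discretization pipeline concrete, here is a minimal sketch (not the paper's experimental setup) of the process the abstract describes: continuous features are discretized first, a Naive Bayes classifier is then trained on the resulting nominal features, and its accuracy is compared against classical NB run on the raw continuous data. The supervised discretizers studied in the paper (CAIM, FCAIM, MODL, IEM, etc.) are not available in scikit-learn, so an unsupervised equal-frequency discretizer stands in for them, and the iris data set is used only as a placeholder.

```python
# Minimal sketch of the discretize-then-classify pipeline described above.
# Assumption: scikit-learn stand-ins replace the paper's supervised discretizers.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import CategoricalNB, GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Front-end discretization of the continuous features, followed by NB on the
# resulting nominal features (equal-frequency binning replaces CAIM/FCAIM/MODL).
discretized_nb = make_pipeline(
    KBinsDiscretizer(n_bins=5, encode="ordinal", strategy="quantile"),
    CategoricalNB(min_categories=5),  # min_categories guards against unseen bins
)
discretized_nb.fit(X_train, y_train)

# Classical NB on the raw continuous features (Gaussian assumption) as the baseline.
continuous_nb = GaussianNB().fit(X_train, y_train)

print("NB on discretized features:", discretized_nb.score(X_test, y_test))
print("NB on continuous features: ", continuous_nb.score(X_test, y_test))
```

The timing comparison reported in the paper (discretization plus training plus prediction versus NB on continuous instances) would be measured around the fit and score calls of the two models above.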

Record details

  • Source
    The Knowledge Engineering Review | 2010, Issue 4 | pp. 421-449 | 29 pages
  • Author affiliations

    Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada;

    Department of Electrical and Computer Engineering, University of Alberta, Edmonton, Canada;

    Bio-cybernetics Laboratory, Institute of Automatics, AGH University of Science and Technology, Krakow, Poland;

  • Indexing information
  • Format: PDF
  • Language: eng
  • CLC classification
  • Keywords

  • Added to database: 2022-08-18 00:38:55

