IEEE International Conference on Data Science in Cyberspace

Sparse Weighted Naive Bayes Classifier for Efficient Classification of Categorical Data



Abstract

Feature selection has become a key challenge in machine learning with the rapid growth of data size in real-world applications. However, existing feature selection methods mainly focus on numeric data, which leads to quality loss when handling classification problems involving categorical variables. In this paper, we propose an improvement of the Bayesian classifier based on sparse regression. To the best of our knowledge, this is the first attempt to extend sparse regression to directly process categorical variables. We implement the idea for the weighted naive Bayes classifier. The introduction of L1-regularized learning ensures that the algorithm retains only a minimal subset of variables for model building while achieving a near-optimal decision hyperplane, which leads to excellent performance in high-dimensional or small-sample-size settings. We carried out benchmark tests on five UCI categorical data sets, which showed that the proposed algorithm performs competitively against the original weighted naive Bayes classifier and several state-of-the-art feature selection methods, including L1-regularized logistic regression and SVM-RFE.
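The idea described in the abstract can be sketched as follows: fit the usual per-class conditional probability tables of a naive Bayes classifier for categorical features, then learn a per-feature weight on each log-likelihood term by minimizing a logistic loss with an L1 penalty, so that uninformative features are shrunk to zero. This is a minimal illustrative sketch, not the paper's algorithm; the synthetic data, the proximal-gradient optimizer, and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic categorical data: feature 0 is informative
# (agrees with the binary label 80% of the time), features 1-2 are noise.
n = 200
y = rng.integers(0, 2, n)
X = np.empty((n, 3), dtype=int)
X[:, 0] = (y + (rng.random(n) < 0.2)) % 2   # correlated with the label
X[:, 1] = rng.integers(0, 3, n)             # noise
X[:, 2] = rng.integers(0, 2, n)             # noise

classes = np.unique(y)

# Laplace-smoothed log P(x_j = v | c) tables, one per categorical feature.
def fit_cond_logprobs(X, y):
    tables = []
    for j in range(X.shape[1]):
        vals = np.unique(X[:, j])
        t = np.zeros((len(classes), len(vals)))
        for ci, c in enumerate(classes):
            for vi, v in enumerate(vals):
                t[ci, vi] = np.sum((y == c) & (X[:, j] == v)) + 1.0
            t[ci] /= t[ci].sum()
        tables.append((vals, np.log(t)))
    return tables

tables = fit_cond_logprobs(X, y)
log_prior = np.log(np.bincount(y) / n)

# L[i, j] = log P(x_ij | c=1) - log P(x_ij | c=0); the weighted naive
# Bayes decision score is  b + sum_j w_j * L[i, j].
def loglik_diff(X):
    L = np.zeros(X.shape, dtype=float)
    for j, (vals, logt) in enumerate(tables):
        idx = np.searchsorted(vals, X[:, j])
        L[:, j] = logt[1, idx] - logt[0, idx]
    return L

L = loglik_diff(X)

# Learn feature weights w with logistic loss + L1 penalty via proximal
# gradient descent; the soft-threshold step drives noise weights to zero.
w = np.zeros(L.shape[1])
b = log_prior[1] - log_prior[0]
lam, lr = 0.05, 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(L @ w + b)))          # P(y=1 | x)
    grad = L.T @ (p - y) / n                        # logistic-loss gradient
    w = w - lr * grad
    w = np.sign(w) * np.maximum(np.abs(w) - lr * lam, 0.0)  # prox of L1

pred = (L @ w + b > 0).astype(int)
print("weights:", np.round(w, 3))
print("accuracy:", np.mean(pred == y))
```

With uniform weights this reduces to ordinary naive Bayes; the L1 prox step is what yields the sparse variable subset the abstract refers to.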
