Journal of Intelligent Information Systems

A naive Bayes probability estimation model based on self-adaptive differential evolution


Abstract

When learning a naive Bayes classifier, estimating probabilities from a given set of training samples is crucial. However, when the training samples are insufficient, probability estimation inevitably suffers from the zero-frequency problem. To avoid this problem, the Laplace-estimate and the M-estimate are the two main methods used to estimate probabilities. The values chosen for two important parameters in these methods, m (an integer variable) and p (a probability variable), have a direct impact on the experimental results. In this paper, we study the existing probability estimation methods and carry out a parameter cross-test, experimentally analyzing the performance of the M-estimate under different settings of m and p. The results of this experiment show that the optimal parameter values vary across data sets. Motivated by these results, we propose an estimation model based on self-adaptive differential evolution. We then propose an approach that calculates the optimal m and p values for each conditional probability to avoid the zero-frequency problem. We evaluate the classification accuracy of our approach on 36 benchmark data sets from a machine learning repository, and compare it with naive Bayes using the Laplace-estimate and the M-estimate under a variety of parameter settings, both those taken from the literature and the possibly optimal settings identified in our experimental analysis. The experimental results show that the estimation model is efficient and that our proposed approach significantly outperforms the traditional probability estimation approaches, especially on large data sets (those with many instances and attributes).
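To make the zero-frequency problem concrete, here is a minimal sketch of the two estimators in their common textbook form, which may differ in detail from the definitions used in the paper. It assumes the M-estimate of a conditional probability is (n_ac + m * p) / (n_c + m), where n_ac counts training instances of class c with attribute value a and n_c counts all instances of class c, and it treats the Laplace-estimate as the special case m = number of attribute values, p = 1 / m. The function names and the toy data are hypothetical, chosen only for illustration.

```python
from collections import Counter


def m_estimate(n_ac, n_c, m, p):
    """Textbook M-estimate of P(a | c): (n_ac + m * p) / (n_c + m)."""
    return (n_ac + m * p) / (n_c + m)


def laplace_estimate(n_ac, n_c, num_values):
    """Laplace-estimate, written as an M-estimate with m = num_values, p = 1 / num_values."""
    return m_estimate(n_ac, n_c, num_values, 1.0 / num_values)


# Hypothetical toy data: values of one attribute among the training
# instances of a single class. The value "rainy" never occurs.
observed = ["sunny", "overcast", "overcast", "sunny"]
values = ["sunny", "overcast", "rainy"]

counts = Counter(observed)
n_c = len(observed)

# Raw relative frequencies: the unseen value gets probability zero,
# which zeroes out the whole naive Bayes product for that class.
raw = {v: counts[v] / n_c for v in values}

# Smoothed estimates: every value gets a non-zero probability.
laplace = {v: laplace_estimate(counts[v], n_c, len(values)) for v in values}
m_est = {v: m_estimate(counts[v], n_c, m=2, p=1.0 / len(values)) for v in values}

for v in values:
    print(f"{v:9s} raw={raw[v]:.3f} laplace={laplace[v]:.3f} m-est={m_est[v]:.3f}")
```

Changing m and p shifts how strongly the prior p pulls the estimate away from the observed frequency, which is the sensitivity the parameter cross-test in the abstract examines.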
