IEEE Transactions on Knowledge and Data Engineering

Toward Optimal Feature Selection in Naive Bayes for Text Categorization



Abstract

Automated feature selection is important for text categorization: it reduces the size of the feature space and speeds up the learning process of classifiers. In this paper, we present a novel and efficient feature selection framework based on information theory, which aims to rank features by their discriminative capacity for classification. We first revisit two information measures, the Kullback-Leibler divergence and the Jeffreys divergence, for binary hypothesis testing, and analyze their asymptotic properties relating to the type I and type II errors of a Bayesian classifier. We then introduce a new divergence measure, called the Jeffreys-Multi-Hypothesis (JMH) divergence, to measure multi-distribution divergence for multi-class classification. Based on the JMH-divergence, we develop two efficient feature selection methods for text categorization, termed maximum discrimination (MD) and MD-χ². The promising results of extensive experiments demonstrate the effectiveness of the proposed approaches.
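
For reference, the two classical divergences revisited in the abstract have the standard definitions below. This is a minimal recap under the usual discrete-distribution setting, not a formula taken from the paper itself; the paper's JMH-divergence extends the Jeffreys divergence from two hypotheses to the multi-class case.

D_{\mathrm{KL}}(p \,\|\, q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)}

J(p, q) = D_{\mathrm{KL}}(p \,\|\, q) + D_{\mathrm{KL}}(q \,\|\, p) = \sum_{x} \bigl(p(x) - q(x)\bigr) \log \frac{p(x)}{q(x)}

Unlike the KL divergence, the Jeffreys divergence is symmetric in its two arguments, which is what makes it a natural starting point for a multi-hypothesis generalization.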
