International Scholarly Research Notices

A Novel Feature Selection Technique for Text Classification Using Naïve Bayes



Abstract

With the proliferation of unstructured data, text classification (also called text categorization) has found many applications, including topic classification, sentiment analysis, authorship identification, and spam detection. Many classification algorithms are available, and naïve Bayes remains one of the oldest and most popular. It is simple to implement and requires relatively little training data. The literature, however, reports that naïve Bayes performs poorly compared to other classifiers in text classification, which makes the classifier hard to use in practice despite the simplicity and intuitiveness of the model. In this paper, we propose a two-step feature selection method: a univariate feature selection step first reduces the search space, and a feature clustering step then selects relatively independent feature sets. We demonstrate the effectiveness of the method through a thorough evaluation and comparison over 13 datasets. The resulting performance improvement makes naïve Bayes comparable or superior to other classifiers. The proposed algorithm is also shown to outperform traditional methods such as greedy-search-based wrappers and CFS.
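The two-step idea described in the abstract can be illustrated with a minimal sketch. The abstract does not say which univariate score or which clustering algorithm the authors use, so everything specific here is an assumption made for illustration: the chi-squared score on binary term presence, the greedy "leader" grouping by Pearson correlation, and the names `chi2_score`, `pearson`, `two_step_select`, `top_k`, and `corr_threshold` are all hypothetical and should not be read as the paper's method.

```python
from math import sqrt


def chi2_score(column, labels):
    """Chi-squared statistic of a binary term-presence column vs. class labels."""
    n = len(labels)
    score = 0.0
    for present in (0, 1):
        n_row = sum(1 for x in column if x == present)
        for c in set(labels):
            n_col = sum(1 for y in labels if y == c)
            observed = sum(
                1 for x, y in zip(column, labels) if x == present and y == c
            )
            expected = n_row * n_col / n
            if expected > 0:
                score += (observed - expected) ** 2 / expected
    return score


def pearson(a, b):
    """Pearson correlation of two equal-length columns (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


def two_step_select(X, y, top_k, corr_threshold):
    """Return indices of features kept after both steps (assumed variant)."""
    n_features = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_features)]
    scores = [chi2_score(col, y) for col in columns]

    # Step 1 (univariate): keep the top_k features by score to shrink the search space.
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:top_k]

    # Step 2 (clustering, assumed greedy variant): walk features from best to worst;
    # a feature strongly correlated with an existing representative joins that
    # cluster and is dropped, otherwise it starts a new cluster.
    representatives = []
    for j in ranked:
        if all(
            abs(pearson(columns[j], columns[r])) < corr_threshold
            for r in representatives
        ):
            representatives.append(j)
    return representatives


# Toy document-term matrix: feature 1 duplicates feature 0, feature 3 is uninformative.
X = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 1, 0],
]
y = [1, 1, 1, 0, 0, 0]

print(two_step_select(X, y, top_k=3, corr_threshold=0.9))  # prints [0, 2]
```

On this toy data, step 1 discards the uninformative feature 3, and step 2 drops feature 1 because it is perfectly correlated with feature 0. Keeping one representative per group of correlated features is what brings the retained set closer to the independence assumption that naïve Bayes relies on.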
