Source journal: Journal of Computational and Theoretical Nanoscience

A Term Frequency Based Weighting Scheme Using Naïve Bayes for Text Classification


           

Abstract

Term weighting assigns weights to terms in order to improve the performance of text classifiers such as kNN and SVM. Supervised term weighting methods, which use the class membership of training documents, have received increasing attention. Most existing methods follow a framework in which a local weight is multiplied by a global weight, but the contribution of term frequency to term weighting has not been fully investigated. In this paper, we propose a weighting scheme named term frequency-relevance term frequency, based on a probabilistic model. After investigating two widely used naïve Bayes (NB) models, we employ the term-event multinomial NB model to capture term frequency information. The matching score function, based on the prediction probability ratio, can then be factorized. Finally, we obtain the weight for each term by replacing the parameter with an estimator; term frequency is used in formulating not only the local weight factor but also the global weight factor. Numerical experiments on two benchmark text datasets (Reuters-21578 and 20 Newsgroups) demonstrate that the proposed method outperforms representative term weighting methods.
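The abstract gives no formulas, so the following is only a rough sketch of the general idea, not the paper's actual scheme. It assumes a two-class setting, a multinomial NB model with Laplace smoothing, and the standard factorization of the log probability ratio into a sum of per-term contributions: the in-document term frequency (local factor) times the log ratio of class-conditional term probabilities (global factor, itself estimated from term frequencies). The function names `nb_term_weights` and `weight_document` are hypothetical.

```python
import math
from collections import Counter


def nb_term_weights(docs, labels, positive):
    """Global factor per term: log(theta_pos / theta_neg).

    theta is the multinomial NB term probability, estimated from
    class-level term frequencies with Laplace (add-one) smoothing.
    This is an illustrative assumption, not the paper's estimator.
    """
    pos, neg = Counter(), Counter()
    for tokens, label in zip(docs, labels):
        (pos if label == positive else neg).update(tokens)
    vocab = set(pos) | set(neg)
    pos_total = sum(pos.values()) + len(vocab)
    neg_total = sum(neg.values()) + len(vocab)
    return {
        t: math.log((pos[t] + 1) / pos_total)
        - math.log((neg[t] + 1) / neg_total)
        for t in vocab
    }


def weight_document(tokens, global_weights):
    """Local factor (raw term frequency) times the global factor."""
    tf = Counter(tokens)
    # Terms unseen in training get weight 0 (a simplifying assumption).
    return {t: tf[t] * global_weights.get(t, 0.0) for t in tf}


# Toy corpus, made up for illustration only.
docs = [["wheat", "wheat", "price"], ["oil", "price"]]
labels = ["grain", "crude"]
weights = nb_term_weights(docs, labels, positive="grain")
doc_vector = weight_document(["wheat", "wheat", "oil"], weights)
```

Terms that are relatively more frequent in the positive class receive positive weights and the others negative ones; the resulting vectors could then be passed to a classifier such as kNN or SVM, as the abstract describes.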

Bibliographic details

  • Source
  • Author affiliations

    Key Laboratory of Intelligent Information Processing of Jilin Universities, School of Computer Science and Information Technology, Northeast Normal University, Changchun 130117, China


  • Indexing information
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Thin-film technology
  • Keywords

    Naïve Bayes; Supervised Term Weighting; Text Classification; Term Event; Term Frequency

