A Simple Probability Based Term Weighting Scheme for Automated Text Classification

Abstract

In automated text classification, tfidf is often considered the default term weighting scheme and has been widely reported in the literature. However, tfidf does not directly reflect terms' category membership. Inspired by the analysis of various feature selection methods, we propose a simple probability based term weighting scheme which directly utilizes two critical information ratios, i.e., relevance indicators. These relevance indicators are supported by probability estimates which embody the category membership. Our experimental study based on two data sets, including Reuters-21578, demonstrates that the proposed probability based term weighting scheme significantly outperforms tfidf using a Bayesian classifier and Support Vector Machines (SVM).
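To make the baseline concrete, the following is a minimal sketch of one common textbook form of tfidf, weight = tf * log(N / df). It is not taken from the paper: the function name `tfidf`, the toy documents, and the choice of the unsmoothed natural-log idf are all assumptions made for illustration, and the abstract does not say which tfidf variant was used in the experiments. The proposed probability based scheme is not sketched here because the abstract does not give its formula.

```python
import math
from collections import Counter

def tfidf(docs):
    """Hypothetical helper: docs is a list of token lists.

    Returns one {term: weight} dict per document, using the
    assumed form weight = tf * ln(N / df).
    """
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights

# Toy corpus (invented for illustration only).
docs = [
    ["oil", "price", "rise"],
    ["oil", "export", "oil"],
    ["wheat", "price", "fall"],
]
w = tfidf(docs)
```

Note that category labels never enter the computation: the weight depends only on term counts and document frequencies, which is the limitation the abstract points to when it says tfidf does not directly reflect terms' category membership.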
