Conference: ACIS/IEEE International Conference on Software Engineering, Artificial Intelligence, Networking and Parallel/Distributed Computing

A term weighting scheme based on the measure of relevance and distinction for text categorization


Abstract

Feature selection is often considered a key step in text categorization. In this paper, we propose a new feature selection algorithm, named AD, which comprehensively measures the degree of relevance and distinction of the terms that occur in a document set. We evaluated AD on three benchmark document collections (20-Newsgroups, Reuters-21578 and WebKB) using two classification algorithms, Naive Bayes and Support Vector Machines. The experimental results compare AD with six classic feature selection algorithms: Information Gain (IG), Mutual Information (MI), Odds Ratio (OR), DIA association factor (DIA), Orthogonal Centroid Feature Selection (OCFS) and Ambiguity Measure (AM). They show that AD is significantly superior to all six when the Naive Bayes classifier is used, and that it also significantly outperforms all six when Support Vector Machines are used.
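The abstract does not give the scoring formula for AD, so it cannot be reproduced here. As a point of reference, the following is a minimal sketch of one of the baselines it is compared against, Information Gain, computed from binary term presence. The function names (`entropy`, `information_gain`, `select_top_k`) and the toy documents are assumptions made for illustration and do not come from the paper.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy (in bits) of a class distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(docs, labels, term):
    """IG(t) = H(C) - [P(t) H(C | t) + P(not t) H(C | not t)].

    docs   : list of token sets, one per document
    labels : list of class labels, aligned with docs
    """
    classes = sorted(set(labels))
    with_term = Counter()
    without_term = Counter()
    for tokens, label in zip(docs, labels):
        if term in tokens:
            with_term[label] += 1
        else:
            without_term[label] += 1

    n = len(docs)
    n_with = sum(with_term.values())
    n_without = n - n_with

    h_prior = entropy([with_term[c] + without_term[c] for c in classes])
    h_cond = (n_with / n) * entropy([with_term[c] for c in classes]) \
        + (n_without / n) * entropy([without_term[c] for c in classes])
    return h_prior - h_cond


def select_top_k(docs, labels, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


if __name__ == "__main__":
    # Hypothetical toy corpus, not taken from any of the benchmark collections.
    toy_docs = [{"ball", "game"}, {"ball", "team"},
                {"stock", "market"}, {"stock", "game"}]
    toy_labels = ["sport", "sport", "finance", "finance"]
    print(select_top_k(toy_docs, toy_labels, 2))
```

In an evaluation like the one described, the selected terms would define the feature space passed to a Naive Bayes or Support Vector Machines classifier. Any other scoring function could be substituted for `information_gain` without changing `select_top_k`.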
