Feature Selection Combining Category Concentration and Minimal Set Covering

Journal: Computer Engineering and Applications (《计算机工程与应用》)

Abstract

Feature selection is one of the core research topics in text categorization. This paper briefly analyzes word frequency and document frequency and, on that basis, proposes a category concentration measure. It introduces the idea of set covering into rough set theory and presents an attribute reduction algorithm based on minimal set covering. Combining this attribute reduction algorithm with category concentration yields a new feature selection method. The method first uses category concentration for an initial selection, filtering out some terms to reduce the sparsity of the feature space, and then applies the proposed reduction algorithm to eliminate redundancy, producing a more representative feature subset. Experimental results show that the method performs well.
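The abstract does not give the reduction algorithm itself, so the following is only a minimal sketch of how attribute reduction can be cast as set covering, under stated assumptions: each feature "covers" the pairs of differently labelled documents it tells apart, and a standard greedy heuristic approximates the minimal cover. The function names `discerned_pairs` and `greedy_cover_reduct` and the toy data are hypothetical, and the category concentration pre-filter is omitted because its formula is not stated here.

```python
from itertools import combinations


def discerned_pairs(rows, labels, attr):
    """Pairs of differently labelled rows whose values differ on `attr`."""
    return {
        (i, j)
        for i, j in combinations(range(len(rows)), 2)
        if labels[i] != labels[j] and rows[i][attr] != rows[j][attr]
    }


def greedy_cover_reduct(rows, labels):
    """Greedy set-cover approximation of a reduct (assumed formulation,
    not necessarily the algorithm proposed in the paper)."""
    n_attrs = len(rows[0])
    cover = {a: discerned_pairs(rows, labels, a) for a in range(n_attrs)}
    # Only pairs that at least one attribute can discern need covering.
    uncovered = set().union(*cover.values())
    selected = []
    while uncovered:
        # Pick the attribute that discerns the most still-uncovered pairs.
        best = max(range(n_attrs), key=lambda a: len(cover[a] & uncovered))
        selected.append(best)
        uncovered -= cover[best]
    return sorted(selected)


# Toy term-presence table: 4 documents, 3 candidate terms (hypothetical data).
rows = [(1, 0, 0), (0, 1, 0), (1, 1, 0), (0, 0, 0)]
labels = ["A", "A", "B", "B"]
print(greedy_cover_reduct(rows, labels))  # [0, 1]
```

In this toy table the third term never separates documents of different classes, so it is dropped as redundant, while the first two terms together discern every cross-class pair. The greedy choice gives an approximate cover, not a guaranteed minimum.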
