2012 International Workshop on Image Processing and Optical Engineering

Feature Selection Combined Category Concentration Degree with Minimal Set Covering



Abstract

Feature selection is a core research topic in text categorization: the selected feature subset directly influences categorization results. First, word frequency and document frequency are analyzed, and a category concentration degree based on both is proposed. Next, set covering is introduced into rough set theory, and an attribute reduction algorithm based on minimal set covering is provided. Finally, a new feature selection method combining the proposed category concentration degree with this attribute reduction algorithm is presented. The method first uses the category concentration degree to select features and filter out terms, reducing the sparsity of the feature space, and then applies the attribute reduction algorithm to eliminate redundancy, so that a more representative feature subset is obtained. Experimental results show that the presented method outperforms three classical feature selection methods, namely information gain (IG), χ² statistics (CHI), and mutual information (MI), in running time, macro-average F_1, and micro-average F_1.
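The two-stage pipeline described above can be sketched in Python. This is a hypothetical illustration, not the paper's implementation: the abstract does not give the exact category concentration formula, so `concentration_degree` assumes a simple product of a term's in-category word-frequency and document-frequency ratios, and a standard greedy set-cover heuristic stands in for the paper's rough-set attribute reduction based on minimal set covering.

```python
def concentration_degree(term, category, docs_by_cat):
    """Assumed scoring: the fraction of a term's total occurrences (word
    frequency) and containing documents (document frequency) that fall in
    `category`, multiplied together. A term concentrated in one category
    scores near 1.0; a term spread evenly across categories scores low.
    `docs_by_cat` maps each category name to a list of tokenized documents."""
    tf_in_cat = sum(doc.count(term) for doc in docs_by_cat[category])
    tf_total = sum(doc.count(term)
                   for docs in docs_by_cat.values() for doc in docs)
    df_in_cat = sum(term in doc for doc in docs_by_cat[category])
    df_total = sum(term in doc
                   for docs in docs_by_cat.values() for doc in docs)
    if tf_total == 0 or df_total == 0:
        return 0.0
    return (tf_in_cat / tf_total) * (df_in_cat / df_total)


def greedy_min_cover(universe, subsets):
    """Greedy approximation of minimal set covering: repeatedly keep the
    feature whose subset covers the most still-uncovered elements, stopping
    when everything coverable is covered. Stands in for the rough-set
    attribute reduction step, where redundant features (those covering
    nothing new) are eliminated."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(subsets, key=lambda f: len(subsets[f] & uncovered))
        if not subsets[best] & uncovered:
            break  # remaining elements cannot be covered by any feature
        chosen.append(best)
        uncovered -= subsets[best]
    return chosen
```

In this sketch, a first pass would rank terms by `concentration_degree` and keep the top-scoring ones to shrink the sparse feature space; a second pass would run `greedy_min_cover` over the surviving features (each feature's subset being, e.g., the documents it discriminates) to drop redundant ones.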
