Conference: Machine learning and data mining in pattern recognition
A General Framework of Feature Selection for Text Categorization

Abstract

Many feature selection methods have been proposed for text categorization. However, their performance is usually verified by experiments, so the results depend on the corpora used and may not be accurate. This paper proposes a novel feature selection framework, Distribution-Based Feature Selection (DBFS), based on the distribution difference of features. The framework generalizes most state-of-the-art feature selection methods, including OCFS, MI, ECE, IG, CHI and OR. The performance of many feature selection methods can be estimated by theoretical analysis using components of the framework. DBFS also sheds light on the merits and drawbacks of many existing feature selection methods, and it helps to select suitable methods for specific domains. Moreover, a weighted model based on DBFS is given so that suitable feature selection methods for unbalanced datasets can be derived. Experimental results show that the derived methods are more effective than CHI, IG and OCFS on both balanced and unbalanced datasets.
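For readers unfamiliar with the baselines named in the abstract, the following is a minimal sketch of two of them, CHI (chi-square) and IG (information gain), computed for a single term and a single category from document counts. It uses the common textbook definitions and is not the DBFS framework itself, whose formulas are not given in this record. The function names, the 2x2 count layout (a, b, c, d) and the example terms and counts are assumptions made for illustration.

```python
import math

# Assumed count layout for one term t and one category c (hypothetical names):
#   a = docs in c that contain t        b = docs not in c that contain t
#   c = docs in c that lack t           d = docs not in c that lack t


def chi_square(a, b, c, d):
    """Textbook chi-square statistic between term presence and category."""
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        # A degenerate table (e.g. the term occurs in every document)
        # carries no evidence of association.
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def entropy(probs):
    """Shannon entropy in bits, skipping zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def information_gain(a, b, c, d):
    """Reduction in category entropy from knowing whether the term occurs."""
    n = a + b + c + d
    if n == 0:
        return 0.0
    h_class = entropy([(a + c) / n, (b + d) / n])
    h_cond = 0.0
    # First pair: documents containing the term; second pair: documents lacking it.
    for in_cat, out_cat in ((a, b), (c, d)):
        m = in_cat + out_cat
        if m:
            h_cond += (m / n) * entropy([in_cat / m, out_cat / m])
    return h_class - h_cond


# Hypothetical counts for three terms over 100 documents, 50 in the category.
counts = {
    "goal": (40, 5, 10, 45),
    "the": (50, 50, 0, 0),
    "match": (30, 10, 20, 40),
}

# Rank terms by chi-square, highest first; feature selection keeps the top k.
ranking = sorted(counts, key=lambda t: chi_square(*counts[t]), reverse=True)
```

In this sketch a term that appears in every document, such as the hypothetical "the", scores zero under both measures, while a term concentrated in one category scores high. Both scores depend on the counts observed in a particular corpus.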

Bibliographic record

  • Conference location: Leipzig (DE)
  • Author affiliations

    Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China; Graduate University, Chinese Academy of Sciences, Beijing, 100080, China;

    Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China;

    School of Software and Microelectronics, Peking University, Beijing, 102600, China;

    Center of Network Information and Education Technology, Beijing Language and Culture University, Beijing, 100083, China;

  • Original format: PDF
  • Text language: English
  • Chinese Library Classification: Computer applications
