Conference: Multimedia Information Networking and Security, 2009. MINES '09

Two-Stage Feature Selection Method for Text Classification

Abstract

Dimension reduction is the process of reducing the number of random features under consideration, and can be divided into feature selection and feature extraction. This paper proposes a two-stage feature selection method based on the Regularized Least Squares-Multi Angle Regression and Shrinkage (RLS-MARS) model. In the first stage, a new weighting method, Term Frequency Inverse Document and Category Frequency Collection normalization (TF-IDCFC), is applied to measure the features and to select the important ones, using the category information as a factor. In the second stage, the RLS-MARS model is used to select the relevant information; combining Regularized Least Squares (RLS) with Least Angle Regression and Shrinkage (LARS) can be viewed as an efficient approach. Experiments on two datasets, the Fudan University Chinese Text Classification Corpus and 20 Newsgroups, demonstrate the effectiveness of the new feature selection method for text classification with two classical algorithms: KNN and SVMLight.
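
The two-stage idea can be illustrated with a minimal sketch. The abstract gives neither the TF-IDCFC formula nor the RLS-MARS procedure, so both stages below are stand-ins and are assumptions, not the paper's method: stage 1 uses a hypothetical category-aware TF-IDF-style score (term frequency times inverse document frequency times inverse category frequency), and stage 2 uses incremental forward stagewise regression, a simple relative of LARS, without the regularization term. The function names `stage1_scores` and `stage2_forward_stagewise`, the toy corpus, and all parameter values are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def stage1_scores(docs, labels):
    """Category-aware TF-IDF-style score per term.

    ASSUMPTION: an illustrative weighting only, NOT the paper's TF-IDCFC.
    """
    n_docs = len(docs)
    n_cats = len(set(labels))
    tf = Counter()                     # total occurrences of each term
    doc_freq = Counter()               # number of documents containing the term
    cats_with_term = defaultdict(set)  # categories in which the term occurs
    for tokens, label in zip(docs, labels):
        tf.update(tokens)
        for term in set(tokens):
            doc_freq[term] += 1
            cats_with_term[term].add(label)
    scores = {}
    for term in tf:
        idf = math.log(1 + n_docs / doc_freq[term])
        icf = math.log(1 + n_cats / len(cats_with_term[term]))
        scores[term] = tf[term] * idf * icf
    return scores


def stage2_forward_stagewise(X, y, n_steps=200, eps=0.05):
    """Indices of columns that receive a nonzero coefficient.

    ASSUMPTION: incremental forward stagewise regression as a stand-in
    for the paper's RLS-MARS model; columns are assumed comparable in scale.
    """
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    residual = list(y)
    for _ in range(n_steps):
        # Inner product of every column with the current residual.
        corr = [sum(X[i][j] * residual[i] for i in range(n)) for j in range(p)]
        j = max(range(p), key=lambda k: abs(corr[k]))
        if abs(corr[j]) < 1e-9:
            break  # residual is (numerically) explained
        step = eps if corr[j] > 0 else -eps
        coef[j] += step
        for i in range(n):
            residual[i] -= step * X[i][j]
    return [j for j in range(p) if abs(coef[j]) > 1e-12]


# Toy corpus (hypothetical): two sport and two finance documents.
docs = [
    ["goal", "match", "team", "the"],
    ["team", "goal", "score", "the"],
    ["stock", "market", "price", "the"],
    ["market", "stock", "trade", "the"],
]
labels = ["sport", "sport", "finance", "finance"]

# Stage 1: keep the top-k terms by the category-aware score.
scores = stage1_scores(docs, labels)
kept = sorted(scores, key=scores.get, reverse=True)[:4]

# Stage 2: regress a +1/-1 class target on the counts of the kept terms.
X = [[tokens.count(term) for term in kept] for tokens in docs]
y = [1.0 if label == "sport" else -1.0 for label in labels]
selected = [kept[j] for j in stage2_forward_stagewise(X, y)]
print(kept, selected)
```

On this toy corpus the first stage drops the term that occurs in every document and category, and the second stage then discards terms that are collinear with one already chosen, which is the kind of redundancy a regression-based second stage is meant to remove.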
