Home > Foreign-language journals > Trends in Ecology & Evolution > Comparing multiple categories of feature selection methods for text classification

Comparing multiple categories of feature selection methods for text classification


Abstract

Selecting effective features from data sets is a particularly important step in text classification, data mining, pattern recognition, and artificial intelligence. Feature selection (FS) excludes features that are irrelevant to the classification task and reduces the dimensionality of data sets, which helps us understand the data better. FS improves the performance of machine learning techniques and reduces computational requirements. A large number of FS methods have been proposed so far, but no single method has emerged as the most effective in practice. Although different categories of FS methods presumably follow different criteria for evaluating variables, few studies have compared FS methods across categories. This article first lists thirteen leading FS methods from five different categories and then evaluates and compares their effectiveness and general versatility. The thirteen FS methods were ranked using a rank aggregation method. Subsequently, the five best FS methods were selected to perform multi-class classification, with a support vector machine serving as the classifier. Different languages, different numbers of selected features, and different performance measures were used together to measure the effectiveness and general versatility of these methods. The results suggest that Mahalanobis distance is the best method overall.
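
The abstract does not say which rank aggregation method was used. As a rough illustration of the idea only, the sketch below combines several per-setting rankings with a Borda count, which is an assumed stand-in rather than the article's procedure. The method names and rankings are hypothetical placeholders, not results from the study.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Combine several best-first rankings into one consensus ranking.

    Borda count is an assumption made for illustration; the abstract
    does not name the aggregation method actually used.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, method in enumerate(ranking):
            # The best position earns n - 1 points, the worst earns 0.
            scores[method] += n - 1 - position
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(scores, key=lambda m: (-scores[m], m))


# Hypothetical rankings of three FS methods under three evaluation settings
# (for example, different data sets or performance measures).
rankings = [
    ["method_a", "method_b", "method_c"],
    ["method_b", "method_a", "method_c"],
    ["method_a", "method_c", "method_b"],
]
consensus = borda_aggregate(rankings)
print(consensus)
```

Each input list orders the candidate methods from best to worst under one setting, and the function returns a single list ordered by total Borda score. A study like the one described would then pass the top few entries of the consensus list on to the classification experiments.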
