Conference: International Conference on ICT and Knowledge Engineering

Comparing Feature Selection Methods by Using Rank Aggregation

Abstract

Feature selection (FS) is becoming critical in the current data era. Selecting effective features from datasets is a particularly important part of text classification, data mining, pattern recognition, and artificial intelligence. FS excludes irrelevant features from the classification task, reduces the dimensionality of a dataset, allows us to better understand the data, improves the performance of machine learning techniques, and minimizes computational requirements. A large number of FS methods have been proposed so far; however, which one is most effective in practice remains unclear. Although different categories of FS methods conceivably apply different evaluation criteria to variables, few studies have focused on evaluating FS methods across categories. This article gathers ten leading FS methods from four different categories and evaluates and compares their general versatility (the consistent ability to select useful features) on authorship attribution problems. It also tries to identify which method is most effective. A support vector machine (SVM) serves as the classifier. Different categories of features, different numbers of top-ranked variables in the feature rankings, and different performance measures are used together to measure the effectiveness and general versatility of these methods. Finally, the Schulze rank aggregation method (SSD) is used to rank the ten FS methods. The results suggest that Mahalanobis distance is the best method overall.
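
The abstract does not describe how the individual rankings were constructed or how the Schulze method was implemented. As a minimal sketch of the general technique only, the Python below aggregates several rankings into one with the Schulze method. It assumes that each experimental setting (for example, one combination of feature category, number of top variables, and performance measure) contributes one complete ranking of the methods, best first. The function name `schulze_ranking` and the placeholder labels "A", "B", "C" are hypothetical and do not come from the paper.

```python
from itertools import permutations


def schulze_ranking(ballots, candidates):
    """Aggregate complete rankings (best first) into one Schulze ranking."""
    # d[a][b]: number of ballots that place a ahead of b.
    d = {a: {b: 0 for b in candidates} for a in candidates}
    for ballot in ballots:
        pos = {c: i for i, c in enumerate(ballot)}
        for a, b in permutations(candidates, 2):
            if pos[a] < pos[b]:
                d[a][b] += 1

    # p[a][b]: strength of the strongest path from a to b.
    # A direct link a -> b exists only if a beats b pairwise.
    p = {
        a: {b: (d[a][b] if d[a][b] > d[b][a] else 0) for b in candidates}
        for a in candidates
    }
    # Floyd-Warshall style widest-path update over intermediate node k.
    for k in candidates:
        for i in candidates:
            if i == k:
                continue
            for j in candidates:
                if j == i or j == k:
                    continue
                p[i][j] = max(p[i][j], min(p[i][k], p[k][j]))

    # Simplification: order candidates by how many others they beat
    # on strongest-path strength; ties keep the input order.
    wins = {
        a: sum(p[a][b] > p[b][a] for b in candidates if b != a)
        for a in candidates
    }
    return sorted(candidates, key=lambda c: -wins[c])


# Three hypothetical settings, each ranking three placeholder methods.
ballots = [["A", "B", "C"], ["A", "C", "B"], ["B", "A", "C"]]
print(schulze_ranking(ballots, ["A", "B", "C"]))
```

Ordering by the number of strongest-path wins is one simple way to turn the Schulze pairwise relation into a full list; a study that needs explicit tie handling would have to choose a tie-breaking rule.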
