Automatic recommendation of feature selection algorithms based on dataset characteristics

Abstract

Feature selection in real-world data mining problems is essential to make the learning task efficient and more accurate. Identifying the best feature selection algorithm among the many available is a complex activity that still relies heavily on human experts or on random trial-and-error procedures. Thus, the automated machine learning community has taken some steps towards automating this process. In this paper, we address the metalearning challenge of recommending feature selection algorithms by proposing a novel meta-feature engineering model. Our model considers a broad collection of meta-features that enable the study of the relationship between dataset properties and feature selection algorithm performance in terms of several criteria. We arrange the input meta-features into eight categories: (i) simple, (ii) statistical, (iii) information-theoretical, (iv) complexity, (v) landmarking, (vi) based on symbolic models, (vii) based on images, and (viii) based on complex networks (graphs). The target meta-features emerge from a multi-criteria performance measure, based on five individual performance indexes, that assesses feature selection methods grounded in information, distance, dependence, consistency, and precision measures. We evaluate our proposal using a recently developed framework that extracts the input meta-features from 213 benchmark datasets and ranks the assessed feature selection algorithms to fill in the target meta-features in meta-bases. This evaluation uses five state-of-the-art classification methods to induce recommendation models from meta-bases: C4.5, Random Forest, XGBoost, ANN, and SVM. The results show that our meta-feature engineering model reaches an average accuracy of up to 90%. This work is the first to use an extensive empirical evaluation to provide a careful discussion of the strengths and limitations of more than 160 meta-features.
These meta-features, while designed to aid the task of feature selection algorithm recommendation, can readily be employed in other metalearning scenarios. Therefore, we believe our findings are a valuable contribution to the fields of automated machine learning and data mining, as well as to the feature extraction and pattern recognition communities.
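The pipeline described in the abstract (input meta-features, a multi-criteria target, and a recommender induced from the meta-base) can be illustrated with a minimal sketch, assuming a small numeric dataset with discrete class labels. It computes only a handful of illustrative meta-features from three of the eight categories (simple, statistical, information-theoretical). The target step uses a hypothetical mean-rank aggregation over several criteria, since the paper's actual multi-criteria measure is not reproduced here, and the recommender is a 1-nearest-neighbor stand-in, not one of the C4.5, Random Forest, XGBoost, ANN, or SVM models the paper evaluates. All function names, the equal-width binning, and the `algo_A`/`algo_B` labels are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete sequences."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def discretize(column, bins=3):
    """Equal-width binning (an assumption) so continuous features can enter entropy estimates."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0] * len(column)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in column]


def meta_features(X, y):
    """A few illustrative input meta-features for one dataset (rows of X, labels y)."""
    n, d = len(X), len(X[0])
    columns = [[row[j] for row in X] for j in range(d)]
    # (i) simple: basic counts
    simple = {"n_instances": n, "n_features": d, "n_classes": len(set(y))}
    # (ii) statistical: mean of the per-feature (population) standard deviations
    stds = []
    for col in columns:
        mu = sum(col) / n
        stds.append(math.sqrt(sum((v - mu) ** 2 for v in col) / n))
    statistical = {"mean_std": sum(stds) / d}
    # (iii) information-theoretical: class entropy and mean feature-class mutual information
    mis = [mutual_information(discretize(col), y) for col in columns]
    info = {"class_entropy": entropy(y), "mean_mutual_info": sum(mis) / d}
    return {**simple, **statistical, **info}


def best_by_mean_rank(scores):
    """Hypothetical target: scores maps algorithm -> list of k criteria (higher is better).
    Rank algorithms per criterion (ties broken by order), return the lowest mean rank."""
    algos = list(scores)
    k = len(next(iter(scores.values())))
    mean_rank = {a: 0.0 for a in algos}
    for c in range(k):
        ordered = sorted(algos, key=lambda a: -scores[a][c])
        for rank, a in enumerate(ordered, start=1):
            mean_rank[a] += rank / k
    return min(algos, key=lambda a: mean_rank[a])


def recommend(meta_base, query, keys):
    """1-NN stand-in recommender. meta_base holds (meta_features, best_algorithm) pairs;
    only the listed keys are compared, with no normalization applied."""
    def dist(a, b):
        return math.sqrt(sum((a[k] - b[k]) ** 2 for k in keys))

    _, label = min(meta_base, key=lambda entry: dist(entry[0], query))
    return label


if __name__ == "__main__":
    X = [[1.0, 5.0], [2.0, 5.0], [3.0, 6.0], [4.0, 6.0]]
    y = [0, 0, 1, 1]
    mf = meta_features(X, y)
    target = best_by_mean_rank({"algo_A": [0.9, 0.8, 0.7, 0.9, 0.6],
                                "algo_B": [0.5, 0.9, 0.4, 0.3, 0.2]})
    meta_base = [({"class_entropy": 1.0, "mean_mutual_info": 0.9}, target),
                 ({"class_entropy": 0.3, "mean_mutual_info": 0.1}, "algo_B")]
    print(mf)
    print(recommend(meta_base, mf, ["class_entropy", "mean_mutual_info"]))
```

The distance only compares scale-free meta-features here; raw counts such as `n_instances` would dominate an unnormalized Euclidean distance, so a fuller sketch would rescale each meta-feature before building the meta-base.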

Bibliographic details

  • Source
    Expert systems with applications | 2021, no. 12 | 115589.1-115589.30 | 30 pages
  • Author affiliations

    Univ Sao Paulo Inst Math & Comp Sci Lab Computat Intelligence Av Trabalhador Sao Carlense 400 BR-13566590 Sao Carlos SP Brazil;

    Western Parana State Univ Engn & Exact Sci Ctr Lab Bioinformat Av Tarquinio Joslin dos Santos 1300 BR-85867900 Foz Do Iguacu PR Brazil;

    Western Parana State Univ Engn & Exact Sci Ctr Lab Bioinformat Av Tarquinio Joslin dos Santos 1300 BR-85867900 Foz Do Iguacu PR Brazil;

    Western Parana State Univ Engn & Exact Sci Ctr Lab Bioinformat Av Tarquinio Joslin dos Santos 1300 BR-85867900 Foz Do Iguacu PR Brazil|Univ Estadual Campinas Fac Med Sci Coloproctol Serv Rua Tessalia Vieira de Camargo 126 BR-13083887 Campinas SP Brazil;

  • Original format: PDF
  • Full-text language: English
  • Keywords

    Feature engineering; Characterization measures; Algorithm selection; Recommendation system; Filter; Wrapper;
