Venue: IEEE Symposium Series on Computational Intelligence

Structured Iterative Hard Thresholding for Categorical and Mixed Data Types

Abstract

In many applications, data comes in mixed types, i.e., a combination of nominal (categorical) and numerical features. A common practice for handling categorical features is to apply an encoding method that transforms the discrete values into a numeric representation. However, a numeric representation often neglects the innate structure of categorical features, potentially degrading the performance of learning algorithms. Relying on the numeric representation can also limit interpretation of the learned model, such as identifying the most discriminative categorical features or filtering out irrelevant attributes. In this work, we extend the iterative hard thresholding (IHT) algorithm to quantify the structure of categorical features. The proposed structured hard thresholding algorithm is evaluated empirically on both real and synthetic data sets and compared with the original hard thresholding algorithm, LASSO, and Random Forest. The results demonstrate improved performance over the original IHT.
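
The abstract does not say how the structure of categorical features enters the thresholding step, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes that each categorical feature is one-hot encoded into a group of columns, and that the structured variant keeps or discards whole groups ranked by Euclidean norm (a generic group-sparse form of IHT). The least-squares loss and the names `hard_threshold`, `group_hard_threshold`, `iht`, `groups`, `step`, and `iters` are hypothetical choices made for illustration.

```python
from typing import List, Optional, Sequence


def hard_threshold(w: Sequence[float], k: int) -> List[float]:
    """Plain IHT projection: keep the k largest-magnitude entries, zero the rest."""
    order = sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)
    keep = set(order[:k])
    return [w[i] if i in keep else 0.0 for i in range(len(w))]


def group_hard_threshold(
    w: Sequence[float], groups: Sequence[Sequence[int]], k: int
) -> List[float]:
    """Assumed structured projection: keep the k groups with the largest
    Euclidean norm and zero every coefficient outside those groups."""
    norms = [sum(w[i] ** 2 for i in g) ** 0.5 for g in groups]
    order = sorted(range(len(groups)), key=lambda j: norms[j], reverse=True)
    out = [0.0] * len(w)
    for j in order[:k]:
        for i in groups[j]:
            out[i] = w[i]
    return out


def iht(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    k: int,
    groups: Optional[Sequence[Sequence[int]]] = None,
    step: float = 0.1,
    iters: int = 200,
) -> List[float]:
    """Least-squares IHT: a gradient step followed by a hard-thresholding
    projection (element-wise if groups is None, group-wise otherwise)."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        # Residual y - Xw for the current iterate.
        residual = [y[r] - sum(X[r][c] * w[c] for c in range(d)) for r in range(n)]
        # Gradient step on 0.5 * ||y - Xw||^2, i.e. w + step * X^T residual.
        moved = [
            w[c] + step * sum(X[r][c] * residual[r] for r in range(n))
            for c in range(d)
        ]
        if groups is None:
            w = hard_threshold(moved, k)
        else:
            w = group_hard_threshold(moved, groups, k)
    return w
```

Under these assumptions the two projections can select different supports. For the vector `[1.0, 1.0, 1.0, 1.5, 0.0]` with groups `[[0, 1, 2], [3], [4]]`, element-wise thresholding with `k=1` keeps only the single entry `1.5`, while group-wise thresholding with `k=1` keeps the whole three-column group, because its joint norm is larger. That is the sense in which a group-aware step can retain or discard a categorical feature as a unit.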
