Conference: Federated Conference on Computer Science and Information Systems

Transformation of nominal features into numeric in supervised multi-class problems based on the weight of evidence parameter

Abstract

Machine learning has attracted growing interest from both the scientific community and industry. Most machine learning algorithms rely on distance metrics that can only be applied to numeric data. This becomes a problem in complex datasets that contain heterogeneous data consisting of numeric and nominal (i.e. categorical) features, hence the need to transform nominal data into numeric data. Weight of evidence (WoE) is one of the parameters that can be used to transform nominal features into numeric ones. In this paper we describe a method that uses WoE to transform the features. Although the applicability of this method has been researched to some extent, in this paper we extend it to multi-class problems, which is a novelty. We compare it with the method that generates dummy features, testing both methods on binary and multi-class classification problems with different machine learning algorithms. Our experiments show that the WoE-based transformation generates a smaller number of features than the technique based on generating dummy features, while also improving classification accuracy, reducing memory complexity, and shortening execution time. Nevertheless, we also point out some of its weaknesses and make recommendations on when to use the method based on dummy feature generation instead.
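To make the idea of a WoE-based transformation concrete, the following is a minimal sketch and not the authors' implementation. It assumes the common definition WoE(c) = ln(P(c | event) / P(c | non-event)) for a category c, adds a small additive smoothing term to avoid division by zero, and handles the multi-class case with a one-vs-rest scheme that yields one numeric column per class. The abstract does not spell out the paper's multi-class formulation, so the one-vs-rest choice, the smoothing, and the names `woe_table` and `woe_encode` are all assumptions made for illustration.

```python
import math
from collections import Counter


def woe_table(values, labels, positive, smoothing=0.5):
    """Map each category of one nominal feature to its WoE value.

    `positive` is the class treated as the "event"; every other class
    counts as "non-event". `smoothing` is an assumed additive correction
    that keeps the logarithm finite for categories missing on one side.
    """
    pos = Counter(v for v, y in zip(values, labels) if y == positive)
    neg = Counter(v for v, y in zip(values, labels) if y != positive)
    categories = set(values)
    k = len(categories)
    n_pos = sum(pos.values()) + smoothing * k
    n_neg = sum(neg.values()) + smoothing * k
    return {
        c: math.log(((pos[c] + smoothing) / n_pos) / ((neg[c] + smoothing) / n_neg))
        for c in categories
    }


def woe_encode(values, labels, smoothing=0.5):
    """Encode one nominal feature as numeric WoE columns.

    Binary problems yield a single column. Problems with K > 2 classes
    yield K columns, one per class, using a one-vs-rest scheme (an
    assumption of this sketch, not taken from the paper).
    """
    classes = sorted(set(labels))
    targets = classes[1:] if len(classes) == 2 else classes
    tables = [woe_table(values, labels, c, smoothing) for c in targets]
    rows = [[t[v] for t in tables] for v in values]
    return rows, targets


if __name__ == "__main__":
    colors = ["red", "red", "blue", "blue", "green", "green"]
    species = ["cat", "cat", "dog", "dog", "cat", "bird"]
    rows, targets = woe_encode(colors, species)
    for color, row in zip(colors, rows):
        print(color, [round(x, 3) for x in row])
```

Under these assumptions, a nominal feature with m categories becomes one column in a binary problem or K columns in a K-class problem, whereas dummy features would produce on the order of m columns, which illustrates the feature-count reduction the abstract reports when m is large relative to K.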