Conference paper, Federated Conference on Computer Science and Information Systems

Transformation of nominal features into numeric in supervised multi-class problems based on the weight of evidence parameter

Abstract

Machine learning has received increasing interest from both the scientific community and industry. Most machine learning algorithms rely on distance metrics that can only be applied to numeric data. This becomes a problem for complex datasets that contain heterogeneous data consisting of both numeric and nominal (i.e., categorical) features, and it creates the need to transform nominal data into numeric data. Weight of evidence (WoE) is one of the parameters that can be used to transform nominal features into numeric ones. In this paper we describe a method that uses WoE to transform such features. The applicability of this method has already been studied to some extent. In this paper we extend it to multi-class problems, which is a novel contribution. We compare it with the method that generates dummy features and test both methods on binary and multi-class classification problems with different machine learning algorithms. Our experiments show that the WoE-based transformation generates fewer features than the technique based on dummy features. It also improves classification accuracy, reduces memory complexity and shortens execution time. Nevertheless, we also point out some of its weaknesses and make recommendations on when to use the method based on dummy feature generation instead.
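The abstract names weight of evidence but gives no formula. A common textbook definition for a category c of a binary target is WoE(c) = ln(P(X = c | y = 1) / P(X = c | y = 0)); sign conventions vary between sources. The Python sketch below assumes that definition. It adds an additive smoothing constant `eps` to avoid log(0). It also extends the encoding to multi-class targets with a one-vs-rest scheme that produces one WoE column per class. The one-vs-rest extension, the smoothing, and every function name here are illustrative assumptions, not necessarily the authors' method. A dummy-feature encoder is included as the comparison baseline.

```python
import math
from collections import Counter


def woe_binary(values, labels, positive, eps=0.5):
    """Map each category to ln(P(category | positive) / P(category | not positive)).

    eps is an additive smoothing constant (an assumption, not from the paper)
    that keeps the log finite when a category never occurs with one class.
    """
    categories = set(values)
    k = len(categories)
    pos_counts = Counter(v for v, y in zip(values, labels) if y == positive)
    neg_counts = Counter(v for v, y in zip(values, labels) if y != positive)
    pos_total = sum(pos_counts.values())
    neg_total = sum(neg_counts.values())
    woe = {}
    for c in categories:
        p_pos = (pos_counts[c] + eps) / (pos_total + eps * k)
        p_neg = (neg_counts[c] + eps) / (neg_total + eps * k)
        woe[c] = math.log(p_pos / p_neg)
    return woe


def woe_one_vs_rest(values, labels, eps=0.5):
    """Hypothetical multi-class extension: one class-vs-rest WoE table per class."""
    classes = sorted(set(labels))
    if len(classes) == 2:
        # For a binary target the second column is the negation of the
        # first, so a single WoE feature is enough.
        classes = classes[1:]
    tables = {cls: woe_binary(values, labels, cls, eps) for cls in classes}
    return classes, tables


def woe_transform(values, classes, tables):
    """Replace each nominal value with its WoE scores (one per kept class).

    Categories unseen during fitting get 0.0, i.e. no evidence either way.
    """
    return [[tables[cls].get(v, 0.0) for cls in classes] for v in values]


def dummy_encode(values):
    """Baseline for comparison: one 0/1 indicator column per category."""
    categories = sorted(set(values))
    return categories, [[int(v == c) for c in categories] for v in values]


# Toy data: a nominal feature with 4 categories and a 3-class target.
colors = ["red", "red", "blue", "green", "blue", "red", "green", "yellow"]
target = ["a", "b", "a", "c", "a", "b", "c", "a"]

classes, tables = woe_one_vs_rest(colors, target)
woe_rows = woe_transform(colors, classes, tables)
categories, dummy_rows = dummy_encode(colors)
print(len(woe_rows[0]), len(dummy_rows[0]))  # prints: 3 4
```

Under this scheme a nominal feature becomes as many WoE columns as there are classes (one for a binary target). Dummy encoding instead creates one column per distinct category. The WoE encoding is therefore more compact whenever a feature has more categories than the target has classes. Because the WoE values are computed from the labels, they would normally be fitted on training data only.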