Transformation of nominal features into numeric in supervised multi-class problems based on the weight of evidence parameter

Abstract

Machine learning has received increased interest from both the scientific community and industry. Most machine learning algorithms rely on distance metrics that can only be applied to numeric data. This becomes a problem in complex datasets that contain heterogeneous data consisting of numeric and nominal (i.e. categorical) features, hence the need to transform nominal data into numeric data. Weight of evidence (WoE) is one of the parameters that can be used to transform nominal features into numeric ones. In this paper we describe a method that uses WoE to transform the features. Although the applicability of this method has been researched to some extent, in this paper we extend it to multi-class problems, which is a novelty. We compare it with the method that generates dummy features. We test both methods on binary and multi-class classification problems with different machine learning algorithms. Our experiments show that the WoE-based transformation generates a smaller number of features than the technique based on generating dummy features, while also improving classification accuracy, reducing memory complexity and shortening execution time. Nevertheless, we also point out some of its weaknesses and make some recommendations on when to use the method based on dummy feature generation instead.
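
As a rough illustration of the idea in the abstract, the sketch below encodes a nominal feature with WoE values. The abstract does not give the paper's formulas, so everything here is an assumption: WoE is taken in its common form ln(P(category | class) / P(category | other classes)), the multi-class case is handled one-vs-rest (one numeric column per class), and the names `woe_table`, `woe_encode` and the `smoothing` parameter are hypothetical.

```python
import math
from collections import Counter


def woe_table(values, labels, positive, smoothing=0.5):
    """Map each category to ln(P(category | positive) / P(category | rest)).

    Additive smoothing (an assumption, not taken from the paper) keeps the
    ratio finite when a category never occurs with one of the two groups.
    """
    pos = Counter(v for v, y in zip(values, labels) if y == positive)
    neg = Counter(v for v, y in zip(values, labels) if y != positive)
    categories = set(values)
    n_pos = sum(pos.values()) + smoothing * len(categories)
    n_neg = sum(neg.values()) + smoothing * len(categories)
    return {
        c: math.log(((pos[c] + smoothing) / n_pos) / ((neg[c] + smoothing) / n_neg))
        for c in categories
    }


def woe_encode(values, labels, smoothing=0.5):
    """One-vs-rest WoE encoding: one column per class, a single column if binary."""
    classes = sorted(set(labels))
    if len(classes) == 2:
        classes = classes[1:]  # binary case: the second column would be redundant
    tables = [woe_table(values, labels, k, smoothing) for k in classes]
    return [[table[v] for table in tables] for v in values]


colors = ["red", "red", "blue", "blue", "green", "green"]
labels = ["a", "a", "b", "b", "c", "a"]
encoded = woe_encode(colors, labels)  # 6 rows, 3 numeric columns (one per class)
```

Under these assumptions a nominal feature yields as many numeric columns as there are classes (one for a binary problem), whereas dummy encoding yields one column per distinct category, so the saving grows with the number of categories.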

Bibliographic record

Source
Federated Conference on Computer Science and Information Systems, 2015, pp. 169-179 (11 pages)
Authors
Zdravevski Eftim; Lameski Petre; Kulakov Andrea; Kalajdziski Slobodan
Original format: PDF
Keywords
data analysis; learning (artificial intelligence); WoE based transformation; categorical features; classification accuracy; complex datasets; dummy features; evidence parameter; execution time; heterogeneous data; machine learning algorithms; nominal features; numeric features; scientific community; supervised multiclass problems; weight of evidence; Accuracy; Machine learning algorithms; Mathematical model; Measurement; Tin; Training; Transforms; Weight of Evidence; WoE; categorical features; data transformation; dummy features; heterogeneous data; nominal features;

Similar literature

1. A novel logistic multi-class supervised classification model based on multi-fractal spectrum parameters for hyperspectral data [J]. Na Li, Hui-jie Zhao, Ping Huang. International Journal of Computer Mathematics. 2015, Issue 3a4.

2. Selection of numerical and nominal features based on probabilistic dependence between features [J]. Krzysztof Michalak, Halina Kwasnicka, Ewa Watorek. Applied Artificial Intelligence. 2011, Issue 8a10.

3. Shallow landslide initiation susceptibility mapping by GIS-based weights-of-evidence analysis of multi-class spatial data-sets: a case study from the natural sloping terrain of Western Ghats, India [J]. H. Vijith, K.N. Krishnakumar, G.S. Pradeep. Georisk. 2014, Issue 1.

4. Transformation of nominal features into numeric in supervised multi-class problems based on the weight of evidence parameter [C]. Zdravevski Eftim, Lameski Petre, Kulakov Andrea. Federated Conference on Computer Science and Information Systems. 2015.

5. A feature-based algorithm for spike sorting involving intelligent feature-weighting mechanism [D]. Patwardhan, Kaustubh Anil. 2011.

6. Comparing minimally supervised home-based and closely supervised gym-based exercise programs in weight reduction and insulin resistance after bariatric surgery: A randomized clinical trial [O]. Sara Kaviani, Haleh Dadgostar, Ali Mazaherinezhad. 2017.

7. Transformation of nominal features into numeric in supervised multi-class problems based on the weight of evidence parameter [O]. Eftim Zdravevski, Petre Lameski, Andrea Kulakov. 2015.
