Evaluating the performance of cost-based discretization versus entropy- and error-based discretization

Davy Janssens; Tom Brijs; Koen Vanhoof; Geert Wets

Abstract

Discretization is the process of dividing continuous numeric values into intervals that serve as discrete categorical values. This article introduces cost-based discretization as a pre-processing step to the induction of a classifier, with the goal of obtaining an optimal multi-interval split for each numeric attribute. The method and the steps involved are described transparently. The aim of the paper is to present this method and to assess its potential benefits. Its performance is also compared with that of two other well-known methods, namely entropy-based and pure error-based discretization. To this end, experiments were carried out on 14 data sets taken from the UCI Repository on Machine Learning. The methods were compared using the area under the Receiver Operating Characteristic (ROC) graph, which was tested for statistical significance. For most data sets, the results show that cost-based discretization achieves satisfactory results when compared to entropy- and error-based discretization. Given the importance of discretization, many researchers have already contributed to the topic. However, to the best of our knowledge, no effort has yet been made to use misclassification costs to find an optimal multi-split for discretization purposes prior to induction of the decision tree. For this reason, this new concept is introduced and explored in this article by means of operations research techniques.
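The abstract does not spell out the optimization itself, so the following is only a minimal sketch of one way to read "optimal multi-interval splitting under misclassification costs", not the authors' procedure. It assumes that each interval is labelled with the class that is cheapest for the examples it contains, and it uses a simple dynamic program over the sorted distinct attribute values. The names `interval_cost`, `cost_based_cuts`, the nested-dictionary layout `cost[true][pred]`, and the `max_intervals` parameter are all hypothetical choices made for illustration.

```python
from collections import Counter


def interval_cost(counts, cost):
    """Cheapest total cost of giving ONE predicted label to an interval.

    counts: mapping true_class -> number of examples in the interval.
    cost:   assumed layout cost[true_class][predicted_class] -> cost.
    """
    return min(
        sum(n * cost[true][pred] for true, n in counts.items())
        for pred in cost
    )


def cost_based_cuts(values, labels, cost, max_intervals):
    """Return (minimal total cost, sorted cut points) for one numeric attribute.

    Assumes a non-empty attribute. Hypothetical sketch, not the paper's method.
    """
    # Group examples by distinct value so equal values are never separated.
    distinct = []
    for v, y in sorted(zip(values, labels)):
        if distinct and distinct[-1][0] == v:
            distinct[-1][1][y] += 1
        else:
            distinct.append((v, Counter({y: 1})))
    n = len(distinct)

    # seg[i][j]: cost of a single interval covering distinct values i..j-1.
    seg = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        running = Counter()
        for j in range(i + 1, n + 1):
            running += distinct[j - 1][1]
            seg[i][j] = interval_cost(running, cost)

    # best[k][j]: minimal cost of covering the first j values with k intervals.
    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(max_intervals + 1)]
    back = [[0] * (n + 1) for _ in range(max_intervals + 1)]
    best[0][0] = 0.0
    for k in range(1, max_intervals + 1):
        for j in range(1, n + 1):
            for i in range(k - 1, j):
                candidate = best[k - 1][i] + seg[i][j]
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    back[k][j] = i

    # Prefer the fewest intervals among equally cheap solutions.
    k_best = min(range(1, max_intervals + 1), key=lambda k: (best[k][n], k))

    # Walk back through the table; cut halfway between neighbouring values.
    cuts = []
    j, k = n, k_best
    while k > 0:
        i = back[k][j]
        if i > 0:
            cuts.append((distinct[i - 1][0] + distinct[i][0]) / 2)
        j, k = i, k - 1
    return best[k_best][n], sorted(cuts)


# Toy data: misclassifying a true "b" as "a" is five times as costly as the reverse.
toy_cost = {"a": {"a": 0, "b": 1}, "b": {"a": 5, "b": 0}}
toy_values = [1, 2, 3, 4, 5, 6]
toy_labels = ["a", "a", "a", "b", "b", "b"]
total, cuts = cost_based_cuts(toy_values, toy_labels, toy_cost, max_intervals=3)
```

With an asymmetric cost matrix like `toy_cost`, an interval that mixes classes is labelled with the class whose errors are cheaper, which is the point where a cost-based criterion can differ from a pure error count. Placing cut points halfway between neighbouring values is a convention chosen here, not something the abstract specifies.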

Bibliographic details

Source
Computers & operations research | 2006, no. 11 | pp. 3107-3123 | 17 pages
Authors
Davy Janssens; Tom Brijs; Koen Vanhoof; Geert Wets
Author affiliation

Limburgs Universitair Centrum, Research Group Data Analysis and Modelling, Universitaire Campus, Gebouw D, B-3590 Diepenbeek, Belgium

Indexing information
Original format: PDF
Text language: English
Chinese Library Classification: computing technology, computer technology
Keywords
discretization; ROC-curve; cost-sensitive learning

Similar literature

1. The Effects of Evaluating Video Examples of Staffs' Own Versus Others' Performance on Discrete-Trial Training Skills in a Human Service Setting [J]. W. LARRY WILLIAMS, JULIANNE GALLINAT. Journal of organizational behavior management. 2011, no. 2
2. Performance Evaluation of Medical Image Compression Using Discrete Cosine and Discrete Wavelet Transform [J]. Anupreksha Jain, Asstt.Prof. ShilpaDatar. International Journal of Engineering Trends and Technology. 2014, no. 6
3. A discrete expression of Canny's criteria for step edge detector performances evaluation [J]. Demigny D., Kamle T. IEEE Transactions on Pattern Analysis and Machine Intelligence. 1997, no. 11
4. Unintentional Islanding Evaluation Utilizing Discrete RLC Circuit Versus Power Hardware-in-the Loop Method [C]. Sigifredo Gonzalez, Edgardo Desarden-Carrero, Nicholas S. Gurule. IEEE Photovoltaic Specialists Conference. 2019
5. Computational features and performance-evaluation of discrete/continuous type discrete-time control systems [D]. Hunt, Ashley N. 2011
6. A Discrete Event Simulation Model for Evaluating the Performances of an M/G/C/C State Dependent Queuing System [O]. Ruzelan Khalid, Mohd Kamal M. Nawawi, Luthful A. Kawsar
7. A Novel Discrete Wavelet Domain Error-Based Image Quality Metric with Enhanced Perceptual Performance [O]. Soroosh Rezazadeh, Stéphane Coulombe. 2012
8. Performance Evaluation of a Forward Arming and Refueling Point (FARP) Using Discrete Event Simulation [R]. Lewis, J. R. 2005
