Journal: Entropy

Merging of Numerical Intervals in Entropy-Based Discretization


Abstract

As previous research indicates, a multiple-scanning methodology for discretization of numerical datasets, based on entropy, is very competitive. Discretization is a process of converting numerical values of the data records into discrete values associated with numerical intervals defined over the domains of the data records. In multiple-scanning discretization, the last step is the merging of neighboring intervals in discretized datasets as a kind of postprocessing. Our objective is to check how the error rate, measured by ten-fold cross-validation within the C4.5 system, is affected by such merging. We conducted experiments on 17 numerical datasets, using the same setup of multiple scanning, with three different options for merging: no merging at all, merging based on the smallest entropy, and merging based on the biggest entropy. As a result of the Friedman rank sum test (5% significance level), we concluded that the differences between all three approaches are statistically insignificant; there is no universally best approach. Then, we repeated all experiments 30 times, recording averages and standard deviations. The test of the difference between averages shows that, for a comparison of no merging with merging based on the smallest entropy, there are statistically highly significant differences (with a 1% significance level). In some cases, the smaller error rate is associated with no merging; in other cases, it is associated with merging based on the smallest entropy. A comparison of no merging with merging based on the biggest entropy showed similar results. So, our final conclusion was that there are highly significant differences between no merging and merging, depending on the dataset. The best approach should be chosen by trying all three approaches.
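The merging postprocessing described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the interval representation, the `merge_neighbors` name, and the entropy-threshold stopping rule are assumptions of this sketch. It greedily merges the adjacent pair of intervals whose combined class distribution has the smallest (or biggest) entropy.

```python
from collections import Counter
from math import log2

def entropy(counts: Counter) -> float:
    """Shannon entropy (bits) of a class-count distribution."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values() if c > 0)

def merge_neighbors(intervals, threshold=0.5, smallest=True):
    """Greedy postprocessing: repeatedly merge the adjacent pair of
    intervals whose combined class distribution has the smallest
    (or, with smallest=False, the biggest) entropy, stopping once that
    entropy exceeds `threshold`.

    `intervals` is an ordered list of ((low, high), Counter) pairs.
    The threshold-based stopping rule is an assumption of this sketch,
    not taken from the paper.
    """
    intervals = list(intervals)
    while len(intervals) > 1:
        # Entropy of each candidate merged pair of neighbors.
        merged = [entropy(intervals[i][1] + intervals[i + 1][1])
                  for i in range(len(intervals) - 1)]
        pick = min if smallest else max
        i = pick(range(len(merged)), key=merged.__getitem__)
        if merged[i] > threshold:
            break
        # Replace the pair with one interval spanning both ranges.
        (lo, _), counts_a = intervals[i]
        (_, hi), counts_b = intervals[i + 1]
        intervals[i:i + 2] = [((lo, hi), counts_a + counts_b)]
    return intervals
```

For example, two adjacent intervals whose records carry the same class label combine into a zero-entropy interval and are merged first; merging stops when every remaining candidate pair would mix classes too much.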
