Complexity of Rule Sets Mined from Incomplete Data Using Probabilistic Approximations Based on Generalized Maximal Consistent Blocks

Patrick G. Clark; Jerzy W. Grzymala-Busse; Zdzislaw S. Hippe; Teresa Mroczek; Rafal Niemiec

Abstract

In this paper, incomplete data sets have two kinds of missing attribute values: lost values and “do not care” conditions. Lost values are interpreted as erased or never inserted into the data set, while “do not care” conditions may be replaced by any specified attribute value. In addition, we use two kinds of probabilistic approximations, global and saturated, both constructed from generalized maximal consistent blocks. Combining the two kinds of missing attribute values with the two kinds of probabilistic approximations gives four different approaches to data mining. Our previous study showed that pairwise differences in error rate between these four approaches, evaluated by ten-fold cross validation, are statistically insignificant (5% level of significance). Hence, we explore the next important problem: which approach yields the simplest rule sets. We show that the total number of rules is smallest when missing attribute values are interpreted as “do not care” conditions. The difference between the two kinds of probabilistic approximations is insignificant.
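
The two interpretations of missing values described in the abstract can be illustrated with a minimal sketch. The abstract does not give the construction, so this assumes the convention common in rough-set literature: a lost value (written `?` here) matches no attribute-value block, while a “do not care” condition (written `*` here) matches every block of that attribute. The function name `attribute_value_blocks`, the symbols, and the toy data are hypothetical and chosen for illustration only.

```python
LOST = "?"        # assumed symbol for a lost value (erased or never recorded)
DONT_CARE = "*"   # assumed symbol for a "do not care" condition

def attribute_value_blocks(cases, attribute):
    """Map each specified value of `attribute` to the set of case indices matching it."""
    specified = {case[attribute] for case in cases} - {LOST, DONT_CARE}
    blocks = {value: set() for value in specified}
    for index, case in enumerate(cases):
        value = case[attribute]
        if value == LOST:
            continue                       # a lost value joins no block
        if value == DONT_CARE:
            for v in specified:
                blocks[v].add(index)       # a "do not care" joins every block
        else:
            blocks[value].add(index)
    return blocks

# Toy data set with one attribute (hypothetical values).
cases = [
    {"Temperature": "high"},
    {"Temperature": "?"},
    {"Temperature": "*"},
    {"Temperature": "normal"},
]
blocks = attribute_value_blocks(cases, "Temperature")
```

In this toy example, case 1 (lost value) appears in no block, while case 2 (“do not care”) appears in both the `high` and the `normal` block.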

Bibliographic details

Source
Procedia Computer Science, 2020, Issue 5, 10 pages
Authors
Patrick G. Clark; Jerzy W. Grzymala-Busse; Zdzislaw S. Hippe; Teresa Mroczek; Rafal Niemiec
Original format
PDF
Keywords
Incomplete data mining; rough set theory; probabilistic approximations; characteristic sets; maximal consistent blocks

Similar literature

1. Characteristic sets and generalized maximal consistent blocks in mining incomplete data [J]. Patrick G. Clark, Cheng Gao, Jerzy W. Grzymala-Busse. Information Sciences: An International Journal, 2018.
2. An Analysis of Probabilistic Approximations for Rule Induction from Incomplete Data Sets [J]. Patrick G. Clark, Jerzy W. Grzymala-Busse, Zdzislaw S. Hippe. Fundamenta Informaticae, 2014, Issue 3.
3. Generalized probabilistic approximations of incomplete data [J]. Jerzy W. Grzymala-Busse, Patrick G. Clark, Martin Kuehnhausen. 高分子論文集, 2014, Issue 1pta2.
4. Complexity of Rule Sets in Mining Incomplete Data Using Characteristic Sets and Generalized Maximal Consistent Blocks [C]. Patrick G. Clark, Cheng Gao, Jerzy W. Grzymala-Busse. International Conference on Hybrid Artificial Intelligent Systems, 2018.
5. A Comparison of the Quality of Rule Induction from Inconsistent Data Sets and Incomplete Data Sets [D]. Xiaomeng Su, 2015.
6. Incremental learning of probabilistic rules from clinical databases based on rough set theory [O]. S. Tsumoto, H. Tanaka, 1997.
7. Critical Benchmarking of the G4(MP2) Model, the Correlation Consistent Composite Approach and Popular Density Functional Approximations on a Probabilistically Pruned Benchmark Dataset of Formation Enthalpies [O]. Sambit Kumar Das, Sabyasachi Chakraborty, Raghunathan Ramakrishnan, 2020.
