Procedia Computer Science

Complexity of Rule Sets Mined from Incomplete Data Using Probabilistic Approximations Based on Generalized Maximal Consistent Blocks

Abstract

In this paper, incomplete data sets have two kinds of missing attribute values: lost values and “do not care” conditions. Lost values are interpreted as erased or not inserted into the data set, while “do not care” conditions may be replaced by any specified attribute value. In addition, we use two kinds of probabilistic approximations, global and saturated. Both are constructed from generalized maximal consistent blocks. Since we use two kinds of missing attribute values and two kinds of probabilistic approximations, there are four different ways of data mining. Our previous study showed that pairwise differences in the error rate between these four approaches, evaluated by ten-fold cross validation, are statistically insignificant (5% level of significance). Hence, we explore the next important problem: which approach yields the simplest rule sets. We show that the total number of rules is the smallest when missing attribute values are interpreted as “do not care” conditions. The difference between the two kinds of probabilistic approximations is insignificant.
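
The two interpretations of missing values can be illustrated with a minimal sketch. It assumes the common notation "?" for a lost value and "*" for a “do not care” condition, and a hypothetical one-attribute table. It builds only attribute-value blocks and a conditional probability of the kind a probabilistic approximation compares against a threshold. It does not construct generalized maximal consistent blocks or the global and saturated approximations themselves, and all names are illustrative.

```python
LOST, DONT_CARE = "?", "*"  # assumed notation for the two kinds of missing values


def attribute_value_blocks(table, attribute):
    """Map each specified value v of `attribute` to the set of cases in its block."""
    specified = {row[attribute] for row in table.values()} - {LOST, DONT_CARE}
    blocks = {value: set() for value in specified}
    for case, row in table.items():
        value = row[attribute]
        if value == LOST:
            # A lost value: the case joins no block of this attribute.
            continue
        if value == DONT_CARE:
            # A "do not care" condition: the case joins every block of this attribute.
            for block in blocks.values():
                block.add(case)
        else:
            blocks[value].add(case)
    return blocks


def conditional_probability(concept, block):
    """Pr(concept | block) = |concept & block| / |block| for a non-empty block."""
    return len(concept & block) / len(block)


# Hypothetical toy data: four cases, one attribute.
table = {
    1: {"Temperature": "high"},
    2: {"Temperature": LOST},
    3: {"Temperature": DONT_CARE},
    4: {"Temperature": "normal"},
}
blocks = attribute_value_blocks(table, "Temperature")
```

In this toy table, case 2 (lost) appears in no block, while case 3 (“do not care”) appears in both the `high` and `normal` blocks. That difference is what separates the two interpretations during rule induction.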