Conference: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases

Identifying Consistent Statements About Numerical Data with Dispersion-Corrected Subgroup Discovery

Abstract

Existing algorithms for subgroup discovery with numerical targets do not optimize the error or target variable dispersion of the groups they find. This often leads to unreliable or inconsistent statements about the data, rendering practical applications, especially in scientific domains, futile. Therefore, we here extend the optimistic estimator framework for optimal subgroup discovery to a new class of objective functions: we show how tight estimators can be computed efficiently for all functions that are determined by subgroup size (non-decreasing dependence), the subgroup median value, and a dispersion measure around the median (non-increasing dependence). In the important special case when dispersion is measured using the mean absolute deviation from the median, this novel approach yields a linear time algorithm. Empirical evaluation on a wide range of datasets shows that, when used within branch-and-bound search, this approach is highly efficient and indeed discovers subgroups with much smaller errors.
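As a rough illustration of the class of objective functions described above, the following Python sketch scores a subgroup by its size, its median, and its mean absolute deviation from the median. The product form used here (coverage times dispersion correction times median shift) and all function names are assumptions made for the example; the abstract does not specify them. The estimator below simply enumerates every non-empty subset to obtain the largest achievable objective value, which is what a tight optimistic estimate bounds; it takes exponential time and is not the linear-time algorithm the abstract refers to.

```python
from itertools import combinations
from statistics import median


def mean_abs_dev_from_median(values):
    """Mean absolute deviation of `values` around their median."""
    m = median(values)
    return sum(abs(v - m) for v in values) / len(values)


def objective(subgroup, population):
    """Hypothetical dispersion-corrected objective (an assumption, not from the abstract).

    Non-decreasing in subgroup size and median, non-increasing in dispersion.
    """
    coverage = len(subgroup) / len(population)
    shift = max(median(subgroup) - median(population), 0.0)
    relative_dispersion = (
        mean_abs_dev_from_median(subgroup) / mean_abs_dev_from_median(population)
    )
    correction = max(1.0 - relative_dispersion, 0.0)
    return coverage * correction * shift


def brute_force_tight_estimate(subgroup, population):
    """Largest objective value over all non-empty subsets of `subgroup`.

    Exponential-time enumeration, for illustration on tiny inputs only.
    """
    best = 0.0
    for size in range(1, len(subgroup) + 1):
        for subset in combinations(subgroup, size):
            best = max(best, objective(subset, population))
    return best


if __name__ == "__main__":
    population = [1, 2, 3, 4, 5, 6, 7, 8]
    subgroup = [5, 6, 7, 8]
    print(objective(subgroup, population))
    print(brute_force_tight_estimate(subgroup, population))
```

In a branch-and-bound search, a bound of this kind would let the search discard a subgroup whenever the estimate falls below the best objective value found so far, since no refinement of that subgroup could do better.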
