Discovering Skylines of Subgroup Sets

Abstract

Many tasks in exploratory data mining aim to discover the top-k results with respect to a certain interestingness measure. Unfortunately, in practice top-k solution sets are hardly satisfactory, if only because redundancy in such results is a severe problem. To address this, a recent trend is to find diverse sets of high-quality patterns. However, a 'perfect' diverse top-k cannot possibly exist, since there is an inherent trade-off between quality and diversity. We argue that the best way to deal with the quality-diversity trade-off is to explicitly consider the Pareto front, or skyline, of non-dominated solutions, i.e. those solutions for which neither quality nor diversity can be improved without degrading the other quantity. In particular, we focus on k-pattern set mining in the context of Subgroup Discovery. For this setting, we present two algorithms for the discovery of skylines: an exact algorithm and a levelwise heuristic. We evaluate the performance of the two proposed skyline algorithms, and the accuracy of the levelwise method. Furthermore, we show that the skylines can be used for the objective evaluation of subgroup set heuristics. Finally, we show characteristics of the obtained skylines, which reveal that different quality-diversity trade-offs result in clearly different subgroup sets. Hence, the discovery of skylines is an important step towards a better understanding of 'diverse top-k's'.
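
To make the notion of a skyline concrete, the following is a minimal sketch of a dominance filter over candidate solutions, assuming each candidate (for example, a subgroup set of size k) has already been scored as a hypothetical `(quality, diversity)` pair with both values to be maximized. The names `Point`, `dominates`, and `skyline` are illustrative assumptions; this is not the exact or the levelwise algorithm from the paper, only the Pareto-front definition applied to a given list of scores.

```python
from typing import List, Tuple

# Hypothetical representation: (quality, diversity), both to be maximized.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """True if a is at least as good as b on both criteria and strictly better on one."""
    return a[0] >= b[0] and a[1] >= b[1] and (a[0] > b[0] or a[1] > b[1])


def skyline(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, ordered by decreasing quality."""
    front: List[Point] = []
    best_diversity = float("-inf")
    # Sweep from highest quality down; among equal quality, highest diversity first.
    # A point is kept only if its diversity beats every point seen so far,
    # i.e. no point with equal or higher quality is also at least as diverse.
    for q, d in sorted(set(points), key=lambda p: (-p[0], -p[1])):
        if d > best_diversity:
            front.append((q, d))
            best_diversity = d
    return front


# Made-up scores for six candidate solutions (illustration only).
candidates = [(0.9, 0.1), (0.8, 0.4), (0.7, 0.3), (0.5, 0.8), (0.4, 0.8), (0.2, 0.9)]
front = skyline(candidates)
```

In this toy input, `(0.7, 0.3)` is dominated by `(0.8, 0.4)` and `(0.4, 0.8)` by `(0.5, 0.8)`, so neither appears in `front`; every remaining point represents a different quality-diversity trade-off.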
