
A Tree-Based Contrast Set-Mining Approach to Detecting Group Differences


Abstract

Understanding differences between groups in a data set is one of the fundamental tasks in data analysis. As relevant applications accumulate, data-mining methods have been developed to specifically address the problem of group difference detection. Contrast set mining discovers group differences in the form of conjunctions of feature-value pairs, or items. In this paper, we incorporate absolute difference, relative difference, and statistical significance in our definition of a group difference, and develop a novel method named DIFF that uses the prefix-tree structure to compress the search space, follows a tree traversal procedure to discover the complete set of significant group differences, and employs efficient pruning strategies to expedite the search process. We conducted comprehensive experiments to compare our method with existing methods on completeness of results, pruning efficiency, and computational efficiency. The experiments demonstrate that our method guarantees completeness of results and achieves higher pruning efficiency and computational efficiency than STUCCO. In addition, our definition of group difference is more general than that of STUCCO. Our method is also more effective than traditional approaches, such as classification trees, in discovering the complete set of significant group differences.
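The three criteria named in the abstract (absolute difference, relative difference, and statistical significance) can be illustrated with a minimal Python sketch for a single candidate contrast set and two groups. This is not the DIFF algorithm and does not reproduce the paper's exact definitions: the function names, the use of a support ratio as the "relative difference", the 2x2 chi-square test, and the sample data are all assumptions made for illustration.

```python
import math

def count_matches(rows, contrast_set):
    """Count rows that contain every (feature, value) pair in contrast_set."""
    return sum(
        all(row.get(feature) == value for feature, value in contrast_set.items())
        for row in rows
    )

def group_difference(group_a, group_b, contrast_set):
    """Return illustrative difference measures for one contrast set.

    Assumption: 'relative difference' is taken here as the ratio of the
    larger support to the smaller one; the paper may define it differently.
    """
    n_a, n_b = len(group_a), len(group_b)
    a = count_matches(group_a, contrast_set)  # matches in group A
    c = count_matches(group_b, contrast_set)  # matches in group B
    b, d = n_a - a, n_b - c                   # non-matches in each group

    s_a = a / n_a if n_a else 0.0             # support in group A
    s_b = c / n_b if n_b else 0.0             # support in group B
    abs_diff = abs(s_a - s_b)

    low, high = min(s_a, s_b), max(s_a, s_b)
    if high == 0:
        rel_diff = 1.0                        # the set occurs in neither group
    elif low == 0:
        rel_diff = math.inf                   # the set occurs in only one group
    else:
        rel_diff = high / low

    # Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].
    n = n_a + n_b
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    # Upper-tail p-value of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))

    return {"abs_diff": abs_diff, "rel_diff": rel_diff,
            "chi2": chi2, "p_value": p_value}

# Hypothetical data: does {"plan": "premium"} separate the two groups?
group_a = [{"plan": "premium"}, {"plan": "premium"},
           {"plan": "premium"}, {"plan": "basic"}]
group_b = [{"plan": "premium"}, {"plan": "basic"},
           {"plan": "basic"}, {"plan": "basic"}]
result = group_difference(group_a, group_b, {"plan": "premium"})
print(result)
```

Under these assumptions, a candidate would be reported as a group difference only if all three measures pass user-chosen thresholds. A complete miner would additionally need to enumerate candidate conjunctions and prune the search, which is the part the abstract attributes to the prefix-tree traversal.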

Bibliographic details

  • Source
    INFORMS Journal on Computing | 2014, Issue 2 | pp. 208-221 | 14 pages
  • Author affiliations

    Research Center for Contemporary Management, Key Research Institute of Humanities and Social Sciences at Universities, School of Economics and Management, Tsinghua University, Beijing, China, 100084;

    Graduate School of Management, University of California, Davis, California 95616;

    School of Economics and Management, Tsinghua University, Beijing, China, 100084;

    School of Economics and Management, Tsinghua University, Beijing, China, 100084;

  • Original format: PDF
  • Language: English
  • Keywords

    data mining; group difference detection; contrast set mining;
