Data Mining, 2009. ICDM '09

A Tree-Based Framework for Difference Summarization

Abstract

Understanding the differences between two datasets is a fundamental data mining question and is important across many real-world scientific applications. In this paper, we propose a tree-based framework that provides a parsimonious explanation of the difference between two distributions, based on a rigorous two-sample statistical test. We develop two efficient approaches. The first is a dynamic programming approach that finds a minimal number of data subsets that describe the difference between the two datasets. The second is a greedy approach that approximates the dynamic programming approach. We employ the well-known Friedman's MST (minimum spanning tree) statistic for two-sample statistical tests in our summarization tree construction, and develop novel techniques to speed up its computation. We performed a detailed experimental evaluation on both real and synthetic datasets and demonstrated the effectiveness of our tree-summarization approach.
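To make the MST-based two-sample test mentioned in the abstract more concrete, the following is a minimal sketch, not the paper's implementation. It pools the two samples, builds a Euclidean minimum spanning tree with a plain O(n^2) Prim's algorithm, and counts the tree edges that join points from different samples; few such cross edges suggest the two samples differ. The function names (`mst_edges`, `cross_edge_count`, `permutation_p_value`) are hypothetical, and the permutation-based p-value is an assumption substituted here for whatever null distribution and speedup techniques the paper actually uses.

```python
import math
import random


def mst_edges(points):
    """Return the edges (i, j) of a Euclidean MST over `points` (Prim, O(n^2))."""
    n = len(points)
    if n < 2:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known distance from each point to the tree
    best_from = [-1] * n         # tree vertex that realizes that distance
    in_tree[0] = True
    for j in range(1, n):
        best_dist[j] = math.dist(points[0], points[j])
        best_from[j] = 0
    edges = []
    for _ in range(n - 1):
        # Pick the point outside the tree that is closest to it.
        k = min((j for j in range(n) if not in_tree[j]), key=lambda j: best_dist[j])
        in_tree[k] = True
        edges.append((best_from[k], k))
        # Relax distances through the newly added vertex.
        for j in range(n):
            if not in_tree[j]:
                d = math.dist(points[k], points[j])
                if d < best_dist[j]:
                    best_dist[j] = d
                    best_from[j] = k
    return edges


def cross_edge_count(sample_a, sample_b):
    """Count MST edges on the pooled data whose endpoints come from different samples."""
    pooled = list(sample_a) + list(sample_b)
    labels = [0] * len(sample_a) + [1] * len(sample_b)
    return sum(1 for i, j in mst_edges(pooled) if labels[i] != labels[j])


def permutation_p_value(sample_a, sample_b, n_perm=200, seed=0):
    """Assumed permutation test: how often do shuffled labels give as few cross edges?"""
    rng = random.Random(seed)
    pooled = list(sample_a) + list(sample_b)
    edges = mst_edges(pooled)    # the tree depends only on positions, not on labels
    labels = [0] * len(sample_a) + [1] * len(sample_b)
    observed = sum(1 for i, j in edges if labels[i] != labels[j])
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        count = sum(1 for i, j in edges if labels[i] != labels[j])
        if count <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    a = [(0, 0), (0, 1), (1, 0), (1, 1)]
    b = [(10, 10), (10, 11), (11, 10), (11, 11)]
    print(permutation_p_value(a, b))
```

Because the spanning tree depends only on point positions, the sketch builds it once and reuses it for every label shuffle. A summarization tree in the spirit of the abstract would presumably apply such a test to candidate data subsets, but how subsets are chosen and scored is not specified in the abstract and is not modeled here.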