Conference: Advances in Knowledge Discovery and Data Mining

The Lorenz Dominance Order as a Measure of Interestingness in KDD


Abstract

Ranking summaries generated from databases is useful in descriptive data mining tasks, where a single data set can be generalized in many different ways and to many levels of granularity. Our approach to generating summaries is based on a data structure associated with an attribute, called a domain generalization graph (DGG). A DGG for an attribute is a directed graph in which each node represents a domain of values created by partitioning the attribute's original domain, and each edge represents a generalization relation between two of these domains. Given a set of DGGs associated with a set of attributes, a generalization space can be defined as all possible combinations of domains, where each combination selects one domain from each DGG. This generalization space therefore describes all possible summaries, consistent with the DGGs, that can be generated from the selected attributes. When the number of attributes to be generalized is large or their DGGs are complex, the generalization space can be very large, resulting in the generation of many summaries. The number of summaries can easily exceed a domain expert's capacity to identify interesting results. In this paper, we show that the Lorenz dominance order can be used to rank the summaries before they are presented to the domain expert. In most cases the Lorenz dominance order defines a partial order on the summaries, and in some cases it defines a total order. The rank order of the summaries represents an objective evaluation of their relative interestingness and gives the domain expert a starting point for further subjective evaluation.
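The generalization space described above is the Cartesian product of the DGG node sets. Its size is therefore the product of the node counts. The following is a minimal sketch under two assumptions. First, each DGG is stored as a hypothetical adjacency dict that maps a domain to its direct generalizations. Second, the attribute and domain names (`Location`, `Date`, `City`, `Week`, and so on) are illustrative and are not taken from the paper. The helper names `generalization_space` and `space_size` are likewise hypothetical.

```python
from itertools import product
from math import prod

# Hypothetical DGGs: each maps a domain (node) to the more general domains
# it has a direct edge to. Names are illustrative, not from the paper.
DGGS = {
    "Location": {
        "City": ["Province"],
        "Province": ["Country"],
        "Country": ["ANY"],
        "ANY": [],
    },
    "Date": {
        "Day": ["Week", "Month"],  # two generalization paths leave Day
        "Week": ["ANY"],
        "Month": ["Year"],
        "Year": ["ANY"],
        "ANY": [],
    },
}


def generalization_space(dggs):
    """Yield one summary specification per combination of domains,
    selecting exactly one node from each attribute's DGG.

    Only the node sets matter for enumeration; the edges record which
    domain generalizes which, but are not needed to list the space."""
    attrs = sorted(dggs)
    for combo in product(*(sorted(dggs[a]) for a in attrs)):
        yield dict(zip(attrs, combo))


def space_size(dggs):
    """Size of the space: the product of the DGG node counts."""
    return prod(len(graph) for graph in dggs.values())


if __name__ == "__main__":
    print(space_size(DGGS))                   # 20
    print(next(generalization_space(DGGS)))   # {'Date': 'ANY', 'Location': 'ANY'}
```

Here `space_size(DGGS)` is 4 × 5 = 20. With ten attributes whose DGGs each have five nodes, the space would already contain 5^10 = 9,765,625 candidate summaries. That multiplicative growth is why a ranking step is needed before the results reach a domain expert.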
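The ranking step can be sketched as follows. The abstract does not say how a Lorenz curve is built for a summary, so this is a minimal sketch under three stated assumptions:

- Each summary is reduced to the list of counts of its generalized tuples.
- Curves for summaries with different numbers of tuples are compared using the usual piecewise-linear Lorenz curve, checked at the union of both curves' breakpoints.
- A less even (more skewed) count distribution is treated as more interesting.

The helper names `lorenz_at`, `lorenz_dominates`, `more_interesting` and `rank_layers`, and the example counts, are hypothetical. Exact `Fraction` arithmetic avoids floating-point ties at the breakpoints.

```python
import math
from fractions import Fraction


def lorenz_at(counts, p):
    """Value of the Lorenz curve of `counts` at population share p in [0, 1].

    Counts are sorted ascending; the curve joins the points
    (k/n, sum of the k smallest counts / total) with straight lines.
    Assumes a non-empty list of non-negative counts with a positive total.
    """
    xs = sorted(counts)
    n, total = len(xs), sum(xs)
    pos = p * n                      # number of tuples covered by share p
    k = math.floor(pos)
    cum = Fraction(sum(xs[:k]))
    if k < n:                        # interpolate inside the (k+1)-th step
        cum += (pos - k) * xs[k]
    return cum / total


def lorenz_dominates(a, b):
    """True if a's Lorenz curve lies on or above b's at every p.

    Both curves are piecewise linear, so comparing them at the union of
    their breakpoints suffices, even when len(a) != len(b).
    """
    grid = {Fraction(i, len(a)) for i in range(len(a) + 1)}
    grid |= {Fraction(j, len(b)) for j in range(len(b) + 1)}
    return all(lorenz_at(a, p) >= lorenz_at(b, p) for p in grid)


def more_interesting(x, y):
    """Assumed convention: x beats y when y strictly Lorenz-dominates x,
    i.e. x's counts are strictly less evenly distributed than y's."""
    return lorenz_dominates(y, x) and not lorenz_dominates(x, y)


def rank_layers(summaries):
    """Rank a {name: counts} dict into layers of the partial order.

    Layer 0 holds the summaries that no remaining summary beats; they are
    removed and the step repeats. Incomparable summaries can share a layer.
    """
    remaining = dict(summaries)
    layers = []
    while remaining:
        top = sorted(
            name for name, c in remaining.items()
            if not any(more_interesting(d, c)
                       for other, d in remaining.items() if other != name)
        )
        layers.append(top)
        for name in top:
            del remaining[name]
    return layers


if __name__ == "__main__":
    # Hypothetical tuple counts for five summaries.
    summaries = {
        "uniform": [25, 25, 25, 25],
        "skewed": [70, 10, 10, 10],
        "extreme": [97, 1, 1, 1],
        "A": [1, 4, 5],
        "B": [2, 2, 6],
    }
    print(rank_layers(summaries))
    # [['extreme'], ['A', 'skewed'], ['B'], ['uniform']]
```

In this example, `A` and `B` have crossing Lorenz curves (`A` is lower at p = 1/3 and higher at p = 2/3), so neither dominates the other. The same holds for `A` and `skewed`. This is how the order can be partial rather than total. Summaries that share a layer are left for the domain expert's subjective evaluation.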
