The Lorenz Dominance Order as a Measure of Interestingness in KDD

Abstract

Ranking summaries generated from databases is useful within the context of descriptive data mining tasks where a single data set can be generalized in many different ways and to many levels of granularity. Our approach to generating summaries is based upon a data structure, associated with an attribute, called a domain generalization graph (DGG). A DGG for an attribute is a directed graph where each node represents a domain of values created by partitioning the original domain for the attribute, and each edge represents a generalization relation between these domains. Given a set of DGGs associated with a set of attributes, a generalization space can be defined as all possible combinations of domains, where one domain is selected from each DGG for each combination. This generalization space describes, then, all possible summaries consistent with the DGGs that can be generated from the selected attributes. When the number of attributes to be generalized is large or the DGGs associated with the attributes are complex, the generalization space can be very large, resulting in the generation of many summaries. The number of summaries can easily exceed the capabilities of a domain expert to identify interesting results. In this paper, we show that the Lorenz dominance order can be used to rank the summaries prior to presentation to the domain expert. The Lorenz dominance order defines a partial order on the summaries, in most cases, and in some cases, defines a total order. The rank order of the summaries represents an objective evaluation of their relative interestingness and provides the domain expert with a starting point for further subjective evaluation of the summaries.
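The two constructions described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation. It assumes that each DGG is reduced to a plain list of its node (domain) names, that each summary is represented by the list of its tuple counts, and that the textbook Lorenz curve (cumulative share of the ascending-sorted counts) is the one intended. The function names and the example attribute and domain names are hypothetical. The abstract does not say which direction of dominance counts as more interesting, so the sketch only reports dominance and leaves that interpretation to the caller.

```python
from fractions import Fraction
from itertools import product


def generalization_space(dggs):
    """Enumerate every combination that picks one domain (node) per DGG.

    `dggs` maps an attribute name to the list of node names in its DGG.
    Edges are ignored here because only the set of nodes determines the
    size of the space.
    """
    names = sorted(dggs)
    return [dict(zip(names, combo))
            for combo in product(*(dggs[name] for name in names))]


def lorenz_curve(counts):
    """Return the Lorenz curve of `counts` as exact (p, L(p)) breakpoints.

    Assumes a non-empty list of non-negative counts with a positive total.
    """
    ordered = sorted(counts)
    total = sum(ordered)
    n = len(ordered)
    points = [(Fraction(0), Fraction(0))]
    running = 0
    for i, value in enumerate(ordered, start=1):
        running += value
        points.append((Fraction(i, n), Fraction(running, total)))
    return points


def curve_at(points, p):
    """Evaluate the piecewise-linear curve through `points` at p in [0, 1]."""
    for (p0, l0), (p1, l1) in zip(points, points[1:]):
        if p0 <= p <= p1:
            return l0 + (l1 - l0) * (p - p0) / (p1 - p0)
    raise ValueError("p must lie in [0, 1]")


def lorenz_dominates(a, b):
    """True if the Lorenz curve of `a` is nowhere below that of `b`.

    Both curves are piecewise linear, so comparing them at the union of
    their breakpoints is sufficient.
    """
    curve_a, curve_b = lorenz_curve(a), lorenz_curve(b)
    grid = sorted({p for p, _ in curve_a} | {p for p, _ in curve_b})
    return all(curve_at(curve_a, p) >= curve_at(curve_b, p) for p in grid)


if __name__ == "__main__":
    # Hypothetical DGG node names for two attributes.
    dggs = {"age": ["age", "age_group", "ANY"],
            "city": ["city", "province", "ANY"]}
    print(len(generalization_space(dggs)), "candidate summaries")

    # Hypothetical tuple counts for three summaries.
    print(lorenz_dominates([5, 5, 5, 5], [1, 1, 1, 17]))
    print(lorenz_dominates([1, 4, 4, 4], [2, 2, 2, 7]))
    print(lorenz_dominates([2, 2, 2, 7], [1, 4, 4, 4]))
```

In the last two calls the curves cross, so neither summary dominates the other. Pairs like this are why the relation is in general only a partial order, and a total order arises only when no two curves in the set cross.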

Bibliographic details

Source: Advances in Knowledge Discovery and Data Mining, 2002, pp. 177-185 (9 pages)
Author: Robert J. Hilderman
Original format: PDF
Chinese Library Classification: Automation technology, computer technology

Similar literature

1. Use of the Lorenz curve to measure size inequality and growth dominance in forest populations [J]. West P. W. Australian Forestry. 2018, No. 4
2. Properties of rule interestingness measures and alternative approaches to normalization of measures [J]. Greco S., Słowiński R., Szczech I. Information Sciences: An International Journal. 2012
3. KDDI TEPCO merger threatens NTT's FTTH dominance [J]. Mike Galbraith. Telecom Asia. 2006, No. 3
4. The Lorenz Dominance Order as a Measure of Interestingness in KDD [C]. Robert J. Hilderman. Pacific-Asia Conference on Knowledge Discovery and Data Mining. 2002
5. Measuring Interestingness in Outliers with Explanation Facility using Belief Networks [D]. Masood, Adnan. 2014
6. The Lorenz Curve: A Proper Framework to Define Satisfactory Measures of Symbol Dominance, Symbol Diversity and Information Entropy [O]. Julio A. Camargo. 2020
7. The Lorenz dominance order as a measure of interestingness in KDD [O]. Robert J. Hilderman. 2002
