Hyper-rectangle-based discriminative data generalization and applications in data mining.

Abstract

The ultimate goal of data mining is to extract knowledge from massive data. Knowledge is ideally represented as human-comprehensible patterns from which end-users can gain intuitions and insights. Axis-parallel hyper-rectangles provide interpretable generalizations for multi-dimensional data points with numerical attributes. In this dissertation, we study the fundamental problem of rectangle-based discriminative data generalization in the context of several useful data mining applications: cluster description, rule learning, and Nearest Rectangle classification.

Clustering is one of the most important data mining tasks. However, most clustering methods output sets of points as clusters and do not generalize them into interpretable patterns. We perform a systematic study of cluster description, where we propose novel description formats leading to enhanced expressive power and introduce novel description problems specifying different trade-offs between interpretability and accuracy. We also present efficient heuristic algorithms for the introduced problems in the proposed formats.

If-then rules are known to be the most expressive and human-comprehensible representation of knowledge. Rectangles are essentially a special type of rules with all the attributional conditions specified whereas normal rules appear more compact. Decision rules can be used for both data classification and data description depending on whether the focus is on future data or existing data. For either scenario, smaller rule sets are desirable. We propose a novel rectangle-based and graph-based rule learning approach that finds rule sets with small cardinality.

We also consider Nearest Rectangle learning to explore the data classification capacity of generalized rectangles. We show that by enforcing the so-called "right of inference", Nearest Rectangle learning can potentially become an interpretable hybrid inductive learning method with competitive accuracy.

Keywords: discriminative generalization; hyper-rectangle; cluster description; Minimum Rule Set; Minimum Consistent Subset Cover; Nearest Rectangle learning.
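As a rough illustration of the ideas named in the abstract, the sketch below generalizes a set of labeled points into an axis-parallel hyper-rectangle, prints that rectangle as an if-then rule with one condition per attribute, and classifies a new point by its nearest rectangle. This is not the dissertation's algorithm: the names `Rect`, `bounding_rect`, and `nearest_rect_label` are hypothetical, the generalization is a plain minimum bounding box, the distance is ordinary Euclidean distance to the box, and the "right of inference" constraint is not modeled.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Rect:
    """Axis-parallel hyper-rectangle: one closed interval per attribute."""
    lows: Tuple[float, ...]
    highs: Tuple[float, ...]
    label: str

    def contains(self, p: Sequence[float]) -> bool:
        # A point is covered if every coordinate lies in its interval.
        return all(lo <= x <= hi for lo, x, hi in zip(self.lows, p, self.highs))

    def distance(self, p: Sequence[float]) -> float:
        # Euclidean distance from p to the rectangle (0.0 if p is inside).
        sq = 0.0
        for lo, x, hi in zip(self.lows, p, self.highs):
            gap = max(lo - x, 0.0, x - hi)
            sq += gap * gap
        return sq ** 0.5

    def as_rule(self, names: Sequence[str]) -> str:
        # A rectangle read as a rule with a condition on every attribute.
        conds = " AND ".join(
            f"{lo:g} <= {n} <= {hi:g}"
            for n, lo, hi in zip(names, self.lows, self.highs)
        )
        return f"IF {conds} THEN {self.label}"


def bounding_rect(points: List[Sequence[float]], label: str) -> Rect:
    # Assumption: generalize a point set by its minimum bounding box.
    lows = tuple(min(col) for col in zip(*points))
    highs = tuple(max(col) for col in zip(*points))
    return Rect(lows, highs, label)


def nearest_rect_label(rects: List[Rect], p: Sequence[float]) -> str:
    # Assign the label of the closest rectangle (ties go to the first one).
    return min(rects, key=lambda r: r.distance(p)).label


if __name__ == "__main__":
    a = bounding_rect([(1.0, 1.0), (2.0, 3.0)], "A")
    b = bounding_rect([(5.0, 5.0), (6.0, 7.0)], "B")
    print(a.as_rule(["x", "y"]))
    print(nearest_rect_label([a, b], (3.0, 3.0)))
```

A rectangle written this way always carries a condition for every attribute, which is the sense in which the abstract calls ordinary rules "more compact": a general rule may leave some attributes unconstrained.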

Bibliographic record

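  • Author

    Gao, Byron Ju.

  • Author affiliation

    Simon Fraser University (Canada)

  • Degree-granting institution: Simon Fraser University (Canada)
  • Subject: Computer science
  • Degree: Ph.D.
  • Year: 2006
  • Pagination: 138 p.
  • Total pages: 138
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Energy and power engineering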