Hyper-rectangle-based discriminative data generalization and applications in data mining.

Abstract

The ultimate goal of data mining is to extract knowledge from massive data. Ideally, knowledge is represented as human-comprehensible patterns from which end-users can gain intuition and insight. Axis-parallel hyper-rectangles provide interpretable generalizations of multi-dimensional data points with numerical attributes. In this dissertation, we study the fundamental problem of rectangle-based discriminative data generalization in the context of three data mining applications: cluster description, rule learning, and Nearest Rectangle classification.

Clustering is one of the most important data mining tasks. However, most clustering methods output clusters as sets of points and do not generalize them into interpretable patterns. We perform a systematic study of cluster description: we propose new description formats with greater expressive power and introduce new description problems that specify different trade-offs between interpretability and accuracy. We also present efficient heuristic algorithms for these problems in the proposed formats.

If-then rules are known to be the most expressive and human-comprehensible representation of knowledge. Rectangles are essentially a special type of rule in which a condition is specified for every attribute, whereas ordinary rules are usually more compact. Decision rules can be used for both data classification and data description, depending on whether the focus is on future data or existing data. In either scenario, smaller rule sets are desirable. We propose a novel rectangle-based and graph-based rule learning approach that finds rule sets of small cardinality.

We also consider Nearest Rectangle learning to explore the classification capacity of generalized rectangles. We show that by enforcing the so-called "right of inference", Nearest Rectangle learning can potentially become an interpretable hybrid inductive learning method with competitive accuracy.

Keywords: discriminative generalization; hyper-rectangle; cluster description; Minimum Rule Set; Minimum Consistent Subset Cover; Nearest Rectangle learning.
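To make the abstract's vocabulary concrete (a point set generalized into an axis-parallel rectangle, a rectangle read as an if-then rule, and Nearest Rectangle classification), here is a minimal Python sketch. It is an illustration under stated assumptions, not the dissertation's method: the minimum bounding box as the generalization, the "covers no other-class point" test for discriminativeness, the Euclidean point-to-rectangle distance, tie-breaking by list order, and every name (`Rect`, `bounding_rect`, `is_discriminative`, `as_rule`, `nearest_rect_classify`) are hypothetical choices of ours. It does not implement the "right of inference" or any Minimum Rule Set / Minimum Consistent Subset Cover search.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple

Point = Tuple[float, ...]


@dataclass(frozen=True)
class Rect:
    """Axis-parallel hyper-rectangle: one closed interval [low[i], high[i]] per attribute."""
    low: Point
    high: Point

    def contains(self, p: Point) -> bool:
        return all(lo <= x <= hi for x, lo, hi in zip(p, self.low, self.high))

    def distance(self, p: Point) -> float:
        # Euclidean distance from p to the nearest point of the box (0.0 if p is inside).
        # Per dimension the gap is lo - x below the box, x - hi above it, else 0.
        return math.sqrt(sum(max(lo - x, 0.0, x - hi) ** 2
                             for x, lo, hi in zip(p, self.low, self.high)))


def bounding_rect(points: Sequence[Point]) -> Rect:
    """Generalize a point set into its minimum bounding axis-parallel rectangle (assumed)."""
    dims = list(zip(*points))  # one tuple of values per attribute
    return Rect(tuple(min(d) for d in dims), tuple(max(d) for d in dims))


def is_discriminative(rect: Rect, negatives: Sequence[Point]) -> bool:
    """True if the rectangle covers none of the other-class points (hypothetical criterion)."""
    return not any(rect.contains(q) for q in negatives)


def as_rule(rect: Rect, names: Sequence[str], label: str) -> str:
    """Read a rectangle as an if-then rule with an interval condition on every attribute."""
    conds = " and ".join(f"{lo} <= {n} <= {hi}"
                         for n, lo, hi in zip(names, rect.low, rect.high))
    return f"if {conds} then class = {label}"


def nearest_rect_classify(p: Point, labelled: Sequence[Tuple[Rect, str]]) -> str:
    """Label p with the class of the closest rectangle; min() keeps the earlier entry on ties."""
    return min(labelled, key=lambda rl: rl[0].distance(p))[1]


if __name__ == "__main__":
    a = bounding_rect([(1.0, 2.0), (2.0, 3.0), (1.5, 2.5)])  # toy class "A" points
    b = bounding_rect([(5.0, 5.0), (6.0, 7.0)])              # toy class "B" points
    print(as_rule(a, ["x", "y"], "A"))
    # if 1.0 <= x <= 2.0 and 2.0 <= y <= 3.0 then class = A
    print(is_discriminative(a, [(5.0, 5.0), (6.0, 7.0)]))            # True
    print(nearest_rect_classify((3.0, 3.0), [(a, "A"), (b, "B")]))  # A
```

Because the distance is 0 for any point inside a rectangle, a point covered by rectangles of several classes is a tie; this sketch simply resolves it by list order, which is exactly the kind of situation a more principled inference rule would have to address.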

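Bibliographic record

Author: Gao, Byron Ju.
Affiliation: Simon Fraser University (Canada).
Degree-granting institution: Simon Fraser University (Canada).
Subject: Computer science.
Degree: Ph.D.
Year: 2006
Pages: 138
Original format: PDF
Language: English
Chinese Library Classification: Energy and power engineering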

Similar documents

1. Financial profiling of public hospitals: an application by data mining [J]. Ozgulbas N, Koyuncugil AS. The International Journal of Health Planning and Management, 2009, no. 1.
2. Mining the chemical quarry with joint chemical probes: an application of latent semantic structure indexing (LaSSI) and TOPOSIM (Dice) to chemical database mining [J]. Singh SB, Sheridan RP, Fluder EM. Journal of Medicinal Chemistry, 2001, no. 10.
3. International data-sharing for radiotherapy research: an open-source based infrastructure for multicentric clinical data mining [J]. Erik Roelofs, André Dekker, Elisa Meldolesi. Radiotherapy and Oncology: Journal of the European Society for Therapeutic Radiology and Oncology, 2014, no. 2.
4. From data collection to knowledge data discovery: a medical application of data mining [C]. Duhamel A, Picavet M, Devos P. MEDINFO, 2001.
5. On a Generalization of the Gini Correlation for Statistical Data Mining [D]. Gao, Yi. 2016.
6. A new generalization of Weibull distribution with application to a breast cancer data set [O]. Abdus S. Wahed, The Minh Luong, Jong-Hyeon Jeong.
7. Hyper-rectangle-based discriminative data generalization and applications in data mining [O]. Gao Byron Ju. 2007.
8. Analyzing Asset Management Data Using Data and Text Mining [R]. Williams, T., Halling, M. 2014.
