Conference: SIAM International Conference on Data Mining

A general framework for estimating similarity of datasets and decision trees: exploring semantic similarity of decision trees


Abstract

Decision trees are among the most popular pattern types in data mining due to their intuitive representation. However, little attention has been given to the definition of measures of semantic similarity between decision trees. In this work, we present a general framework for similarity estimation that includes as special cases the estimation of semantic similarity between decision trees, as well as various forms of similarity estimation on classification datasets with respect to different probability distributions defined over the attribute-class space of the datasets. The similarity estimation is based on the partitions induced by the decision trees on the attribute space of the datasets. We use the proposed framework to estimate the semantic similarity of decision trees induced from different subsamples of classification datasets; we evaluate its performance with respect to the empirical semantic similarity, which we estimate on the basis of independent hold-out test sets. The availability of similarity measures on decision trees opens a wide range of possibilities for meta-analysis and meta-mining of data mining results.
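The empirical semantic similarity mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not define the measure, so the code below assumes it is the fraction of hold-out instances on which two trees predict the same class. The two trees, their split thresholds, the class labels, and the hold-out points are all hypothetical and are not taken from the paper.

```python
from typing import Callable, Sequence

# A "tree" here is any function mapping an attribute vector to a class label.
Tree = Callable[[Sequence[float]], str]


def tree_a(x: Sequence[float]) -> str:
    # Hypothetical tree: split on attribute 0 at 0.5, then attribute 1 at 0.3.
    if x[0] <= 0.5:
        return "neg"
    return "pos" if x[1] > 0.3 else "neg"


def tree_b(x: Sequence[float]) -> str:
    # Hypothetical tree: split on attribute 1 at 0.3, then attribute 0 at 0.4.
    if x[1] <= 0.3:
        return "neg"
    return "pos" if x[0] > 0.4 else "neg"


def empirical_similarity(t1: Tree, t2: Tree,
                         holdout: Sequence[Sequence[float]]) -> float:
    """Fraction of hold-out instances on which both trees agree (assumed measure)."""
    if not holdout:
        raise ValueError("hold-out set must not be empty")
    agree = sum(1 for x in holdout if t1(x) == t2(x))
    return agree / len(holdout)


# Hypothetical hold-out set of four instances in the unit square.
holdout = [(0.1, 0.1), (0.45, 0.9), (0.6, 0.2), (0.9, 0.9)]
print(empirical_similarity(tree_a, tree_b, holdout))  # 0.75
```

The two trees differ structurally (they split on different attributes first), yet they disagree only on the second instance, which falls in the strip between the two thresholds on attribute 0. This is the sense in which semantic similarity depends on the partitions of the attribute space rather than on tree shape.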