Machine Learning and Data Mining in Pattern Recognition

Using Resampling Techniques for Better Quality Discretization


Abstract

Many supervised induction algorithms require discrete data, yet real data often come in both discrete and continuous form. Quality discretization of continuous attributes is an important problem that affects the accuracy, complexity, variance, and understandability of the induced model. Usually, discretization and other statistical procedures are applied to a subset of the population, since the entire population is practically inaccessible. For this reason, we argue that a discretization computed on a sample is only an estimate of the discretization for the entire population. Most existing discretization methods partition the attribute range into two or more intervals using one or more cut points. In this paper, we introduce two variants of a resampling technique (such as the bootstrap) to generate a set of candidate discretization points, and thus improve discretization quality by providing a better estimate with respect to the entire population. The goal of this paper is therefore to observe whether this kind of resampling can lead to better-quality discretization points, which opens up a new paradigm for the construction of soft decision trees.
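The abstract does not spell out the two resampling variants, so the following is only a rough sketch of the general idea: draw bootstrap resamples of the (attribute, class) data, find an entropy-minimizing binary cut point on each resample, and collect the results as a set of candidate discretization points. The function names `best_cut_point` and `bootstrap_cut_points` are our own illustrative choices, not identifiers from the paper.

```python
import random
from math import log2

def entropy(labels):
    """Shannon entropy of a sequence of class labels."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * log2(c / n) for c in counts.values())

def best_cut_point(values, labels):
    """Entropy-minimizing binary cut point for one continuous attribute.

    Returns the midpoint between the two adjacent sorted values whose
    split yields the lowest class-entropy, or None if all values are equal.
    """
    pairs = sorted(zip(values, labels))
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        if xs[i] == xs[i - 1]:
            continue  # no boundary between equal attribute values
        left, right = ys[:i], ys[i:]
        score = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        if score < best_score:
            best_score, best_cut = score, (xs[i - 1] + xs[i]) / 2
    return best_cut

def bootstrap_cut_points(values, labels, n_resamples=50, seed=0):
    """Candidate cut points from bootstrap resamples of the sample.

    Each resample draws len(values) indices with replacement; the pooled
    cut points estimate the population-level discretization boundary.
    """
    rng = random.Random(seed)
    n = len(values)
    candidates = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        cut = best_cut_point([values[i] for i in idx], [labels[i] for i in idx])
        if cut is not None:
            candidates.append(cut)
    return candidates
```

A downstream consumer could, for instance, take the median of the candidates as a more stable single cut point, or keep the whole candidate set to weight splits in a soft decision tree.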
