Applied Artificial Intelligence

PRE-PROCESSING OF HIGH-DIMENSIONAL CATEGORICAL PREDICTORS IN CLASSIFICATION SETTINGS

Abstract

Models in industrial applications can encounter categorical predictors with a very large number of categories (hundreds or thousands). One example is the lot identifier of a product in semiconductor manufacturing. Such variables pose a serious problem for practically all modern classification techniques. The goal is an efficient, computationally fast way to partition the values of such a variable into a small number of natural groups whose members have similar statistical properties with respect to the categorical response. These partitions, which are of interest in their own right, can then be used as input to standard learning algorithms such as decision trees and support vector machines. The proposed approach introduces a data transformation on the derived sparse frequency tables. Applying even the simplest non-hierarchical metric clustering method to the transformed coordinates yields a significant improvement in both the speed and the quality of the partition compared with currently used methods.
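
The pipeline outlined in the abstract (frequency table, transformation, non-hierarchical metric clustering) can be illustrated with a minimal sketch. The abstract does not state which transformation the authors use, so the Laplace-smoothed class proportions below are only an assumed stand-in, and plain k-means is used as one example of a simple non-hierarchical metric clustering method. All function names and parameters (`frequency_table`, `smoothed_proportions`, `partition_categories`, `alpha`) are hypothetical and not taken from the paper.

```python
import random
from collections import defaultdict


def frequency_table(categories, labels):
    """Count how often each response class occurs for each category value."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    table = defaultdict(lambda: [0] * len(classes))
    for cat, lab in zip(categories, labels):
        table[cat][index[lab]] += 1
    return dict(table), classes


def smoothed_proportions(counts, alpha=1.0):
    """Assumed transform (not the paper's): Laplace-smoothed class proportions."""
    total = sum(counts) + alpha * len(counts)
    return [(c + alpha) / total for c in counts]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        assign = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(pt, centers[j])))
            for pt in points
        ]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [pt for pt, a in zip(points, assign) if a == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def partition_categories(categories, labels, k=2, alpha=1.0, seed=0):
    """Map each category value to a group index based on its response profile."""
    table, _ = frequency_table(categories, labels)
    names = sorted(table)
    points = [smoothed_proportions(table[n], alpha) for n in names]
    return dict(zip(names, kmeans(points, k, seed=seed)))
```

The resulting mapping from category value to group index could then replace the original high-cardinality column before a decision tree or support vector machine is fitted. The group indices themselves are arbitrary labels; only the grouping is meaningful.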
