
Scoring the data using association rules

Abstract

In many data mining applications, the objective is to select data cases of a target class. For example, in direct marketing, marketers want to select likely buyers of a particular product for promotion. In such applications, it is often too difficult to predict who will definitely be in the target class (e.g., the buyer class), because the data used for modeling is often very noisy and has a highly imbalanced class distribution. Traditionally, classification systems are used to solve this problem: instead of assigning each data case to a definite class (e.g., buyer or non-buyer), the classification system is modified to produce a class probability estimate (or score) for each case, indicating how likely that case is to belong to the target class. However, existing classification systems aim to find only a subset of the regularities or rules that exist in the data, and this subset gives only a partial picture of the domain. In this paper, we show that the target selection problem can be mapped to association rule mining, which provides a more powerful solution. Because association rule mining aims to find all rules in the data, it can give a complete picture of the underlying relationships in the domain, and this complete rule set enables us to assign a more accurate class probability estimate to each data case. This paper proposes an effective and efficient technique for computing class probability estimates using association rules. Experimental results on public-domain data and real-life application data show that, in general, the new technique performs markedly better than the state-of-the-art classification system C4.5, boosted C4.5, and the naive Bayesian system. [References: 35]
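The abstract does not give the paper's scoring formula, so the following is only a minimal sketch of the general idea under stated assumptions: mine every rule of the form itemset → target class that passes minimum support and confidence thresholds, then score each case from the rules it satisfies and rank cases by that score. The helper names `mine_class_rules` and `score`, the toy data, and the scoring rule itself (a support-weighted average of the confidences of matching rules, falling back to the target-class prior) are all hypothetical illustrations, not the authors' technique.

```python
from itertools import combinations

# Hypothetical helper: enumerates every rule "itemset -> target" with up to
# max_len items (brute force, fine for toy data; real miners use Apriori-style pruning).
def mine_class_rules(records, labels, target, min_support, min_confidence, max_len=2):
    n = len(records)
    items = sorted({item for record in records for item in record})
    rules = []
    for k in range(1, max_len + 1):
        for itemset in combinations(items, k):
            body = frozenset(itemset)
            covered = [i for i, record in enumerate(records) if body <= record]
            if not covered:
                continue  # itemset never occurs, so there is no rule to evaluate
            hits = sum(1 for i in covered if labels[i] == target)
            support = hits / n                # fraction of all cases matching body AND target
            confidence = hits / len(covered)  # estimate of P(target | body)
            if support >= min_support and confidence >= min_confidence:
                rules.append((body, support, confidence))
    return rules

# Hypothetical scoring rule (an assumption, not the paper's formula):
# support-weighted average confidence of all rules whose body the case satisfies;
# falls back to the target-class prior when no rule matches.
def score(record, rules, prior):
    matched = [(sup, conf) for body, sup, conf in rules if body <= record]
    if not matched:
        return prior
    total = sum(sup for sup, _ in matched)
    return sum(sup * conf for sup, conf in matched) / total

# Toy direct-marketing data: each case is a set of attribute=value items.
records = [
    {"age=young", "income=high"},
    {"age=young", "income=low"},
    {"age=old", "income=high"},
    {"age=old", "income=low"},
    {"age=young", "income=high"},
    {"age=old", "income=low"},
]
labels = ["buy", "no", "buy", "no", "buy", "no"]
prior = labels.count("buy") / len(labels)

rules = mine_class_rules(records, labels, "buy", min_support=0.1, min_confidence=0.5)

# Rank unseen cases by score, most likely buyers first.
candidates = [
    frozenset({"age=old", "income=low"}),
    frozenset({"age=young", "income=low"}),
    frozenset({"age=old", "income=high"}),
]
ranked = sorted(candidates, key=lambda c: score(c, rules, prior), reverse=True)
```

On this toy data the sketch keeps four rules, scores the case {age=old, income=high} at 1.0, scores {age=young, income=low} at about 0.67 (only the weaker rule age=young → buy matches), and gives {age=old, income=low} the prior of 0.5 because no rule matches it. Combining every matching rule, rather than relying on the single rule a classifier would fire, is the intuition the abstract attributes to using the complete rule set.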