Journal of Computer and System Sciences

Mining Optimized Association Rules for Numeric Attributes


Abstract

Given a huge database, we address the problem of finding association rules for numeric attributes, such as (Balance ∈ I) => (CardLoan = yes), which implies that bank customers whose balances fall in a range I are likely to use card loan with a probability greater than p. The above rule is interesting only if the range I has some special feature with respect to the interrelation between Balance and CardLoan. It is required that the number of customers whose balances are contained in I (called the support of I) is sufficient and also that the probability p of the condition CardLoan = yes being met (called the confidence ratio) be much higher than the average probability of the condition over all the data. Our goal is to realize a system that finds such appropriate ranges automatically. We mainly focus on computing two optimized ranges: one that maximizes the support on the condition that the confidence ratio is at least a given threshold value, and another that maximizes the confidence ratio on the condition that the support is at least a given threshold number. Using techniques from computational geometry, we present novel algorithms that compute the optimized ranges in linear time if the data are sorted. Since sorting data with respect to each numeric attribute is expensive in the case of huge databases that occupy much more space than the main memory, we instead apply randomized bucketing as the preprocessing method and thus obtain an efficient rule-finding system. Tests show that our implementation is fast not only in theory but also in practice. The efficiency of our algorithm enables us to compute optimized rules for all combinations of hundreds of numeric and Boolean attributes in a reasonable time.
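The two optimized ranges defined in the abstract can be illustrated with a small brute-force sketch. This is not the paper's linear-time algorithm (which the abstract attributes to computational-geometry techniques); it simply enumerates every contiguous run of buckets, so it takes quadratic time in the number of buckets. The function name `optimized_ranges`, the bucket arrays `u` and `v`, and the sample numbers are all hypothetical and chosen only for illustration. Support is measured here as a customer count, matching the abstract's "threshold number".

```python
from itertools import accumulate


def optimized_ranges(u, v, min_support, min_confidence):
    """Brute-force search over contiguous bucket ranges (illustrative only).

    u[i]: number of customers whose Balance falls in bucket i (assumed input)
    v[i]: how many of those customers have CardLoan = yes (assumed input)

    Returns two tuples (or None when no range qualifies):
      best_conf = (confidence, s, t): maximum confidence with support >= min_support
      best_sup  = (support, s, t):    maximum support with confidence >= min_confidence
    where s..t are inclusive bucket indices.
    """
    # Prefix sums give the support and hit count of any range in O(1).
    U = [0] + list(accumulate(u))
    V = [0] + list(accumulate(v))

    best_conf = None
    best_sup = None
    for s in range(len(u)):
        for t in range(s, len(u)):
            support = U[t + 1] - U[s]
            hits = V[t + 1] - V[s]
            if support == 0:
                continue  # confidence is undefined for an empty range
            confidence = hits / support
            if support >= min_support and (
                best_conf is None or confidence > best_conf[0]
            ):
                best_conf = (confidence, s, t)
            if confidence >= min_confidence and (
                best_sup is None or support > best_sup[0]
            ):
                best_sup = (support, s, t)
    return best_conf, best_sup


# Hypothetical data: four equal-sized buckets of 10 customers each.
u = [10, 10, 10, 10]
v = [1, 8, 6, 2]
best_conf, best_sup = optimized_ranges(u, v, min_support=20, min_confidence=0.52)
print(best_conf, best_sup)
```

With this sample data, the optimized-confidence range covers buckets 1 to 2, and the optimized-support range covers buckets 1 to 3. Ties are broken in favour of the first range found, which is an arbitrary choice of this sketch.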
