
Distributed Data Mining in a Chain Store Database of Short Transactions



Abstract

In this paper, we broaden the horizon of traditional rule mining by introducing a new framework of causality rule mining in a distributed chain store database. Specifically, a causality rule as explored in this paper consists of a sequence of triggering events and a set of consequential events, and is designed to capture non-sequential, inter-transaction information. Causality rule mining therefore provides a very general framework for rule derivation. Note, however, that causality rule mining is very costly, particularly in the presence of a huge number of candidate sets and a distributed database, and in our opinion it cannot be handled by direct extensions of existing rule mining methods. Consequently, we devise a series of level matching algorithms, namely Level Matching (LM), Level Matching with Selective Scan (LMS), and Distributed Level Matching (Distributed LM), to minimize the computing cost of distributed mining of causality rules. Time window constraints are also taken into consideration in the development of our algorithms. By properly employing level matching and selective scan, the proposed algorithms achieve good efficiency and scalability in mining both local and global causality rules. Scale-up experiments show that the proposed algorithms scale well with the number of sites and the number of customer transactions.
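The abstract gives no formal definitions, so the following Python sketch illustrates only one possible reading of a causality rule: the triggering events must occur in order across a customer's transactions, the consequential events must all occur afterwards in any order, and everything must fall within a time window measured from the first trigger. The function name `supports_rule`, the `(timestamp, items)` transaction layout, and the window semantics are all assumptions made for illustration. This is not the LM, LMS, or Distributed LM algorithm from the paper.

```python
from typing import List, Set, Tuple

# Hypothetical layout: one customer's history is a list of (timestamp, items).
Transaction = Tuple[int, Set[str]]


def supports_rule(
    history: List[Transaction],
    triggers: List[str],
    consequents: Set[str],
    window: int,
) -> bool:
    """Check whether one customer's history supports a causality rule.

    Assumed semantics (not taken from the paper):
      * `triggers` must appear in order, each in a strictly later transaction;
      * every item in `consequents` must appear, in any order, in
        transactions after the last trigger;
      * all matched transactions lie within `window` time units of the
        transaction that matched the first trigger.
    """
    if not triggers:
        raise ValueError("a rule needs at least one triggering event")

    ordered = sorted(history, key=lambda t: t[0])
    n = len(ordered)

    for start in range(n):
        t0, items0 = ordered[start]
        if triggers[0] not in items0:
            continue
        deadline = t0 + window

        # Greedily match the remaining triggers as early as possible, which
        # leaves the largest number of transactions for the consequents.
        idx = start
        matched = True
        for event in triggers[1:]:
            idx += 1
            while idx < n and ordered[idx][0] <= deadline and event not in ordered[idx][1]:
                idx += 1
            if idx >= n or ordered[idx][0] > deadline:
                matched = False
                break
        if not matched:
            continue

        # Collect every item seen after the last trigger, inside the window.
        seen: Set[str] = set()
        for ts, items in ordered[idx + 1:]:
            if ts > deadline:
                break
            seen |= items
        if consequents <= seen:
            return True

    return False


if __name__ == "__main__":
    history = [(1, {"printer"}), (3, {"ink"}), (5, {"paper", "cable"}), (6, {"toner"})]
    print(supports_rule(history, ["printer", "ink"], {"toner", "paper"}, window=10))
    print(supports_rule(history, ["printer", "ink"], {"toner", "paper"}, window=4))
```

Under these assumptions, the first call reports a match, and the second does not because the tighter window excludes the transaction containing `toner`. Counting how many customers satisfy such a check at each store would give a local support figure, and summing across stores a global one, but how the paper actually defines and computes support is not stated in the abstract.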
