Conference: IEEE International Congress on Big Data

Distributed Adaptive Model Rules for mining big data streams


Abstract

Decision rules are among the most expressive data mining models. We propose the first distributed streaming algorithm to learn decision rules for regression tasks. The algorithm is available in SAMOA (Scalable Advanced Massive Online Analysis), an open-source platform for mining big data streams. It uses a hybrid of vertical and horizontal parallelism to distribute Adaptive Model Rules (AMRules) on a cluster. The decision rules built by AMRules are comprehensible models, where the antecedent of a rule is a conjunction of conditions on the attribute values, and the consequent is a linear combination of the attributes. Our evaluation shows that this implementation is scalable in relation to CPU and memory consumption. On a small commodity Samza cluster of 9 nodes, it can handle a rate of more than 30000 instances per second, and achieve a speedup of up to 4.7x over the sequential version.
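
The rule structure described in the abstract (a conjunction of attribute conditions as the antecedent, a linear combination of the attributes as the consequent) can be illustrated with a minimal Python sketch. This is not SAMOA's API or the authors' implementation: the names `Condition` and `Rule`, the threshold-comparison form of each condition, and the intercept term `bias` are all assumptions made for illustration.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Condition:
    """One test on a single attribute value (hypothetical representation)."""
    attribute: int                       # index of the attribute being tested
    op: Callable[[float, float], bool]   # e.g. operator.le or operator.gt
    threshold: float

    def holds(self, x: Sequence[float]) -> bool:
        return self.op(x[self.attribute], self.threshold)


@dataclass(frozen=True)
class Rule:
    """Antecedent: conjunction of conditions. Consequent: linear model."""
    conditions: Tuple[Condition, ...]
    weights: Tuple[float, ...]           # one weight per attribute
    bias: float = 0.0                    # intercept; an assumption, not stated in the abstract

    def covers(self, x: Sequence[float]) -> bool:
        # The rule applies only if every condition holds (logical AND).
        return all(c.holds(x) for c in self.conditions)

    def predict(self, x: Sequence[float]) -> float:
        # Linear combination of the attribute values.
        return self.bias + sum(w * v for w, v in zip(self.weights, x))


# Example rule: IF x0 <= 5.0 AND x1 > 2.0 THEN y = 3.0 + 0.5*x0 - 1.0*x1
rule = Rule(
    conditions=(Condition(0, operator.le, 5.0), Condition(1, operator.gt, 2.0)),
    weights=(0.5, -1.0),
    bias=3.0,
)

instance = (4.0, 3.0)
if rule.covers(instance):
    print(rule.predict(instance))
```

A streaming learner would additionally update the weights and grow the antecedent as instances arrive; the sketch covers only how a single, already-learned rule is evaluated.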
