Journal of Parallel and Distributed Computing

Distributed stream clustering using micro-clusters on Apache Storm

Abstract

The need to extract real-time insights from data has driven demand for machine learning algorithms that can operate on data streams. Given current extreme rates of data generation (around 5000 messages per second), these algorithms must handle data streams of very high velocity. Many current algorithms do not meet this requirement, in some cases processing only tens of messages per second. In this work we address the limited achievable throughput of stream clustering by developing scalable distributed algorithms, based on the micro-clustering paradigm, that run on cloud platforms. We present two distributed architectures that execute the algorithms in parallel and implement them on the Apache Storm stream processing platform. In our experiments, we demonstrate close to an order of magnitude improvement in performance.
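The abstract does not spell out the algorithms, so the following is only a minimal sketch of the micro-clustering idea in general, not the authors' implementation. It assumes the common formulation in which a micro-cluster stores a point count, a per-dimension linear sum, and a per-dimension squared sum. The class name `MicroCluster` and the helper `summarize` are hypothetical, and no Apache Storm API is used. The sketch shows why such summaries suit parallel execution: partial summaries built independently can be merged by simple addition.

```python
from dataclasses import dataclass, field
from math import sqrt
from typing import List, Sequence


@dataclass
class MicroCluster:
    """Additive summary of a group of points (hypothetical illustration)."""
    dim: int
    n: int = 0                                      # number of points absorbed
    ls: List[float] = field(default_factory=list)   # per-dimension linear sum
    ss: List[float] = field(default_factory=list)   # per-dimension squared sum

    def __post_init__(self) -> None:
        if not self.ls:
            self.ls = [0.0] * self.dim
        if not self.ss:
            self.ss = [0.0] * self.dim

    def insert(self, point: Sequence[float]) -> None:
        """Absorb one point by updating the three statistics."""
        self.n += 1
        for i, x in enumerate(point):
            self.ls[i] += x
            self.ss[i] += x * x

    def merge(self, other: "MicroCluster") -> None:
        """Fold another summary into this one; all statistics simply add."""
        self.n += other.n
        for i in range(self.dim):
            self.ls[i] += other.ls[i]
            self.ss[i] += other.ss[i]

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.ls]

    def radius(self) -> float:
        """Root-mean-square distance of the absorbed points from the centroid."""
        var = sum(self.ss[i] / self.n - (self.ls[i] / self.n) ** 2
                  for i in range(self.dim))
        return sqrt(max(var, 0.0))


def summarize(points: Sequence[Sequence[float]], dim: int) -> MicroCluster:
    """Build one summary from a batch of points (stands in for one worker)."""
    mc = MicroCluster(dim)
    for p in points:
        mc.insert(p)
    return mc


points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]

# Two "workers" each summarize half of the stream, then the results are merged.
merged = summarize(points[:2], dim=2)
merged.merge(summarize(points[2:], dim=2))

# A single pass over all points yields the same statistics.
single = summarize(points, dim=2)
print(merged.centroid(), single.centroid())
```

Because the statistics are additive, the order in which partial summaries are combined does not matter, which is what makes this kind of summary a natural fit for a parallel stream-processing setup.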
