International Symposium on Quality of Service

EXTRA: An Experience-driven Control Framework for Distributed Stream Data Processing with a Variable Number of Threads


Abstract

In this paper, we present the design, implementation and evaluation of EXTRA (Experience-driven conTRol frAmework), a control framework for scheduling in general-purpose Distributed Stream Data Processing Systems (DSDPSs). Our design is novel for two reasons. First, EXTRA enables a DSDPS to change the number of threads on the fly according to system states and demands. Most existing methods, by contrast, use a fixed number of threads to carry the workload of each processing unit of an application; this number is specified by the user in advance and does not change at runtime. Our design therefore introduces a new dimension of control in DSDPSs, which has great potential to improve system flexibility and efficiency but makes the scheduling problem much harder. Second, EXTRA takes an experience/data-driven, model-free approach to dynamic control based on Deep Reinforcement Learning (DRL), which enables a DSDPS to learn the best way to control itself from its own experience, just as a human learns a skill (such as driving or swimming), without any accurate and mathematically solvable model. We implemented EXTRA on a widely used DSDPS, Apache Storm, and evaluated its performance with three representative Stream Data Processing (SDP) applications: continuous queries, word count (stream version) and log stream processing. In particular, we performed experiments under realistic settings, in which multiple application instances run together, rather than the simplified single-instance setting used in most related work. Extensive experimental results show that: 1) compared to Storm's default scheduler and the state-of-the-art model-based method, EXTRA reduces average end-to-end tuple processing time by 39.6% and 21.6% on average, respectively; 2) EXTRA leads to more flexible and efficient stream data processing by enabling the use of a variable number of threads; 3) EXTRA is robust in a highly dynamic environment with significant workload changes.
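
The idea of learning a thread count from experience rather than from a model can be illustrated with a toy example. The sketch below is not EXTRA's algorithm and does not use Apache Storm: it swaps DRL for a much simpler tabular epsilon-greedy learner, and the cost function `toy_latency`, the action set `THREAD_CHOICES` and the workload states `LOAD_LEVELS` are all hypothetical names and values chosen for illustration. The learner never reads the cost function directly; it only observes noisy latencies after trying a thread count.

```python
import random

# Hypothetical action set and workload states, chosen only for illustration.
THREAD_CHOICES = [1, 2, 4, 8, 16]
LOAD_LEVELS = [10.0, 40.0, 160.0]


def toy_latency(load, threads):
    """Assumed cost model: work is split across threads, each thread adds overhead."""
    return load / threads + 1.5 * threads


def train(episodes=5000, epsilon=0.1, alpha=0.1, noise=0.2, seed=0):
    """Learn a value estimate per (load level, thread choice) from observed latency."""
    rng = random.Random(seed)
    actions = range(len(THREAD_CHOICES))
    q = {(s, a): 0.0 for s in range(len(LOAD_LEVELS)) for a in actions}
    for _ in range(episodes):
        s = rng.randrange(len(LOAD_LEVELS))  # observe the current workload state
        if rng.random() < epsilon:
            a = rng.randrange(len(THREAD_CHOICES))  # explore a random thread count
        else:
            a = max(actions, key=lambda i: q[(s, i)])  # exploit the best estimate
        # The learner only sees a noisy measurement, never the cost model itself.
        observed = toy_latency(LOAD_LEVELS[s], THREAD_CHOICES[a]) + rng.gauss(0.0, noise)
        reward = -observed  # lower latency means higher reward
        q[(s, a)] += alpha * (reward - q[(s, a)])  # move the estimate toward the reward
    return q


def best_threads(q, state):
    """Return the thread count with the highest learned value for a load level."""
    a = max(range(len(THREAD_CHOICES)), key=lambda i: q[(state, i)])
    return THREAD_CHOICES[a]


if __name__ == "__main__":
    q = train()
    for s, load in enumerate(LOAD_LEVELS):
        print(f"load={load}: use {best_threads(q, s)} threads")
```

Under this assumed cost model the lowest-latency choices are 2, 4 and 8 threads for the three load levels, so a heavier workload calls for more threads. A fixed, user-specified thread count cannot follow that shift, which is the extra control dimension the abstract describes.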
