Journal: IEEE Transactions on Knowledge and Data Engineering

Automating Characterization Deployment in Distributed Data Stream Management Systems

Abstract

Distributed data stream management systems (DDSMS) are usually composed of an upper-layer relational query system (RQS) and a lower-layer stream processing system (SPS). When a user submits a new query to the RQS, a query planner must convert it into a directed acyclic graph (DAG) of tasks that run on the SPS. Different query requests and data stream properties call for different SPS deployment strategies. However, dynamically predicting the SPS deployment configuration that sustains processing throughput while keeping resource usage low is a great challenge. This article presents OrientStream, a framework for automating characterization deployment in DDSMS using incremental machine learning techniques. It introduces a four-level feature extraction mechanism (data level, query-plan level, operator level, and cluster level). Using different query workloads as training sets, it first predicts the resource usage of the DDSMS, then selects the optimal resource configuration from the candidate settings based on the current query requests and stream properties, and finally migrates operator state through dynamic reconfiguration. We validate the approach on the open-source SPS Storm. For application scenarios with long monitoring cycles and infrequent data fluctuation, experiments show that OrientStream can reduce CPU usage by 8-15 percent and memory usage by 38-48 percent.
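
The predict-then-select loop described in the abstract can be illustrated with a minimal sketch. This is not the OrientStream implementation: a plain SGD linear regressor stands in for the unspecified incremental learning techniques, and the feature names, the `Config` knobs, the candidate set, and the synthetic `true_cpu` / `true_tput` functions are all assumptions made up for illustration.

```python
import random
from dataclasses import dataclass
from itertools import product


class OnlineLinearModel:
    """Linear regressor trained one sample at a time with plain SGD."""

    def __init__(self, n_features, lr=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.lr = lr

    def predict(self, x):
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def partial_fit(self, x, y):
        # One gradient step on the squared error of a single observation.
        err = self.predict(x) - y
        self.weights = [w - self.lr * err * xi for w, xi in zip(self.weights, x)]
        self.bias -= self.lr * err


@dataclass(frozen=True)
class Config:
    parallelism: int  # operator-level knob (hypothetical)
    workers: int      # cluster-level knob (hypothetical)


# Hypothetical candidate settings to choose from.
CANDIDATES = [Config(p, w) for p, w in product((1, 2, 4, 8), (1, 2, 4))]


def features(rate, n_ops, cfg):
    """Assumed four-level feature vector, each entry scaled to [0, 1]:
    data level (rate), query-plan level (n_ops),
    operator level (parallelism), cluster level (workers)."""
    return [rate, n_ops, cfg.parallelism / 8, cfg.workers / 4]


def select_config(rate, n_ops, cost_model, tput_model, min_tput):
    """Return the cheapest candidate whose predicted throughput meets the target."""
    feasible = [c for c in CANDIDATES
                if tput_model.predict(features(rate, n_ops, c)) >= min_tput]
    if not feasible:
        return None
    return min(feasible,
               key=lambda c: cost_model.predict(features(rate, n_ops, c)))


# Synthetic stand-ins for monitored measurements (invented, not from the paper).
def true_cpu(x):
    return 0.05 + 0.3 * x[0] + 0.05 * x[1] + 0.3 * x[2] + 0.4 * x[3]


def true_tput(x):
    return 0.1 + 0.7 * x[2] + 0.2 * x[3]


rng = random.Random(0)
cost_model, tput_model = OnlineLinearModel(4), OnlineLinearModel(4)
for _ in range(20000):
    # Each observed workload updates both models incrementally.
    x = features(rng.random(), rng.random(), rng.choice(CANDIDATES))
    cost_model.partial_fit(x, true_cpu(x))
    tput_model.partial_fit(x, true_tput(x))

best = select_config(rate=0.5, n_ops=0.3, cost_model=cost_model,
                     tput_model=tput_model, min_tput=0.525)
print(best)
```

Keeping a separate model per target (resource cost and throughput) lets the selection step treat throughput as a constraint and resource usage as the objective, which mirrors the stated goal of sustaining throughput at low resource usage. Operator state migration during reconfiguration is outside the scope of this sketch.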