
PerTract: Model Extraction and Specification of Big Data Systems for Performance Prediction by the Example of Apache Spark and Hadoop



Abstract

Evaluating and predicting the performance of big data applications is required to size capacities efficiently and manage operations. Gaining profound insight into the system architecture, component dependencies, resource demands, and configurations is difficult for engineers. To address these challenges, this paper presents an approach that automatically extracts and transforms system specifications to predict application performance. It consists of three components. First, a system- and tool-agnostic domain-specific language (DSL) allows the modeling of performance-relevant factors of big data applications, computing resources, and data workload. Second, DSL instances are automatically extracted from monitored measurements of Apache Spark and Apache Hadoop (i.e., YARN and HDFS) systems. Third, these instances are transformed for use in model- and simulation-based performance evaluation tools to allow predictions. By adapting DSL instances, our approach enables engineers to predict the performance of applications for different scenarios, such as changing data input and resources. We evaluate our approach by predicting the performance of linear regression and random forest applications of the HiBench benchmark suite. Compared with measurement results, simulation results of adjusted DSL instances show prediction errors below 15%, based on averages for response times and resource utilization.
