Source: International Conference on Very Large Data Bases

PREDIcT: Towards Predicting the Runtime of Large Scale Iterative Analytics

Abstract

Machine learning algorithms are widely used today for analytical tasks such as data cleaning, data categorization, and data filtering. At the same time, the rise of social media has motivated a recent uptake in large-scale graph processing. Both categories of algorithms are dominated by iterative subtasks, i.e., processing steps that are executed repeatedly until a convergence condition is met. Optimizing cluster resource allocation among multiple workloads of iterative algorithms requires estimating their runtime, which in turn requires (i) predicting the number of iterations and (ii) predicting the processing time of each iteration. As both parameters depend on the characteristics of the dataset and on the convergence function, estimating their values before execution is difficult. This paper proposes PREDIcT, an experimental methodology for predicting the runtime of iterative algorithms. PREDIcT uses sample runs to capture the algorithm's convergence trend and per-iteration key input features that are well correlated with the actual processing requirements of the complete input dataset. Using this combination of characteristics, we predict the runtime of iterative algorithms, including algorithms with very different runtime patterns among subsequent iterations. Our experimental evaluation of multiple algorithms on scale-free graphs shows a relative prediction error of 10%-30% for predicting runtime, including for algorithms with up to 100x runtime variability among consecutive iterations.
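The decomposition described in the abstract (total runtime as the sum of per-iteration times over the predicted iterations) can be illustrated with a minimal Python sketch. This is not the PREDIcT implementation. It assumes a linear cost model fitted on one sample run, a single hypothetical key input feature per iteration, a constant `scale_factor` relating sample features to full-dataset features, and the same number of iterations on the sample as on the full dataset. The names `fit_linear`, `predict_runtime`, and `scale_factor` are invented for illustration.

```python
def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_runtime(sample_features, sample_times, scale_factor):
    """Estimate the total runtime on the full dataset from one sample run.

    sample_features[i]: key input feature observed in iteration i of the
        sample run (hypothetical choice, e.g. number of active vertices).
    sample_times[i]: measured processing time of iteration i in the sample run.
    scale_factor: assumed ratio between full-dataset and sample feature values.

    Assumption: the full run needs as many iterations as the sample run.
    """
    # Cost model: per-iteration time as a linear function of the key feature.
    slope, intercept = fit_linear(sample_features, sample_times)
    # Extrapolate each iteration's feature value to the full dataset.
    full_features = [f * scale_factor for f in sample_features]
    # Total runtime is the sum of the predicted per-iteration times.
    return sum(slope * f + intercept for f in full_features)
```

For example, `predict_runtime([100, 50, 10], [2.0, 1.5, 1.1], 10)` fits the cost model on three sample iterations and sums the extrapolated per-iteration times. Summing per iteration, rather than multiplying an average time by the iteration count, is what lets such a sketch accommodate iterations whose costs differ widely.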