PREDIcT: Towards Predicting the Runtime of Large Scale Iterative Analytics

Abstract

Machine learning algorithms are widely used today for analytical tasks such as data cleaning, data categorization, or data filtering. At the same time, the rise of social media motivates recent uptake in large scale graph processing. Both categories of algorithms are dominated by iterative subtasks, i.e., processing steps which are executed repetitively until a convergence condition is met. Optimizing cluster resource allocations among multiple workloads of iterative algorithms motivates the need for estimating their runtime, which in turn requires: ⅰ) predicting the number of iterations, and ⅱ) predicting the processing time of each iteration. As both parameters depend on the characteristics of the dataset and on the convergence function, estimating their values before execution is difficult. This paper proposes PREDIcT, an experimental methodology for predicting the runtime of iterative algorithms. PREDIcT uses sample runs for capturing the algorithm's convergence trend and per-iteration key input features that are well correlated with the actual processing requirements of the complete input dataset. Using this combination of characteristics we predict the runtime of iterative algorithms, including algorithms with very different runtime patterns among subsequent iterations. Our experimental evaluation of multiple algorithms on scale-free graphs shows a relative prediction error of 10%-30% for predicting runtime, including algorithms with up to 100x runtime variability among consecutive iterations.
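The decomposition described in the abstract (total runtime as the sum of per-iteration processing times over the predicted number of iterations) can be illustrated with a minimal sketch. This is not the paper's implementation: the single-feature linear cost model, the function names `fit_line` and `predict_runtime`, and all numbers are assumptions made for illustration only. The sketch fits a cost model on a sample run and then applies it to per-iteration feature values that are assumed to have already been extrapolated to the full dataset.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with a single feature.

    Assumption: per-iteration time is roughly linear in one key input
    feature (for example, the number of active vertices).
    """
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def predict_runtime(sample_features, sample_times, full_features):
    """Predict total runtime as the sum of predicted per-iteration times.

    sample_features / sample_times: one entry per iteration of a sample run.
    full_features: one (hypothetical, extrapolated) feature value per
    predicted iteration on the full dataset; its length is the predicted
    number of iterations.
    """
    a, b = fit_line(sample_features, sample_times)
    per_iteration = [a * f + b for f in full_features]
    return sum(per_iteration), per_iteration


# Hypothetical sample run: feature value and measured seconds per iteration.
sample_features = [1000, 400, 150, 40]
sample_times = [2.1, 0.9, 0.4, 0.2]

# Hypothetical extrapolated features for five iterations on the full input.
full_features = [100000, 45000, 16000, 5000, 900]

total, per_iteration = predict_runtime(sample_features, sample_times, full_features)
print(f"predicted iterations: {len(per_iteration)}, predicted runtime: {total:.1f} s")
```

Keeping the per-iteration predictions separate, rather than multiplying an average iteration time by the iteration count, matters for algorithms whose iteration times vary widely, which is the case the abstract highlights.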

Bibliographic record

Source: International Conference on Very Large Data Bases, 2013, 12 pages
Authors: Adrian Daniel Popescu; Andrey Balmin; Vuk Ercegovac; Anastasia Ailamaki
Original format: PDF
Chinese Library Classification: TP311.13
