
Mathematical Models on the Hadoop Runtimes on Big Data


Abstract

Understanding runtime in big data processing has become key to handling the ever-increasing volumes of machine-generated data. Big data is now commonly processed with Hadoop, a framework built on the MapReduce programming model. This paper analyses the effect of enlarging the machine cluster through which data is processed, the effect of machine failures on steady runtime, and the effect of optimising runtime and cluster size on the workflow process. The case in which the runtime differs from the hours of data being processed is considered, and the effect of data accumulation on runtime is analysed in detail. Mathematical models for analysing runtimes are proposed; they are borrowed from systems that process data in parallel. A simple runtime formula is adopted, and a numerical method is used to predict runtimes when data is allowed to accumulate. Increasing the machine cluster reduces processing time. Increasing the overhead increases runtimes. A 15% machine failure results in a 261% increase in runtimes. The time to process one hour of data should be kept small: if one hour of data takes more than one hour to process, the Hadoop system slows down significantly.
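The abstract does not state the runtime formula, so the following Python sketch is only an illustration of how data accumulation can drive runtime, under an assumed model. The assumption is that each run pays a fixed overhead and must process all data that arrived during the previous run, giving the recurrence `T_next = overhead + p * T_prev`, where `p` is the hours needed to process one hour of data. The recurrence, the function names, and the parameter values are hypothetical and are not taken from the paper.

```python
from typing import List, Optional


def simulate_runtimes(overhead_hours: float,
                      hours_per_data_hour: float,
                      initial_runtime: float,
                      n_runs: int) -> List[float]:
    """Iterate the ASSUMED recurrence T_next = overhead + p * T_prev.

    Each run processes the data that accumulated while the previous
    run was executing. This is an illustrative model, not the paper's.
    """
    runtimes = [initial_runtime]
    for _ in range(n_runs):
        runtimes.append(overhead_hours + hours_per_data_hour * runtimes[-1])
    return runtimes


def steady_runtime(overhead_hours: float,
                   hours_per_data_hour: float) -> Optional[float]:
    """Fixed point of the assumed recurrence: T = overhead / (1 - p).

    Returns None when p >= 1, because the backlog then grows without
    bound and no steady runtime exists.
    """
    if hours_per_data_hour >= 1:
        return None
    return overhead_hours / (1 - hours_per_data_hour)


if __name__ == "__main__":
    # p < 1: runtimes settle towards the fixed point.
    print(simulate_runtimes(0.5, 0.5, 0.0, 10))
    print(steady_runtime(0.5, 0.5))
    # p > 1: each run takes longer than the one before it.
    print(simulate_runtimes(0.5, 1.2, 1.0, 10))
```

Under this assumed model, the abstract's closing advice has a simple reading: while `p` stays below 1 the runtimes converge, and once one hour of data takes more than one hour to process, every run is slower than the last.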
