Mathematical Models on the Hadoop Runtimes on Big Data



Abstract

Understanding runtime in big data processing has become key to handling the ever-increasing volumes of data generated by machines. Nowadays big data is accessed through a searching system called Hadoop, which uses the MapReduce algorithm. The paper analyses the effect of increasing the machine cluster through which data is processed, the effect of machine failures on steady runtime, and the effect of optimising runtime and machine cluster on the workflow process. The case in which the runtime and the hours of data being processed differ is considered, and the effect of the accumulation of data on runtime is analysed in detail. Mathematical models to analyse runtimes are proposed; they are borrowed from systems that process data in parallel. A simple runtime formula is adopted, and a numerical method is used to predict runtimes in the case where data is allowed to accumulate. Increasing the machine cluster reduces processing time. Increasing the overhead increases runtimes. A 15% machine failure results in a 261% increase in runtimes. The time to process one hour of data should be kept small. If one hour of data takes more than one hour to process, the Hadoop system slows down significantly.
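The last claim of the abstract can be illustrated with a minimal sketch. This is not the runtime formula used in the paper, which the abstract does not state. It assumes, purely for illustration, that one hour of data arrives every wall-clock hour and that the cluster needs a fixed `t_per_data_hour` hours to process each hour of data. The function name and its parameters are hypothetical.

```python
def backlog_after(hours, t_per_data_hour):
    """Return the unprocessed data (in data-hours) after `hours` wall-clock hours.

    Illustrative assumption, not the paper's model: one data-hour arrives per
    wall-clock hour, and the cluster clears 1 / t_per_data_hour data-hours
    in each wall-clock hour.
    """
    if t_per_data_hour <= 0:
        raise ValueError("t_per_data_hour must be positive")
    capacity = 1.0 / t_per_data_hour  # data-hours processed per wall-clock hour
    backlog = 0.0
    for _ in range(hours):
        backlog += 1.0                      # one new hour of data arrives
        backlog -= min(backlog, capacity)   # process as much as capacity allows
    return backlog


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0):
        print(f"t = {t} h per data-hour -> backlog after 10 h: {backlog_after(10, t)}")
```

Under these assumptions the backlog stays at zero while one hour of data is processed in at most one hour, and it grows without bound once processing takes longer than the data takes to arrive.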


Bibliographic details

Source: 2018 International Conference on Advances in Big Data, Computing and Data Communication Systems, 2018, pp. 1-5 (5 pages)
Conference location: Durban (ZA)
Author: Gilbert Makanda
Author affiliation: Department of Mathematical and Physical Sciences, Central University of Technology, Bloemfontein, South Africa

Original format: PDF
Language: English
Keywords: Runtime; Big Data; Mathematical model; Data models; Software; Data mining; Clustering algorithms


Similar literature


1. A Mathematical Model to Calculate Data Sensitivity in Hadoop Platform Using the Analytic Hierarchy Process Method [J]. Hafsa Ait idar, Hicham Belhadaoui, Reda Filali. IAENG International Journal of Computer Science, 2020, No. 4PTa2.
2. Optimization of the Size of Thread Pool in Runtime Systems to Enterprise Application Integration: A Mathematical Modelling Approach [J]. D.L. FREIRE, R.Z. FRANTZ, F. ROOS-FRANTZ. TEMA (São Carlos), 2019, No. 1.
3. Haery: A Hadoop Based Query System on Accumulative and High-Dimensional Data Model for Big Data [J]. Song Jie, He HongYan, Thomas Richard. IEEE Transactions on Knowledge and Data Engineering, 2020, No. 7.
4. Mathematical Models on the Hadoop Runtimes on Big Data [C]. Gilbert Makanda. International Conference on Advances in Big Data, Computing and Data Communication Systems, 2018.
5. Runtime Monitoring of Cyber-Physical Systems Using Data-driven Models [D]. Calvi, Michele Giovanni. 2019.
6. Accurate state estimation from uncertain data and models: an application of data assimilation to mathematical models of human brain tumors [O]. Eric J Kostelich, Yang Kuang, Joshua M McDaniel. 2011.
7. Research on Industry Data Analysis Model Based on Hadoop Big Data Platform [O]. Xu Hongsheng, Fan Ganglong, Li Ke. 2017.
