Journal: Cybernetics and Information Technologies (CIT)

Performance Optimization System for Hadoop and Spark Frameworks


Abstract

The optimization of large-scale data sets depends on the technologies and methods used. The MapReduce model, implemented on Apache Hadoop or Spark, allows splitting large data sets into a set of blocks distributed over several machines. Data compression reduces data size and transfer time between disks and memory, but requires additional processing. Therefore, finding an optimal tradeoff is a challenge, as a high compression factor may underload Input/Output but overload the processor. The paper presents a system enabling the selection of compression tools and the tuning of the compression factor to reach the best performance in Apache Hadoop and Spark infrastructures, based on simulation analyses.
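The I/O-versus-CPU tradeoff the abstract describes can be illustrated with a small sketch. This is not the paper's optimization system; it uses Python's standard `zlib` codec (standing in for Hadoop-style codecs such as gzip) to show how raising the compression level shrinks the data that must cross the disk/memory boundary while increasing processor time:

```python
import time
import zlib


def compression_tradeoff(data: bytes, levels=(1, 6, 9)):
    """For each zlib level, return (compression ratio, elapsed seconds).

    A lower ratio means less data to transfer (lighter I/O);
    a higher level generally means more CPU work to compress.
    """
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        results[level] = (len(compressed) / len(data), elapsed)
    return results


# A block of repetitive text, loosely standing in for one HDFS block.
sample = b"hadoop spark mapreduce " * 10000
for level, (ratio, secs) in compression_tradeoff(sample).items():
    print(f"level={level}  ratio={ratio:.4f}  time={secs:.4f}s")
```

Measuring ratio and time per level is the kind of observation a tuning system can feed into its choice of codec and compression factor for a given workload.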

