Evaluation of Apache Hadoop for parallel data analysis with ROOT


Abstract

Apache Hadoop is a Java-based framework for distributed processing of large data sets across clusters of computers, using the Hadoop Distributed File System (HDFS) for data storage and backup and MapReduce as the processing platform. Hadoop is primarily designed for processing large textual data sets that can be processed in arbitrary chunks, and it must be adapted to the use case of processing binary data files, which cannot be split automatically. However, Hadoop offers attractive features in terms of fault tolerance, task supervision and control, multi-user functionality and job management. For this reason, we evaluated Apache Hadoop as an alternative approach to PROOF for ROOT data analysis. Two alternatives for distributing analysis data were discussed: either the data was stored in HDFS and processed with MapReduce, or the data was accessed via a standard Grid storage system (dCache Tier-2) and MapReduce was used only as the execution back-end. The measurements focused, on the one hand, on storing analysis data safely on HDFS with reasonable data rates and, on the other hand, on processing data quickly and reliably with MapReduce. In the evaluation of HDFS, read/write data rates from the local Hadoop cluster were measured and compared to standard data rates from the local NFS installation. In the evaluation of MapReduce, realistic ROOT analyses were used and event rates were compared to PROOF.


Bibliographic record

Source: Conference on Computing in High Energy and Nuclear Physics, 2014, 5 pages
Authors: S Lehrack; G Duckeck; J Ebke
Original format: PDF
Chinese Library Classification: O572.2-532
Keywords: Evaluation; Apache; parallel


Similar literature


1. Performance evaluation of cloud-based log file analysis with Apache Hadoop and Apache Spark [J]. Ilias Mavridis, Helen Karatza. The Journal of Systems and Software, 2017, issue Mara

2. Typhoon quantitative rainfall prediction from big data analytics by using the Apache Hadoop Spark parallel computing framework [J]. C.-C. Wei, T.-H. Chou. Oceanographic Literature Review, 2020, No. 10

3. Hadoop 2 quick-start guide: learn the essentials of big data computing in the Apache Hadoop 2 ecosystem [J]. A. Squassabia. Computing Reviews, 2016, No. 6

4. Evaluation of Apache Hadoop for parallel data analysis with ROOT [C]. S Lehrack, G Duckeck, J Ebke. Conference on Computing in High Energy and Nuclear Physics, 2014

5. Sentiment analysis of big social data with Apache Hadoop [D]. Kang, Qiuling. 2014

6. Theoretical and Empirical Comparison of Big Data Image Processing with Apache Hadoop and Sun Grid Engine [O]. Shunxing Bao, Frederick D. Weitendorf, Andrew J. Plassard

7. A Comprehensive Performance Analysis of Apache Hadoop and Apache Spark for Large Scale Data Sets Using HiBench [O]. Nasim Ahmed, Andre L. C. Barczak, Teo Susnjak. 2020

