Conference paper · IEEE International Conference on Distributed Computing Systems

Will They Blend?: Exploring Big Data Computation Atop Traditional HPC NAS Storage



Abstract

The Apache Hadoop framework has ushered in a new era in how data-rich organizations can process, store, and analyze large amounts of data. This has resulted in increased potential for an infrastructure exodus from the traditional solution of commercial database ad-hoc analytics on network-attached storage (NAS). While many data-rich organizations can afford either to move entirely to Hadoop for their Big Data analytics, or to maintain their existing traditional infrastructures and acquire a new set of infrastructure solely for Hadoop jobs, most supercomputing centers enjoy neither of those possibilities. Too much of the existing scientific code is tailored to work on massively parallel file systems unlike the Hadoop Distributed File System (HDFS), and their datasets are too large to reasonably maintain and/or ferry between two distinct storage systems. Nevertheless, as scientists search for easier-to-program frameworks with a lower time-to-science to post-process their huge datasets after execution, there is increasing pressure to enable use of MapReduce within these traditional High Performance Computing (HPC) architectures. Therefore, in this work we explore potential means of enabling use of the easy-to-program Hadoop MapReduce framework without requiring a complete infrastructure overhaul of existing HPC NAS solutions. We demonstrate that retaining function-dedicated resources like NAS is not only possible, but can even be done efficiently with MapReduce. In our exploration, we unearth subtle pitfalls resulting from this mash-up of new-era Big Data computation on conventional HPC storage and share the architectural configurations that allow us to avoid them. Finally, we design and present a novel Hadoop file system, the Reliable Array of Independent NAS File System (RainFS), and experimentally demonstrate its improvements in performance and reliability over the previous architectures we have investigated.
