Conference on Imaging Informatics for Healthcare, Research, and Applications

Theoretical and Empirical Comparison of Big Data Image Processing with Apache Hadoop and Sun Grid Engine



Abstract

The field of big data is generally concerned with the scale of processing at which traditional computational paradigms break down. In medical imaging, traditional large-scale processing uses a cluster computer that combines a group of workstation nodes into a functional unit controlled by a job scheduler. Typically, a shared-storage network file system (NFS) is used to host imaging data. However, data transfer from storage to processing nodes can saturate network bandwidth when data is frequently uploaded to or retrieved from the NFS, e.g., under "short" processing times and/or with "large" datasets. Recently, an alternative approach using Hadoop and HBase was presented for medical imaging to enable co-location of data storage and computation while minimizing data transfer. The benefits of such a framework must be formally evaluated against a traditional approach to characterize the point at which merely "large-scale" processing transitions into "big data" and necessitates alternative computational frameworks. The proposed Hadoop system was implemented on a production lab cluster alongside a standard Sun Grid Engine (SGE). Theoretical models of wall-clock time and resource time for both approaches are introduced and validated. To provide real example data, three T1 image archives were retrieved from a secure, shared university web database and used to empirically assess computational performance under three cluster-hardware configurations (72, 109, or 209 CPU cores) with differing job lengths. Empirical results match the theoretical models. Based on these data, a comparative analysis is presented of when the Hadoop framework is, and is not, relevant for medical imaging.
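The abstract does not reproduce the models themselves, but the bandwidth-saturation argument can be sketched with a first-order model (an illustrative assumption, not the paper's actual formulation). For a job processing n images of size d each over a shared NFS link of aggregate bandwidth B, with c CPU cores and per-image compute time t_c, assuming transfer and compute fully overlap:

\[
T_{\text{wall}} \approx \max\!\left(\frac{n\,d}{B},\ \frac{n\,t_c}{c}\right),
\qquad
T_{\text{resource}} \approx c \cdot T_{\text{wall}}.
\]

Under this sketch, the shared link becomes the bottleneck when \( n d / B > n t_c / c \), i.e., when \( t_c < c\,d / B \): precisely the "short" job-length and/or "large" dataset regime where a data-local framework such as Hadoop/HBase can eliminate the transfer term.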

