Conference on Imaging Informatics for Healthcare, Research, and Applications

Theoretical and Empirical Comparison of Big Data Image Processing with Apache Hadoop and Sun Grid Engine



Abstract

The field of big data is generally concerned with the scale of processing at which traditional computational paradigms break down. In medical imaging, traditional large-scale processing uses a cluster computer that combines a group of workstation nodes into a functional unit controlled by a job scheduler. Typically, a shared-storage network file system (NFS) is used to host imaging data. However, data transfer from storage to processing nodes can saturate network bandwidth when data is frequently uploaded to or retrieved from the NFS, e.g., under "short" processing times and/or with "large" datasets. Recently, an alternative approach using Hadoop and HBase was presented for medical imaging to enable co-location of data storage and computation while minimizing data transfer. The benefits of such a framework must be formally evaluated against a traditional approach to characterize the point at which merely "large-scale" processing transitions into "big data" and necessitates alternative computational frameworks. The proposed Hadoop system was implemented on a production lab cluster alongside a standard Sun Grid Engine (SGE). Theoretical models of wall-clock time and resource time for both approaches are introduced and validated. To provide real example data, three T1 image archives were retrieved from a secure, shared university web database and used to empirically assess computational performance under three cluster-hardware configurations (72, 109, or 209 CPU cores) with differing job lengths. Empirical results match the theoretical models. Based on these data, a comparative analysis is presented of when the Hadoop framework is, and is not, relevant for medical imaging.
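The abstract does not reproduce the models themselves, but the bandwidth-saturation argument can be sketched with a first-order model (an illustrative assumption, not the paper's actual formulation). For a job processing n images of size d each over a shared NFS link of aggregate bandwidth B, with c CPU cores and per-image compute time t_c, assuming transfer and compute fully overlap:

\[
T_{\text{wall}} \approx \max\!\left(\frac{n\,d}{B},\ \frac{n\,t_c}{c}\right),
\qquad
T_{\text{resource}} \approx c \cdot T_{\text{wall}}.
\]

Under this sketch, the shared link becomes the bottleneck when \( n d / B > n t_c / c \), i.e., when \( t_c < c\,d / B \): precisely the "short" job-length and/or "large" dataset regime where a data-local framework such as Hadoop/HBase can eliminate the transfer term.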

