IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

Performance Evaluation of Big Data Processing Strategies for Neuroimaging


Abstract

Neuroimaging datasets are rapidly growing in size as a result of advancements in image acquisition methods, open science, and data sharing. However, the adoption of Big Data processing strategies by neuroimaging processing engines remains limited. Here, we evaluate three Big Data processing strategies (in-memory computing, data locality, and lazy evaluation) on typical neuroimaging use cases, represented by the BigBrain dataset. We contrast these strategies using Apache Spark and Nipype as our representative Big Data and neuroimaging processing engines, on Dell EMC's Top-500 cluster. Big Data thresholds were modeled by comparing the data-write rate of the application to the filesystem bandwidth and the number of concurrent processes. This model accounts for the fact that the page caching provided by the Linux kernel is critical to the performance of Big Data applications. Results show that in-memory computing alone speeds up executions by a factor of up to 1.6, whereas when combined with data locality, this factor reaches 5.3. Lazy evaluation strategies were found to increase the likelihood of cache hits, further improving processing time. Such substantial speed-ups are likely to be observed for typical image processing operations performed on images larger than 75 GB. A ballpark estimate from our model shows that in-memory computing alone will not speed up current functional MRI analyses unless coupled with data locality and the concurrent processing of around 280 subjects. Furthermore, we observe that emulating in-memory computing using in-memory file systems (tmpfs) does not reach the performance of an in-memory engine, presumably due to swapping to disk and the lack of data cleanup. We conclude that Big Data processing strategies are worth developing for neuroimaging applications.
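The abstract only names the inputs of the threshold model (the application's data-write rate, the filesystem bandwidth, the number of concurrent processes, and the Linux page cache). The Python sketch below illustrates one way such a condition could be expressed; the function name, parameters, and the specific inequality are illustrative assumptions, not the authors' formula.

```python
# A minimal sketch of the kind of "Big Data threshold" model described in the
# abstract. The exact formula is not given there, so the condition below is an
# illustrative assumption, not the authors' model.

def is_big_data_regime(write_rate_per_process_gbps: float,
                       n_concurrent_processes: int,
                       filesystem_bandwidth_gbps: float,
                       data_size_gb: float,
                       page_cache_gb: float) -> bool:
    """Heuristic: an application enters a Big Data regime when its aggregate
    write rate exceeds what the shared filesystem can absorb AND the data no
    longer fits in the Linux page cache (so writes cannot be deferred)."""
    aggregate_write_rate = write_rate_per_process_gbps * n_concurrent_processes
    exceeds_bandwidth = aggregate_write_rate > filesystem_bandwidth_gbps
    exceeds_page_cache = data_size_gb > page_cache_gb
    return exceeds_bandwidth and exceeds_page_cache


if __name__ == "__main__":
    # Hypothetical numbers, loosely inspired by the 75 GB figure in the abstract.
    print(is_big_data_regime(write_rate_per_process_gbps=0.5,
                             n_concurrent_processes=16,
                             filesystem_bandwidth_gbps=4.0,
                             data_size_gb=100.0,
                             page_cache_gb=64.0))
```

Under a model of this form, it is the number of concurrent processes that pushes an analysis past the filesystem-bandwidth threshold, which is consistent with the abstract's ballpark estimate of around 280 concurrently processed fMRI subjects.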
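For readers unfamiliar with the strategies being evaluated, the following minimal PySpark sketch illustrates in-memory computing (persisting intermediate results) and lazy evaluation (transformations deferred until an action runs). It is a generic illustration, not the authors' BigBrain pipeline; the block identifiers, block loader, and thresholding step are hypothetical stand-ins for a typical neuroimaging operation.

```python
# A minimal PySpark sketch of the in-memory and lazy-evaluation strategies
# named above; not the authors' pipeline.
import numpy as np
from pyspark import SparkContext, StorageLevel

sc = SparkContext(appName="lazy-inmemory-sketch")

# Hypothetical identifiers for blocks (chunks) of a large image volume.
block_ids = list(range(128))

def load_block(block_id):
    # Placeholder for reading one block of a large image from shared storage.
    rng = np.random.default_rng(block_id)
    return block_id, rng.random((64, 64, 64), dtype=np.float32)

def threshold(item, value=0.5):
    # Stand-in for a typical per-block image processing operation.
    block_id, data = item
    return block_id, (data > value).astype(np.uint8)

# Transformations are lazy: nothing is read or computed until an action runs.
blocks = (sc.parallelize(block_ids)
            .map(load_block)
            .map(threshold)
            .persist(StorageLevel.MEMORY_ONLY))  # keep intermediates in memory

# The first action triggers the whole chain; later actions reuse cached blocks.
print(blocks.count())
sc.stop()
```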
