IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

Performance Evaluation of Big Data Processing Strategies for Neuroimaging


Abstract

Neuroimaging datasets are rapidly growing in size as a result of advancements in image acquisition methods, open science, and data sharing. However, the adoption of Big Data processing strategies by neuroimaging processing engines remains limited. Here, we evaluate three Big Data processing strategies (in-memory computing, data locality, and lazy evaluation) on typical neuroimaging use cases, represented by the BigBrain dataset. We contrast these strategies using Apache Spark and Nipype as our representative Big Data and neuroimaging processing engines, on Dell EMC's Top-500 cluster. Big Data thresholds were modeled by comparing the data-write rate of the application to the filesystem bandwidth and the number of concurrent processes. This model accounts for the fact that the page caching provided by the Linux kernel is critical to the performance of Big Data applications. Results show that in-memory computing alone speeds up executions by a factor of up to 1.6, whereas when combined with data locality, this factor reaches 5.3. Lazy evaluation strategies were found to increase the likelihood of cache hits, further improving processing time. Such substantial speed-ups are likely to be observed for typical image processing operations performed on images larger than 75 GB. A ballpark estimate from our model shows that in-memory computing alone will not speed up current functional MRI analyses unless coupled with data locality and the concurrent processing of around 280 subjects. Furthermore, we observe that emulating in-memory computing using in-memory file systems (tmpfs) does not reach the performance of an in-memory engine, presumably due to swapping to disk and the lack of data cleanup. We conclude that Big Data processing strategies are worth developing for neuroimaging applications.
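The abstract only names the inputs of the threshold model (the application's data-write rate, the filesystem bandwidth, the number of concurrent processes, and the Linux page cache). The Python sketch below illustrates one way such a condition could be expressed; the function name, parameters, and the specific inequality are illustrative assumptions, not the authors' formula.

```python
# A minimal sketch of the kind of "Big Data threshold" model described in the
# abstract. The exact formula is not given there, so the condition below is an
# illustrative assumption, not the authors' model.

def is_big_data_regime(write_rate_per_process_gbps: float,
                       n_concurrent_processes: int,
                       filesystem_bandwidth_gbps: float,
                       data_size_gb: float,
                       page_cache_gb: float) -> bool:
    """Heuristic: an application enters a Big Data regime when its aggregate
    write rate exceeds what the shared filesystem can absorb AND the data no
    longer fits in the Linux page cache (so writes cannot be deferred)."""
    aggregate_write_rate = write_rate_per_process_gbps * n_concurrent_processes
    exceeds_bandwidth = aggregate_write_rate > filesystem_bandwidth_gbps
    exceeds_page_cache = data_size_gb > page_cache_gb
    return exceeds_bandwidth and exceeds_page_cache


if __name__ == "__main__":
    # Hypothetical numbers, loosely inspired by the 75 GB figure in the abstract.
    print(is_big_data_regime(write_rate_per_process_gbps=0.5,
                             n_concurrent_processes=16,
                             filesystem_bandwidth_gbps=4.0,
                             data_size_gb=100.0,
                             page_cache_gb=64.0))
```

Under a model of this form, it is the number of concurrent processes that pushes an analysis past the filesystem-bandwidth threshold, which is consistent with the abstract's ballpark estimate of around 280 concurrently processed fMRI subjects.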
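For readers unfamiliar with the strategies being evaluated, the following minimal PySpark sketch illustrates in-memory computing (persisting intermediate results) and lazy evaluation (transformations deferred until an action runs). It is a generic illustration, not the authors' BigBrain pipeline; the block identifiers, block loader, and thresholding step are hypothetical stand-ins for a typical neuroimaging operation.

```python
# A minimal PySpark sketch of the in-memory and lazy-evaluation strategies
# named above; not the authors' pipeline.
import numpy as np
from pyspark import SparkContext, StorageLevel

sc = SparkContext(appName="lazy-inmemory-sketch")

# Hypothetical identifiers for blocks (chunks) of a large image volume.
block_ids = list(range(128))

def load_block(block_id):
    # Placeholder for reading one block of a large image from shared storage.
    rng = np.random.default_rng(block_id)
    return block_id, rng.random((64, 64, 64), dtype=np.float32)

def threshold(item, value=0.5):
    # Stand-in for a typical per-block image processing operation.
    block_id, data = item
    return block_id, (data > value).astype(np.uint8)

# Transformations are lazy: nothing is read or computed until an action runs.
blocks = (sc.parallelize(block_ids)
            .map(load_block)
            .map(threshold)
            .persist(StorageLevel.MEMORY_ONLY))  # keep intermediates in memory

# The first action triggers the whole chain; later actions reuse cached blocks.
print(blocks.count())
sc.stop()
```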
