IEEE Transactions on Visualization and Computer Graphics

Analysis of a parallel volume rendering system based on the shear-warp factorization



Abstract

This paper presents a parallel volume rendering algorithm that can render a 256 × 256 × 225 voxel medical data set at over 15 Hz and a 512 × 512 × 334 voxel data set at over 7 Hz on a 32-processor Silicon Graphics Challenge. The algorithm achieves these results by minimizing each of the three components of execution time: computation time, synchronization time, and data communication time. Computation time is low because the parallel algorithm is based on the recently-reported shear-warp serial volume rendering algorithm which is over five times faster than previous serial algorithms. The algorithm uses run-length encoding to exploit coherence and an efficient volume traversal to reduce overhead. Synchronization time is minimized by using dynamic load balancing and a task partition that minimizes synchronization events. Data communication costs are low because the algorithm is implemented for shared-memory multiprocessors, a class of machines with hardware support for low-latency fine-grain communication and hardware caching to hide latency. We draw two conclusions from our implementation. First, we find that on shared-memory architectures data redistribution and communication costs do not dominate rendering time. Second, we find that cache locality requirements impose a limit on parallelism in volume rendering algorithms. Specifically, our results indicate that shared-memory machines with hundreds of processors would be useful only for rendering very large data sets.
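
The abstract mentions run-length encoding as the way the algorithm exploits coherence but gives no encoding details. As a rough illustration only, the sketch below encodes one voxel scanline into alternating transparent and non-transparent runs so that a traversal can skip transparent voxels entirely. The function names `rle_encode_scanline` and `nontransparent_indices`, the opacity threshold, and the `(is_visible, length)` run layout are all assumptions made for this example, not the paper's data structures.

```python
from typing import List, Tuple

Run = Tuple[bool, int]  # (is_visible, run_length); hypothetical layout


def rle_encode_scanline(opacities: List[float], threshold: float = 0.0) -> List[Run]:
    """Collapse a scanline of voxel opacities into runs.

    A voxel counts as visible when its opacity exceeds `threshold`
    (an assumed classification rule for this sketch).
    """
    runs: List[Run] = []
    for opacity in opacities:
        is_visible = opacity > threshold
        if runs and runs[-1][0] == is_visible:
            # Extend the current run.
            runs[-1] = (is_visible, runs[-1][1] + 1)
        else:
            # Start a new run.
            runs.append((is_visible, 1))
    return runs


def nontransparent_indices(runs: List[Run]) -> List[int]:
    """Return the voxel indices a traversal would actually visit,
    skipping every transparent run in one step."""
    indices: List[int] = []
    position = 0
    for is_visible, length in runs:
        if is_visible:
            indices.extend(range(position, position + length))
        position += length
    return indices


if __name__ == "__main__":
    scanline = [0.0, 0.0, 0.5, 0.7, 0.0, 0.0, 0.0, 0.2]
    runs = rle_encode_scanline(scanline)
    print(runs)                          # [(False, 2), (True, 2), (False, 3), (True, 1)]
    print(nontransparent_indices(runs))  # [2, 3, 7]
```

In this toy scanline only three of the eight voxels are visited, which is the kind of saving that coherent, mostly transparent volume data makes possible.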
