Journal: Computers & Graphics

Algorithm optimizations and mapping scheme for interactive ray tracing on a reconfigurable architecture

Abstract

This paper presents a mapping scheme for an optimized octree-based ray tracing algorithm and its implementation on MorphoSys, a SIMD reconfigurable architecture, with appropriate hardware support incorporated. A two-level SIMD mapping scheme for ray tracing is chosen to obtain a better trade-off between coherence exploitation efficiency and bandwidth requirements. We apply a SIMD octree traversal algorithm that supports rays of arbitrary origin and direction. Moreover, we apply a bottom-up traversal order to shadow and reflection rays to avoid unnecessary testing. The memory overhead of parallel ray tracing on SIMD systems is analyzed to guide memory optimization. Pre-fetching is used to hide data fetch latency behind computation. A Spatial Partitioning Tree buffer reduces the latency caused by interleaved accesses to the shared memory. It also dynamically exploits ray coherence to save memory bandwidth. A Pointer Update Unit and a Pointer Buffer are combined to remove the overhead resulting from pointer calculations and stack pushes during parallel depth-first traversal. The associated hardware cost is less than 2% of the whole system. To include diffuse effects in the output, we apply spherical harmonics. Post-synthesis simulation estimates the target chip area at 33 mm² and its power consumption at less than 1 W in the worst case. Cycle-accurate simulation demonstrates that interactive ray tracing of medium-sized scenes is achieved on MorphoSys.
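The abstract does not spell out the bottom-up traversal used for shadow and reflection rays, so the following is only a scalar Python sketch of the general idea, not the paper's SIMD algorithm. It assumes an octree whose nodes keep a parent pointer: a secondary ray starts inside a leaf that is already known, so the next cell can be found by climbing to the nearest ancestor that contains the query point and descending from there, instead of restarting at the root. The names `Node`, `descend`, and `locate_bottom_up` are hypothetical, and the step that computes where a ray exits a cell is omitted.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass(eq=False)
class Node:
    """Hypothetical cubic octree cell with a parent pointer."""
    lo: Vec3                                   # minimum corner of the cell
    size: float                                # edge length of the cell
    parent: Optional["Node"] = field(default=None, repr=False)
    children: List["Node"] = field(default_factory=list)  # empty for a leaf

    def contains(self, p: Vec3) -> bool:
        return all(self.lo[i] <= p[i] < self.lo[i] + self.size for i in range(3))

    def subdivide(self) -> None:
        # Child index bits: bit 0 = x, bit 1 = y, bit 2 = z.
        h = self.size / 2
        self.children = [
            Node((self.lo[0] + h * (i & 1),
                  self.lo[1] + h * ((i >> 1) & 1),
                  self.lo[2] + h * ((i >> 2) & 1)), h, self)
            for i in range(8)
        ]


def descend(node: Node, p: Vec3) -> Tuple[Node, int]:
    """Top-down: walk from `node` to the leaf containing p, counting steps."""
    steps = 0
    while node.children:
        h = node.size / 2
        idx = sum((p[i] >= node.lo[i] + h) << i for i in range(3))
        node = node.children[idx]
        steps += 1
    return node, steps


def locate_bottom_up(start_leaf: Node, p: Vec3) -> Tuple[Optional[Node], int]:
    """Bottom-up: climb from a known leaf until an ancestor contains p,
    then descend. Returns (None, steps) if p lies outside the octree."""
    node, up = start_leaf, 0
    while not node.contains(p):
        if node.parent is None:
            return None, up
        node = node.parent
        up += 1
    leaf, down = descend(node, p)
    return leaf, up + down


# Usage: a three-level octree refined only around the origin corner.
root = Node((0.0, 0.0, 0.0), 8.0)
root.subdivide()
root.children[0].subdivide()
root.children[0].children[0].subdivide()

start, _ = descend(root, (0.5, 0.5, 0.5))          # leaf where the ray starts
top, n_top = descend(root, (1.5, 0.5, 0.5))        # restart from the root
bot, n_bot = locate_bottom_up(start, (1.5, 0.5, 0.5))
print(top is bot, n_top, n_bot)
```

In this toy scene the neighbouring leaf is reached in two node visits from the starting leaf versus three from the root. The saving grows with tree depth when consecutive cells share a nearby ancestor, which is the intuition behind avoiding unnecessary tests for secondary rays.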