The kd-tree is a fundamental tool in computer science. Among otherapplications, the application of kd-tree search (by the tree method) to thefast evaluation of particle interactions and neighbor search is highlyimportant, since the computational complexity of these problems is reduced fromO(N^2) for a brute force method to O(N log N) for the tree method, where N isthe number of particles. In this paper, we present a parallel implementation ofthe tree method running on a graphics processing unit (GPU). We present adetailed description of how we have implemented the tree method on a CypressGPU. An optimization that we found important is localized particle ordering toeffectively utilize cache memory. We present a number of test results andperformance measurements. Our results show that the execution of the treetraversal in a force calculation on a GPU is practical and efficient.
展开▼