One embodiment of the present invention sets forth a technique for efficiently performing N-body computations using parallel computation systems. The technique involves a first processing step whereby a force matrix is partitioned into tiles, which are assigned to one or more thread groups for processing. An off-diagonal tile may be aligned to include no diagonal cells, while an on-diagonal tile includes diagonal cells. One approach for computing either type of tile involves assigning each row of a tile to a thread within a thread group. Each thread follows an offset access pattern to avoid access conflicts in a shared memory. A net force for each atom within an N-body system is then computed by efficiently adding the constituent forces stored within the force matrix using reduction operations on the force matrix.
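The tiling, per-row thread assignment, offset pattern, and reduction described above can be illustrated with a minimal sequential sketch in Python. It is not the claimed implementation: the threads are simulated by a loop, the 1D softened inverse-square `pair_force` is a hypothetical stand-in for the real interaction, the rotation `(t + step) % tile` is only one plausible form of the offset pattern, and all function names are assumptions made for illustration.

```python
def pair_force(xi, xj):
    """Hypothetical 1D force on body i due to body j (softened inverse-square)."""
    d = xj - xi
    return d / (abs(d) ** 3 + 1e-9)


def compute_tile(positions, row0, col0, tile, forces):
    """Fill one tile of the force matrix, one simulated thread per tile row."""
    on_diagonal = row0 == col0
    # Stand-in for shared memory: positions of the bodies in the tile's columns.
    shared = positions[col0:col0 + tile]
    for step in range(tile):
        touched = set()
        for t in range(tile):  # each t plays the role of one thread
            # Assumed offset pattern: thread t starts at slot t and rotates,
            # so no two threads read the same slot during the same step.
            slot = (t + step) % tile
            assert slot not in touched, "two threads hit the same slot"
            touched.add(slot)
            i, j = row0 + t, col0 + slot
            if on_diagonal and i == j:
                continue  # diagonal cell: a body exerts no force on itself
            forces[i][j] = pair_force(positions[i], shared[slot])


def tree_sum(values):
    """Pairwise (tree) reduction, the shape a parallel reduction would take."""
    vals = list(values)
    if not vals:
        return 0.0
    while len(vals) > 1:
        if len(vals) % 2:
            vals.append(0.0)  # pad so elements can be paired
        vals = [vals[k] + vals[k + 1] for k in range(0, len(vals), 2)]
    return vals[0]


def net_forces(positions, tile):
    """Partition the N x N force matrix into tiles, fill them, reduce each row."""
    n = len(positions)
    assert n % tile == 0, "sketch assumes N is a multiple of the tile size"
    forces = [[0.0] * n for _ in range(n)]
    for row0 in range(0, n, tile):
        for col0 in range(0, n, tile):
            compute_tile(positions, row0, col0, tile, forces)
    return [tree_sum(row) for row in forces]


if __name__ == "__main__":
    print(net_forces([0.0, 1.0, 3.0, 6.0], tile=2))
```

Because the rotation gives every thread a distinct slot at each step, the simulated accesses never collide, which is the property the offset pattern is meant to provide. On a real parallel device the two inner loops would run concurrently within a thread group, and the row sums would be computed with a parallel reduction rather than the sequential `tree_sum` shown here.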