This paper presents efficient mappings of large sparse neural networks onto a distributed-memory MIMD multicomputer with high-performance vector units. We develop parallel vector code for an idealized network and analyze its performance. Our algorithms combine high performance with a reasonable memory requirement. Because scatter/gather operations are costly, generating high-performance parallel vector code requires careful attention to the details of the representation. We show that vectorization can nevertheless more than quadruple the performance on our modeled supercomputer. Pushing several patterns at a time through the network (batch mode) exposes an extra degree of parallelism, which allows us to improve the performance by an additional factor of 4. Together, vectorization and batch updating therefore yield an order-of-magnitude performance improvement.
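The batch-mode idea described above can be illustrated with a minimal sketch (not the paper's code, and independent of any particular vector hardware): with a sparse weight matrix, each nonzero entry forces an indexed gather from the input; pushing a batch of patterns through at once lets a single index pair serve a whole contiguous row of values, exposing the extra dimension of parallelism that a vector unit can exploit. All names below (`forward_single`, `forward_batch`, the sizes) are illustrative assumptions.

```python
# Hedged sketch: batch mode amortizes sparse indexing over many patterns.
import numpy as np

rng = np.random.default_rng(0)

n_in, n_out, batch = 8, 6, 4
# Sparse weights in coordinate (row, col, value) form -- an assumed layout.
nnz = 12
rows = rng.integers(0, n_out, nnz)
cols = rng.integers(0, n_in, nnz)
vals = rng.standard_normal(nnz)

def forward_single(x):
    """One pattern: every nonzero weight costs one gather from x."""
    y = np.zeros(n_out)
    for r, c, v in zip(rows, cols, vals):
        y[r] += v * x[c]          # scalar gather/scatter per pattern
    return y

def forward_batch(X):
    """Batch mode: one index pair serves a contiguous row of patterns."""
    Y = np.zeros((n_out, X.shape[1]))
    for r, c, v in zip(rows, cols, vals):
        Y[r, :] += v * X[c, :]    # vectorizable over the batch dimension
    return Y

X = rng.standard_normal((n_in, batch))
Y = forward_batch(X)
# Both formulations compute the same result; batch mode just reuses indices.
assert np.allclose(
    Y, np.column_stack([forward_single(X[:, j]) for j in range(batch)])
)
```

The inner update in `forward_batch` touches a contiguous slice per nonzero, which is the kind of regular access a vector unit handles far more cheaply than the per-pattern scatter/gather of the single-pattern loop.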