TSM2X: High-Performance Tall-and-Skinny Matrix-Matrix Multiplication on GPUs
The University of Alabama Tuscaloosa AL 35487 USA;
Oak Ridge National Laboratory Oak Ridge TN 37830 USA;
University of California Riverside Riverside CA 92521 USA;
University of Colorado Colorado Springs CO 80918 USA;
The University of Sydney NSW 2006 Australia;
Washington State University Pullman WA 99164 USA; The University of Alabama Tuscaloosa AL 35487 USA;
Matrix-matrix multiplication; Tall-and-skinny matrix; GPU; CUDA; Performance optimization;