To analyze the quality of the large volume of H.264 video carried over 3G networks, we set up a GPU cluster that decodes multiple video streams with CUDA and evaluates the clarity of the decoded frames. This paper focuses on parallel intra prediction and its optimization. By improving the parallel algorithm, adjusting the data structures, and making judicious use of the GPU's multilevel memory hierarchy, we reduce execution time by an average of 63.8% compared with the original algorithm.
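
For readers who think in terms of speedup rather than time saved, the reported reduction can be converted as follows. This is only a rough reading: it assumes the 63.8% figure can be treated as a ratio of execution times, with $T_{\text{orig}}$ and $T_{\text{opt}}$ introduced here as illustrative symbols for the original and optimized times.

```latex
\frac{T_{\text{opt}}}{T_{\text{orig}}} = 1 - 0.638 = 0.362
\quad\Longrightarrow\quad
\text{speedup} = \frac{T_{\text{orig}}}{T_{\text{opt}}} = \frac{1}{0.362} \approx 2.76
```

Because the figure is an average, the speedup on any individual video may differ from this value.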