Most learning-based no-reference (NR) video quality assessment (VQA) models need to be trained on a large number of subjective quality scores, which are currently difficult to obtain for videos at scale. Inspired by the success of full-reference VQA methods based on spatiotemporal slice (STS) images in extracting perceptual features and evaluating video quality, this paper adopts multi-directional video STS images, i.e., images composed of multi-directional sections of the video data, to cope with the shortage of subjective quality scores. By sampling image patches from the video STS images and adding noise to the quality labels of the patches, an effective NR VQA model based on multi-directional STS images and neural network training is proposed. Specifically, we first select the subjective database that currently contains the largest number of authentically distorted videos as the test set. Second, we extract multi-directional STS images from the videos and sample local patches from them to augment the training set; in addition, we add noise to the quality labels of the local patches. Third, a deep neural network is constructed and trained to obtain a local quality prediction model for each patch in an STS image, and the quality of an entire video is then obtained by averaging the model's predictions over the multi-directional STS images. Finally, experimental results indicate that the proposed method tackles the insufficiency of training samples in small subjective VQA datasets and achieves a high correlation with subjective evaluations.
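The data-preparation pipeline described above (multi-directional STS extraction, local patch sampling, and label-noise augmentation) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the slice spacing, patch size, stride, and noise level are illustrative assumptions.

```python
import numpy as np

def extract_sts_images(video, n_slices=4):
    """Extract multi-directional spatiotemporal slice (STS) images.

    video: ndarray of shape (T, H, W) (time, height, width), grayscale.
    A horizontal STS fixes a row y and stacks it over time -> (T, W);
    a vertical STS fixes a column x and stacks it over time -> (T, H).
    """
    T, H, W = video.shape
    rows = np.linspace(0, H - 1, n_slices, dtype=int)
    cols = np.linspace(0, W - 1, n_slices, dtype=int)
    horizontal = [video[:, y, :] for y in rows]  # each slice: (T, W)
    vertical = [video[:, :, x] for x in cols]    # each slice: (T, H)
    return horizontal + vertical

def sample_patches(sts, patch=32, stride=32):
    """Slide a window over one STS image and collect local patches."""
    h, w = sts.shape
    return [sts[i:i + patch, j:j + patch]
            for i in range(0, h - patch + 1, stride)
            for j in range(0, w - patch + 1, stride)]

def noisy_patch_labels(n_patches, video_mos, sigma=0.1, seed=0):
    """Assign the video-level subjective score (MOS) to every patch,
    perturbed with Gaussian noise so patches from the same video do not
    all carry one identical label."""
    rng = np.random.default_rng(seed)
    return video_mos + rng.normal(0.0, sigma, size=n_patches)
```

At test time, the averaging step in the abstract would correspond to taking `np.mean` over the trained model's per-patch predictions across all multi-directional STS images of a video.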