A solution is provided for detecting video highlights of a sports video. A video highlight is a portion of the sports video that represents a semantically important event captured in the sports video. An audio stream associated with the sports video is evaluated, e.g., by measuring the loudness of portions of the audio stream and the duration for which that loudness is sustained. Video segments of the sports video are selected based on the evaluation of the audio stream. Each selected video segment represents a video highlight candidate of the sports video. A trained audio classification model is used to recognize the voice patterns in the audio stream associated with each selected video segment. Based on a comparison of the recognized voice patterns with a set of desired voice patterns, one or more of the selected video segments are chosen as the video highlights of the sports video.
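The two-stage selection described above can be illustrated with a minimal sketch. It is not the claimed implementation. The abstract does not specify how loudness is measured, so root-mean-square (RMS) energy over fixed windows is assumed here. The function names, the threshold, the minimum duration, and the label "excited_commentary" are all hypothetical, and `stub_classifier` merely stands in for the trained audio classification model.

```python
import math
from typing import Callable, List, Sequence, Set, Tuple

Segment = Tuple[int, int]  # (start_window, end_window), end exclusive


def window_rms(samples: Sequence[float], window: int) -> List[float]:
    """RMS loudness of each non-overlapping window (assumed loudness measure)."""
    return [
        math.sqrt(sum(x * x for x in samples[i:i + window]) / window)
        for i in range(0, len(samples) - window + 1, window)
    ]


def loud_runs(loudness: Sequence[float], threshold: float,
              min_windows: int) -> List[Segment]:
    """Find runs where loudness stays at or above `threshold`
    for at least `min_windows` consecutive windows."""
    runs: List[Segment] = []
    start = None
    for i, value in enumerate(loudness):
        if value >= threshold:
            if start is None:
                start = i  # a loud run begins here
        elif start is not None:
            if i - start >= min_windows:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the stream.
    if start is not None and len(loudness) - start >= min_windows:
        runs.append((start, len(loudness)))
    return runs


def select_highlights(candidates: Sequence[Segment],
                      classify: Callable[[Segment], str],
                      desired: Set[str]) -> List[Segment]:
    """Keep candidates whose recognized voice pattern is a desired one."""
    return [seg for seg in candidates if classify(seg) in desired]


def stub_classifier(segment: Segment) -> str:
    """Hypothetical stand-in for the trained audio classification model."""
    return "excited_commentary" if segment[1] - segment[0] >= 3 else "other"


# Toy audio: silence, a sustained loud burst, silence, a brief burst, silence.
samples = [0.0] * 8 + [1.0] * 12 + [0.0] * 4 + [1.0] * 4 + [0.0] * 4
loudness = window_rms(samples, window=4)
candidates = loud_runs(loudness, threshold=0.5, min_windows=2)
highlights = select_highlights(candidates, stub_classifier,
                               {"excited_commentary"})
print(candidates, highlights)
```

In this toy input the brief burst is discarded by the duration check, so only the sustained loud run becomes a candidate and is then passed to the classifier. In a real system the window indices would be mapped back to timestamps in the video.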