Unlike surveillance videos, videos created by common users contain more frequent shot changes, more diversified backgrounds, and a wider variety of content. The existing methods have two critical issues for summarizing user-created videos: 1) information distortion 2) high redundancy among keyframes. Therefore, we propose a novel temporal attention model to evaluate the importance scores of each frame. Specifically, on the basis of the classical attention model, we combine the predictions of both encoder and decoder to ensure using integrate information to score frame-level importance. Further, in order to sift redundant frames out. we devise a feedforward reward function to quantify diversity, representativeness, and storyness properties of candidate keyframes in attention model. Last, the Deep Deterministic Policy Gradient algorithm is adopted to efficiently solve the proposed formulation. Extensive experiments on the public SumMe and TVSum datasets show that our method outperforms the state of the art by a large margin in terms of the F-score.
展开▼