To use the emotional semantics of a video's accompanying audio to guide the extraction of video highlights, a highlight extraction method driven by audio emotion perception is presented. An audio classifier based on a hierarchical binary-tree support vector machine extracts the mid-level audio type, and an integrated emotion-mapping model then derives the high-level emotional semantics of the accompanying audio. Together, the classifier and the mapping model form a two-stage audio emotion perception model, which is used to analyze the fluctuation of emotional semantics in the accompanying audio. Video highlights are then located with the aid of a start-stop positioning strategy for highlights and an audio-video synchronization correction. With the emotional-semantic fluctuation series of the audio as its core data and the two-stage audio emotion perception model as the leading analysis, a complete audio-emotion-driven framework for video highlight extraction is constructed. Experimental results show that, at a guaranteed level of precision, the proposed framework generalizes well and achieves high recall and high highlight completeness.
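The abstract gives no implementation details, so the following is only a minimal sketch of how the later stages might fit together, under stated assumptions. It assumes the per-frame audio type labels have already been produced by the classifier; the label names, the scores in `EMOTION_SCORE`, the moving-average smoothing, and the threshold-based start-stop rule are all hypothetical stand-ins chosen for illustration, not the authors' actual models or parameters.

```python
from typing import Dict, List, Tuple

# Hypothetical emotion-mapping model: mid-level audio type -> emotion score.
# Both the label set and the scores are assumptions for illustration only.
EMOTION_SCORE: Dict[str, float] = {
    "silence": 0.0,
    "speech": 0.2,
    "music": 0.4,
    "applause": 0.8,
    "cheering": 0.9,
}


def emotion_series(labels: List[str]) -> List[float]:
    """Map per-frame audio type labels to an emotion fluctuation series."""
    return [EMOTION_SCORE[label] for label in labels]


def smooth(series: List[float], window: int = 3) -> List[float]:
    """Centered moving average; the window is truncated at the edges."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out


def locate_highlights(
    series: List[float], threshold: float = 0.5, min_len: int = 2
) -> List[Tuple[int, int]]:
    """Return (start, stop) frame indices, stop exclusive, of runs where the
    series stays at or above the threshold for at least min_len frames.
    This simple rule stands in for the start-stop positioning strategy."""
    segments = []
    start = None
    for i, value in enumerate(series):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the series.
    if start is not None and len(series) - start >= min_len:
        segments.append((start, len(series)))
    return segments


labels = ["speech", "speech", "cheering", "applause", "cheering", "speech", "silence"]
segments = locate_highlights(emotion_series(labels))
```

In a fuller pipeline the frame indices in `segments` would still need to be converted to timestamps and aligned with the video track, which is the role the abstract assigns to the audio-video synchronization step.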