
A content-adaptive analysis and representation framework for summarization using audio cues.


Abstract

We propose a content-adaptive analysis and representation framework that postpones content-specific processing to as late a stage as possible. For this task, we propose an inlier/outlier representation derived from audio analysis. It rests on the key observation that the audio features in the vicinity of "interesting" events are outliers against a background of "uninteresting" events.

The analysis framework that supports this representation detects outlier subsequences in a time series of audio features or semantic audio labels. Using a sliding window, we sample the whole time series and estimate statistical models of the usual "uninteresting" background. We construct an affinity (kernel) matrix by computing pairwise distances between the estimated statistical models. Then, using a graph-theoretic approach to grouping, we detect outlier subsequences, that is, subsequences whose statistical models at their times of occurrence differ from the other estimates of the dominant background. We also rank the detected outliers by how far each deviates from the background. Only once all outlier subsequences have been detected do we bring in domain knowledge or content-specific processing to pick out the subset of outliers that is correlated with "interesting" events for that domain or content genre. Such a framework also helps in choosing key audio classes in a data-driven way rather than by intuition.

We apply the proposed framework to consumer video browsing. For sports content, we show that commercials and highlight events are among the outliers in sports audio and can be extracted effectively with this framework. We also show that the key highlight audio class obtained systematically through the outlier detection procedure outperforms the cheering audio class (chosen by intuition) for sports highlights extraction. For situation comedy video, we successfully detect scene transitions and laughter tracks with the outlier detection framework. The proposed framework also effectively detects suspicious events in elevator surveillance audio as outliers. Finally, we show that key audio classes correlated with events of interest can be acquired systematically using the proposed framework.
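
The detection steps described above (sliding-window models, pairwise distances, affinity matrix, graph-based grouping, ranking) can be sketched roughly as follows. This is only an illustrative sketch, not the thesis's implementation: the abstract does not name the statistical model, the distance measure, or the grouping algorithm, so the per-window diagonal Gaussian, the symmetric KL divergence, the kernel width `sigma` (set to the mean pairwise distance), the spectral bipartition, and every function and variable name below are assumptions. The sketch also assumes that outliers form the minority group.

```python
import numpy as np


def window_models(features, win, hop):
    """Fit a diagonal Gaussian (mean, variance) to each sliding window."""
    starts = np.arange(0, len(features) - win + 1, hop)
    means = np.array([features[s:s + win].mean(axis=0) for s in starts])
    # Small floor keeps variances strictly positive (assumed value).
    varis = np.array([features[s:s + win].var(axis=0) for s in starts]) + 1e-6
    return starts, means, varis


def symmetric_kl(means, varis):
    """Pairwise symmetric KL divergence between diagonal Gaussians."""
    m1, m2 = means[:, None, :], means[None, :, :]
    v1, v2 = varis[:, None, :], varis[None, :, :]
    sq = (m1 - m2) ** 2
    return 0.5 * (v1 / v2 + v2 / v1 - 2.0 + sq / v1 + sq / v2).sum(axis=-1)


def detect_outliers(features, win=50, hop=50):
    """Return (window start frames, scores) of outlier windows, most deviant first."""
    starts, means, varis = window_models(features, win, hop)
    dist = symmetric_kl(means, varis)

    # Affinity (kernel) matrix; the kernel width is an assumed heuristic.
    sigma = dist.mean()
    affinity = np.exp(-dist / sigma)

    # Spectral bipartition: second-smallest eigenvector of the
    # symmetric normalized graph Laplacian, thresholded at zero.
    deg = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    lap = np.eye(len(deg)) - d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues come back in ascending order
    fiedler = vecs[:, 1] * d_inv_sqrt

    # Treat the smaller group as outliers (assumption: background dominates).
    is_outlier = fiedler > 0
    if is_outlier.sum() > len(is_outlier) / 2:
        is_outlier = ~is_outlier

    # Rank each outlier window by its mean distance to the background windows.
    scores = dist[np.ix_(is_outlier, ~is_outlier)].mean(axis=1)
    order = np.argsort(-scores)
    return starts[is_outlier][order], scores[order]


# Synthetic stand-in for audio features: stationary background with one
# shifted segment (frames 500-599) playing the role of an "interesting" event.
rng = np.random.default_rng(0)
audio_features = rng.normal(0.0, 1.0, size=(1000, 4))
audio_features[500:600] += 5.0
flagged_starts, flagged_scores = detect_outliers(audio_features)
print(flagged_starts, flagged_scores)
```

On this synthetic input, the two windows covering the shifted segment should be the ones flagged. Choosing which flagged windows correspond to highlights, commercials, or other events of interest is the later, content-specific stage and is deliberately left out of the sketch.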
