
Audio-visual football video analysis, from structure detection to attention analysis

Abstract

Sports video is an important video genre, and content-based sports video analysis attracts great interest from both industry and academia. A sports video is characterised by repetitive temporal structures, relatively plain content, and strong spatio-temporal variations, such as quick camera switches and swift local motions. Content-based sports video analysis needs specific techniques that exploit these characteristics. An efficient and effective sports video analysis system must answer three fundamental questions: (1) what are the key stories in sports videos; (2) what arouses viewers' interest; and (3) how to identify game highlights. This thesis is developed around these questions. We approach them from two different perspectives and present three research contributions: replay detection, attack temporal structure decomposition, and attention-based highlight identification.

Replay segments convey the most important content in sports videos, so detecting them is an efficient way to collect game highlights. However, a replay is an artefact of editing, which evolves as video editing tools advance. The composition of a replay is complex and includes logo transitions, slow motion, viewpoint switches and normal-speed video clips. Since logo transition clips are pervasive in the game collections of FIFA World Cup 2002, FIFA World Cup 2006 and UEFA Championship 2006, we take logo transition detection as an effective replacement for replay detection. A two-pass system was developed, consisting of a five-layer AdaBoost classifier and logo template matching over the entire video. The five-layer AdaBoost classifier uses shot duration, average game pitch ratio, average motion, sequential colour histogram, and shot frequency between two neighbouring logo transitions to pick out logo transition candidates. Subsequently, a logo template is constructed and employed to find all transition logo sequences.
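The second pass described above, building a logo template and scanning the whole video with it, can be illustrated with a minimal sketch. It assumes small grayscale frames stored as nested lists, a per-pixel mean of the candidate frames as the template, and zero-mean normalised cross-correlation as the match score. The function names and the 0.9 threshold are hypothetical, and the abstract does not state which matching measure the thesis actually uses.

```python
from math import sqrt


def mean_template(frames):
    """Per-pixel mean of equally sized grayscale frames (lists of rows)."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]


def ncc(a, b):
    """Zero-mean normalised cross-correlation of two equally sized images."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    # A flat image has zero variance; treat it as "no match".
    return num / den if den else 0.0


def find_logo_frames(video, template, threshold=0.9):
    """Indices of frames whose correlation with the template meets the threshold."""
    return [i for i, frame in enumerate(video) if ncc(frame, template) >= threshold]


# Toy 2x2 frames: two logo-like candidates (assumed to come from the first
# pass), a flat pitch frame, and an inverted pattern.
logo = [[0, 255], [255, 0]]
noisy_logo = [[10, 250], [245, 5]]
pitch = [[100, 100], [100, 100]]
inverted = [[255, 0], [0, 255]]

template = mean_template([logo, noisy_logo])
logo_frames = find_logo_frames([pitch, logo, inverted, noisy_logo], template)
print(logo_frames)  # -> [1, 3]
```

On real footage the template would be built from the candidates that survive the first-pass classifier, and the threshold would need tuning per broadcast.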
The precision and recall of this system in replay detection are both 100% on a five-game evaluation collection.

An attack structure is a passage of play in which a team competes for a score. Hence, this structure is a conceptually fundamental unit of a football video as well as of other sports videos. We review the literature on content-based temporal structures, such as the play-break structure, and develop a three-step system for automatic attack structure decomposition. Four content-based shot classes, namely play, focus, replay and break, were identified from low-level visual features. A four-state hidden Markov model was trained to model the transitions among these shot classes. Since attack structures are the longest repetitive temporal unit in a sports video, a suffix tree is proposed to find the longest repeated substring in the label sequence of shot class transitions. The occurrences of this substring are regarded as the kernel of an attack hidden Markov process. The decomposition of attack structures therefore becomes a boundary likelihood comparison between two Markov chains.

Highlights are what attract notice, and attention is a psychological measure of “notice”. A brief survey is presented of the psychological background of attention, attention estimation from visual and auditory signals, and multi-modal attention fusion. We propose two attention models for sports video analysis: the role-based attention model and the multiresolution autoregressive framework. The role-based attention model is based on the structure of perception while watching video. This model removes reflection bias among the salient signals of the modalities and combines these signals by reflectors. The multiresolution autoregressive framework (MAR) treats salient signals as a group of smooth random processes that follow a similar trend but are corrupted by noise. The framework estimates a noise-free signal from these coarse, noisy observations by multiresolution analysis.
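The longest-repeated-substring step in the attack structure decomposition above can be sketched as follows. This is an illustration under stated assumptions, not the thesis implementation: it sorts suffixes and compares neighbours instead of building a suffix tree (simpler, but slower on long sequences), and the single-letter labels P, F, R, B for play, focus, replay and break are a hypothetical encoding.

```python
def longest_repeated_substring(labels):
    """Return the longest substring that occurs at least twice.

    Suffixes are sorted so that suffixes sharing a long prefix become
    neighbours; the longest common prefix of any neighbouring pair is the
    answer. Overlapping occurrences are allowed.
    """
    order = sorted(range(len(labels)), key=lambda i: labels[i:])
    best = ""
    for a, b in zip(order, order[1:]):
        k = 0
        while (a + k < len(labels) and b + k < len(labels)
               and labels[a + k] == labels[b + k]):
            k += 1
        if k > len(best):
            best = labels[a:a + k]
    return best


def occurrences(labels, pattern):
    """Start indices of every (possibly overlapping) occurrence of pattern."""
    if not pattern:
        return []
    return [i for i in range(len(labels) - len(pattern) + 1)
            if labels.startswith(pattern, i)]


# Hypothetical shot-class sequence: P=play, F=focus, R=replay, B=break.
shot_labels = "PFRBPFRBB"
kernel = longest_repeated_substring(shot_labels)
print(kernel)                            # -> PFRB
print(occurrences(shot_labels, kernel))  # -> [0, 4]
```

In the thesis the occurrences of this substring seed a second hidden Markov model; that step is not shown here because the abstract gives no details of it.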
Related algorithms are developed, such as event segmentation on a MAR tree and real-time event detection. Experiments show that these attention-based approaches can find goal events with high precision. Moreover, the results of MAR-based highlight detection on the final games of the FIFA World Cup 2002 and 2006 are highly similar to the highlights professionally labelled by the BBC and FIFA.
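As a rough illustration of how detected events might be compared with professionally labelled highlights, the sketch below computes precision and recall from event times and labelled intervals. The abstract does not say how the thesis measured similarity, so the matching rule (a detection counts if it falls inside a labelled interval), the function name, and the sample times are all assumptions.

```python
def precision_recall(detected, labelled):
    """Compare detected event times with labelled highlight intervals.

    detected: list of event times in seconds.
    labelled: list of (start, end) intervals in seconds.
    Precision is the share of detections inside some labelled interval;
    recall is the share of labelled intervals containing a detection.
    """
    hits = [t for t in detected
            if any(start <= t <= end for start, end in labelled)]
    found = [(start, end) for start, end in labelled
             if any(start <= t <= end for t in detected)]
    precision = len(hits) / len(detected) if detected else 0.0
    recall = len(found) / len(labelled) if labelled else 0.0
    return precision, recall


# Made-up times for illustration only, not results from the thesis.
detected_events = [12.0, 95.0, 300.0, 410.0]
labelled_highlights = [(10.0, 20.0), (90.0, 100.0), (500.0, 520.0)]
precision, recall = precision_recall(detected_events, labelled_highlights)
print(precision, recall)
```

Here two of the four detections fall inside labelled intervals and two of the three intervals are found, giving a precision of 0.5 and a recall of two thirds.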

Bibliographic record

Author: Ren Reede
Year: 2008
Original format: PDF
Language: English

Similar literature

1. An audio-visual human attention analysis approach to abrupt change detection in videos [J]. Yanxiang Chen, Minglong Song, Lixia Xue. Signal Processing, 2015, May issue.
2. Event detection in sports video based on audio-visual and support vector machine. Case-study: football [J]. Vijayan Ellapan, R. Rajkumar. International Journal of Internet Technology and Secured Transactions, 2019, issue 1a2.
3. A combined audio-visual contribution to event detection in field sports broadcast video. Case study: Gaelic football [C]. Sadlier, D.A., O'Connor. 2004.
4. Audio-visual scene analysis with application in sports video [D]. Xiong, Ziyou. 2004.
5. Football incident analysis: a new video based method to describe injury mechanisms in professional football [O]. T Andersen, O Larsen, A Tenga. 2003.
6. A combined audio-visual contribution to event detection in field sports broadcast video. Case study: Gaelic football [O]. Sadlier, David A., O'Connor, Noel E., Marlow, Seán. 2003.
