Video Scene Detection Using Compact Bag of Visual Word Models

Muhammad Haroon; Junaid Baber; Ihsan Ullah; Sher Muhammad Daudpota; Maheen Bakhtyar; Varsha Devi

Abstract

Video segmentation into shots is the first step for video indexing and searching. Video shots are mostly very short in duration and do not give meaningful insight into the visual contents. However, grouping shots based on similar visual contents gives a better understanding of the video scene; this grouping of similar shots is known as scene boundary detection or video segmentation into scenes. In this paper, we propose a model for video segmentation into visual scenes using the bag of visual words (BoVW) model. Initially, the video is divided into shots, which are then represented by a set of key frames. Key frames are further represented by BoVW feature vectors that are quite short and compact compared to classical BoVW model implementations. Two variations of the BoVW model are used: (1) the classical BoVW model and (2) the Vector of Locally Aggregated Descriptors (VLAD), which is an extension of the classical BoVW model. The similarity of the shots is computed from the distances between their key-frame feature vectors within a sliding window of length L, rather than by comparing each shot with a very long list of shots, as has been practiced previously; the value of L is 4. Experiments on cinematic and drama videos show the effectiveness of our proposed framework. In the proposed model, the BoVW representation is a 25000-dimensional vector and the VLAD representation is only a 2048-dimensional vector. BoVW achieves 0.90 segmentation accuracy, whereas VLAD achieves 0.83.
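
The sliding-window comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cosine distance, the minimum-over-key-frames shot distance, the `threshold` value, and the names `shot_distance`, `detect_scene_boundaries`, and `toy_shots` are all assumptions made for illustration. Only the window length L = 4 comes from the abstract. Each shot is a list of key-frame feature vectors (BoVW or VLAD vectors in the paper, tiny 2-dimensional toy vectors here).

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two feature vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an all-zero vector as maximally dissimilar
    return 1.0 - dot / (norm_u * norm_v)


def shot_distance(shot_a, shot_b):
    """Smallest distance over all key-frame pairs of two shots (assumed rule)."""
    return min(cosine_distance(f, g) for f in shot_a for g in shot_b)


def detect_scene_boundaries(shots, window=4, threshold=0.5):
    """Return the indices of shots that start a new scene.

    Each shot is compared only with the previous `window` shots that
    belong to the current scene, not with every earlier shot. The
    threshold is a hypothetical value chosen for the toy data.
    """
    boundaries = []
    scene_start = 0
    for i in range(1, len(shots)):
        first = max(scene_start, i - window)
        linked = any(
            shot_distance(shots[i], shots[j]) < threshold
            for j in range(first, i)
        )
        if not linked:
            boundaries.append(i)
            scene_start = i
    return boundaries


# Toy data: three visually similar shots followed by two different ones.
toy_shots = [
    [[1.0, 0.0]],
    [[0.9, 0.1]],
    [[1.0, 0.05]],
    [[0.0, 1.0]],
    [[0.1, 0.9]],
]

print(detect_scene_boundaries(toy_shots))  # prints [3]
```

Restricting the comparison to a short window keeps the cost per shot constant, which is the motivation the abstract gives for not comparing each shot against a long list of earlier shots.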

Bibliographic details

Source
Advances in Multimedia | 2018, Issue 2018 | pp. 2564963.1-2564963.9 | 9 pages
Authors
Muhammad Haroon; Junaid Baber; Ihsan Ullah; Sher Muhammad Daudpota; Maheen Bakhtyar; Varsha Devi
Author affiliations

Department of Computer Science & IT, University of Balochistan, Pakistan;

Department of Computer Science & IT, University of Balochistan, Pakistan;

Department of Computer Science & IT, University of Balochistan, Pakistan;

Department of Computer Science, Sukkur IBA University, Pakistan;

Department of Computer Science & IT, University of Balochistan, Pakistan;

Department of Computer Science, Sardar Bahadur Khan Women's University, Pakistan;

Original format: PDF
Language: English

Similar literature

1. Video Scene Detection Using Compact Bag of Visual Word Models [J]. Muhammad Haroon, Junaid Baber, Ihsan Ullah. Advances in Multimedia, 2018, No. 1.

2. Bag of Contextual-Visual Words for Road Scene Object Detection From Mobile Laser Scanning Data [J]. Yongtao Yu, Jonathan Li, Haiyan Guan. IEEE Transactions on Intelligent Transportation Systems, 2016, No. 12.

3. Scene classification based on the bag-of-visual-words and Doc2Vec models for high-spatial resolution remote-sensing imagery [J]. Li Wenqiang, Jin Gui, Dong Yin. Journal of Applied Remote Sensing, 2019, No. 2.

4. Video Copy Detection Using Inclined Video Tomography and Bag-of-Visual-Words [C]. Min Hyun-seok, Kim Se Min, De Neve Wesley. 2012 IEEE International Conference on Multimedia and Expo, 2012.

5. Method of Adding Color Information to Spatially-Enhanced, Bag-of-Visual-Words Models [D]. Laurenson, Robert. 2021.

6. Bag of Visual Words Model with Deep Spatial Features for Geographical Scene Classification [O]. Jiangfan Feng, Yuanyuan Liu, Lin Wu. 2017.

7. Natural scene classification, annotation and retrieval. Developing different approaches for semantic scene modelling based on Bag of Visual Words. [O]. Alqasrawi Yousef T. N. 2012.
