IEEE Transactions on Image Processing

Representing and Retrieving Video Shots in Human-Centric Brain Imaging Space



Abstract

Meaningful representation and effective retrieval of video shots in a large-scale database has been a profound challenge for the image/video processing and computer vision communities. A great deal of effort has been devoted to the extraction of low-level visual features, such as color, shape, texture, and motion, for characterizing and retrieving video shots. However, the accuracy of these feature descriptors is still far from satisfactory due to the well-known semantic gap. To alleviate this problem, this paper investigates a novel methodology of representing and retrieving video shots using human-centric high-level features derived in brain imaging space (BIS), where brain responses to the natural stimulus of video watching can be explored and interpreted. First, our recently developed dense individualized and common connectivity-based cortical landmarks (DICCCOL) system is employed to locate large-scale functional brain networks and their regions of interest (ROIs) that are involved in the comprehension of video stimuli. Then, functional connectivities between various functional ROI pairs are utilized as BIS features to characterize the brain's comprehension of video semantics. Next, an effective feature selection procedure is applied to learn the most relevant features while removing redundancy, which results in the formation of the final BIS features. Afterwards, a mapping from low-level visual features to high-level semantic features in the BIS is built via the Gaussian process regression (GPR) algorithm, and a manifold structure is then inferred, in which video key frames are represented by the mapped feature vectors in the BIS. Finally, the manifold-ranking algorithm, which accounts for the relationships among all data points, is applied to measure the similarity between key frames of video shots. Experimental results on the TRECVID 2005 dataset demonstrate the superiority of the proposed work in comparison with traditional methods.
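The final two stages of the pipeline described above, a GPR mapping from low-level visual features into the BIS followed by closed-form manifold ranking over the mapped vectors, can be sketched in a minimal NumPy-only form. The RBF kernel, the hyperparameters (`gamma`, `alpha`, `noise`), and the synthetic feature dimensions are illustrative assumptions, not the paper's actual settings:

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    """Squared-exponential kernel between row vectors of A (n,d) and B (m,d)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq_dists)

def gpr_predict(X_train, Y_train, X_test, noise=1e-2, gamma=1.0):
    """GP posterior mean: maps low-level features X to BIS feature vectors Y.

    Implements K_* (K + sigma^2 I)^{-1} Y, the standard GPR predictive mean.
    """
    K = rbf_kernel(X_train, X_train, gamma) + noise * np.eye(len(X_train))
    K_star = rbf_kernel(X_test, X_train, gamma)
    return K_star @ np.linalg.solve(K, Y_train)

def manifold_rank(F, query_idx, alpha=0.9, gamma=1.0):
    """Rank all key frames (rows of F, mapped BIS vectors) against one query.

    Builds an affinity graph, symmetrically normalizes it, and solves the
    closed-form ranking f = (I - alpha * S)^{-1} y, so scores propagate
    along the manifold rather than relying on pairwise distances alone.
    """
    W = rbf_kernel(F, F, gamma)
    np.fill_diagonal(W, 0.0)           # no self-loops in the affinity graph
    d = W.sum(axis=1)
    S = W / np.sqrt(np.outer(d, d))    # S = D^{-1/2} W D^{-1/2}
    y = np.zeros(len(F))
    y[query_idx] = 1.0                 # query indicator vector
    return np.linalg.solve(np.eye(len(F)) - alpha * S, y)
```

A usage sketch: fit `gpr_predict` on key frames with paired (visual feature, BIS feature) training data, map all database key frames into the BIS, then call `manifold_rank` with the query frame's index; higher scores indicate greater semantic similarity to the query.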
