Venue: IEEE/CVF Conference on Computer Vision and Pattern Recognition

Listen to Look: Action Recognition by Previewing Audio

Abstract

In the face of the video data deluge, today's expensive clip-level classifiers are increasingly impractical. We propose a framework for efficient action recognition in untrimmed video that uses audio as a preview mechanism to eliminate both short-term and long-term visual redundancies. First, we devise an ImgAud2Vid framework that hallucinates clip-level features by distilling from lighter modalities (a single frame and its accompanying audio), reducing short-term temporal redundancy for efficient clip-level recognition. Second, building on ImgAud2Vid, we propose ImgAud-Skimming, an attention-based long short-term memory (LSTM) network that iteratively selects useful moments in untrimmed videos, reducing long-term temporal redundancy for efficient video-level recognition. Extensive experiments on four action recognition datasets demonstrate that our method achieves state-of-the-art recognition accuracy and speed.
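The abstract names two components but gives no formulas. The following is a minimal, hypothetical sketch in plain Python that only illustrates the two ideas and is not the authors' implementation. `hallucinate_clip_feature` stands in for the ImgAud2Vid student. That student fuses one frame and its audio into a clip-level feature, which is trained to match an expensive teacher's clip feature. Here the match is scored with mean squared error, which is an assumption; the abstract does not state the actual distillation objective. `skim` stands in for ImgAud-Skimming. At each step it scores candidate moments with softmax attention against a query and picks one moment. All function names are assumptions, as are the element-wise fusion and the running-mean query update, which replaces the paper's LSTM.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error, used here as a stand-in feature-distillation loss."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def hallucinate_clip_feature(image_feat: Vector, audio_feat: Vector,
                             w_img: Vector, w_aud: Vector) -> Vector:
    """Hypothetical student: fuse one frame and its audio into a clip-level feature.

    A real student would use deep image and audio networks; here each modality
    is scaled by an element-wise weight and the two results are summed.
    """
    return [wi * i + wa * a
            for i, a, wi, wa in zip(image_feat, audio_feat, w_img, w_aud)]


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def skim(moment_feats: List[Vector], query: Vector, num_steps: int) -> List[int]:
    """Hypothetical attention-based skimming over candidate moments.

    At each step, attend over all moments with the current query, choose the
    highest-weighted moment not yet selected, then update the query. The
    running-mean update is a simplification replacing a learned LSTM state.
    """
    selected: List[int] = []
    for _ in range(min(num_steps, len(moment_feats))):
        weights = softmax([dot(query, f) for f in moment_feats])
        candidates = [i for i in range(len(moment_feats)) if i not in selected]
        best = max(candidates, key=lambda i: weights[i])
        selected.append(best)
        # Move the query halfway toward the feature that was just selected.
        query = [(q + x) / 2 for q, x in zip(query, moment_feats[best])]
    return selected


if __name__ == "__main__":
    teacher = [0.5, 1.0, -0.5]  # toy clip-level feature from an expensive model
    student = hallucinate_clip_feature([1.0, 1.0, 0.0], [0.0, 1.0, -1.0],
                                       [0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
    print("distillation loss:", mse(student, teacher))
    moments = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    print("selected moments:", skim(moments, [1.0, 0.0], 2))
```

With these toy inputs, the student's output equals the teacher feature, so the loss is zero. Skimming first picks moment 0 and then moment 2, its closest remaining neighbour. Only the selected moments would then go to the (hypothetical) ImgAud2Vid classifier, which is where the savings over dense clip-level inference would come from.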