Listen to Look: Action Recognition by Previewing Audio

Abstract

In the face of the video data deluge, today's expensive clip-level classifiers are increasingly impractical. We propose a framework for efficient action recognition in untrimmed video that uses audio as a preview mechanism to eliminate both short-term and long-term visual redundancies. First, we devise an ImgAud2Vid framework that hallucinates clip-level features by distilling from lighter modalities (a single frame and its accompanying audio), reducing short-term temporal redundancy for efficient clip-level recognition. Second, building on ImgAud2Vid, we further propose ImgAud-Skimming, an attention-based long short-term memory network that iteratively selects useful moments in untrimmed videos, reducing long-term temporal redundancy for efficient video-level recognition. Extensive experiments on four action recognition datasets demonstrate that our method achieves state-of-the-art results in both recognition accuracy and speed.

Bibliographic details

Source
IEEE/CVF Conference on Computer Vision and Pattern Recognition | 2020 | pp. 10454-10464 | 11 pages
Authors
Ruohan Gao; Tae-Hyun Oh; Kristen Grauman; Lorenzo Torresani
Original format: PDF
Keywords
Redundancy; Visualization; Buildings; Proposals; Image segmentation; Image recognition; Spatiotemporal phenomena

