International Conference on Smart Electronics and Communication

Pruning Long-term Recurrent Convolutional Networks for Video Classification and Captioning

Abstract

Images, videos and speech are three data modalities found everywhere; nearly 6 billion YouTube videos are watched daily. Video classification is an emerging application with few proposed works and many open challenges. A video is a collection of frames (images), and simple algorithms fail utterly on it. Long-term recurrent convolutional networks (LRCN) are apt for video classification tasks as they capture both spatial and temporal behaviour. Although this network achieves significant accuracy, it passes every frame of a video through a CNN and an LSTM, which makes it computationally expensive: the time taken to train or test on a video is too high because the CNN has to process many frames. In our work, this long-term recurrent convolutional network has therefore been pruned, and several pruning techniques have been analysed. The metric evidence shows that our pruned LRCN became 2x faster with only a 4% drop in accuracy after pruning 50% of the filters. The time taken to classify test videos decreased by 40% when ResNet-101 is considered. In the video captioning task, using a simple CNN yielded only a 19.3 BLEU-4 score, whereas the pruned ResNet-101 produced a 26.3 BLEU-4 score. Our work therefore suggests using pruned deep CNNs as the encoder instead of custom-built simple CNNs, since pruning makes the processing times of the two networks comparable. It is assumed that parallel computation of frames is not used.
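
To make the architecture and the pruning scheme described above concrete, the sketch below shows a minimal LRCN (a per-frame CNN encoder feeding an LSTM) together with magnitude-based filter pruning in PyTorch. The ResNet-101 backbone and the 50% filter ratio follow the abstract, but the hidden size, the L1-norm ranking criterion, and the zeroing-based pruning are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal LRCN + filter-pruning sketch, assuming PyTorch and torchvision.
# Layer sizes and the L1-norm criterion are assumptions for illustration only.
import torch
import torch.nn as nn
from torchvision import models


class LRCN(nn.Module):
    """CNN encoder applied to every frame, followed by an LSTM over time."""

    def __init__(self, num_classes: int, hidden_size: int = 256):
        super().__init__()
        backbone = models.resnet101()          # no pretrained weights loaded here
        feat_dim = backbone.fc.in_features     # 2048 for ResNet-101
        backbone.fc = nn.Identity()            # keep only the feature extractor
        self.cnn = backbone
        self.lstm = nn.LSTM(feat_dim, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, channels, height, width)
        b, t, c, h, w = clips.shape
        feats = self.cnn(clips.reshape(b * t, c, h, w))   # every frame through the CNN
        feats = feats.reshape(b, t, -1)
        out, _ = self.lstm(feats)                         # temporal modelling
        return self.classifier(out[:, -1])                # classify from the last step


def l1_filter_scores(conv: nn.Conv2d) -> torch.Tensor:
    """Rank filters by the L1 norm of their weights (smaller = less important)."""
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))


def prune_filters(conv: nn.Conv2d, ratio: float = 0.5) -> None:
    """Zero out the lowest-ranked filters; shown here for a single conv layer."""
    scores = l1_filter_scores(conv)
    k = int(conv.out_channels * ratio)
    drop = torch.argsort(scores)[:k]
    with torch.no_grad():
        conv.weight[drop] = 0.0
        if conv.bias is not None:
            conv.bias[drop] = 0.0
```

Zeroing filters only mimics the effect of pruning; a full pipeline would physically remove the pruned filters and the matching input channels of the following layer and then fine-tune, which is where a speed-up such as the reported 2x would come from.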
