
Efficient Training and Inference in Highly Temporal Activity Recognition



Abstract

High-performance activity recognition models trained on video data are difficult to train and deploy efficiently. We measure efficiency in terms of performance, model size, and run-time, during both training and inference. Researchers have demonstrated that 3D convolutions capture space-time dynamics well [13]. The challenge is that 3D convolutions are computationally intensive. [8] proposes the Temporal Shift Module (TSM) for training efficiency, and [5] proposes Deep Compression for inference efficiency. TSM is a simple yet effective way to gain near-3D-convolution performance at 2D-convolution computation cost. We apply these efficiency techniques, through transfer learning, to a newly labeled activity recognition dataset. Our labeling strategy is designed to create highly temporal activities. We benchmark against a 2D ResNet50 backbone trained on individual frames and a multilayer 3D CNN trained on multi-frame short videos. Our contributions are: 1. A new highly temporal activity recognition dataset based on egoHands [1]. 2. Results showing that a 3D backbone on videos outperforms a 2D backbone on frames. 3. With TSM, we achieve 5x training efficiency in run-time with negligible performance loss. 4. With quantization alone, we achieve 10x inference efficiency in model size with negligible performance loss.
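The inference-efficiency claim rests on weight quantization: storing parameters as 8-bit integers plus a scale factor instead of 32-bit floats. The abstract does not specify the scheme, so the following is a generic symmetric per-tensor sketch, not the paper's method; by itself it yields a 4x size reduction, and further savings (toward the reported 10x) would come from additional Deep Compression stages such as pruning and weight sharing:

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization of float32 weights to int8.

    Returns the int8 tensor and a single float scale such that
    dequantize(q, scale) approximates w to within about scale/2.
    """
    scale = float(np.abs(w).max()) / 127.0 if w.size else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximate float32 tensor from int8 + scale."""
    return q.astype(np.float32) * scale
```

Since only the int8 tensor and one scale are stored, the weight memory drops by 4x, and accuracy loss stays small as long as the rounding error (at most half a quantization step) is small relative to the weight magnitudes.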


