International Conference on Multimedia Modeling
A Deep Convolutional Deblurring and Detection Neural Network for Localizing Text in Videos

Abstract

Scene text in video is usually vulnerable to various kinds of blur, such as that caused by camera or text motion, which makes it harder to reliably extract the text for content-based video applications. In this paper, we propose a novel fully convolutional deep neural network for deblurring and detecting text in video. Specifically, to cope with the blur of video text, we propose an effective deblurring subnetwork composed of multi-level convolutional blocks with both cross-block (long) and within-block (short) skip connections for progressively learning residual deblurred image details, together with a spatial attention mechanism that pays more attention to blurred regions; it generates a sharper image for the current frame by fusing multiple surrounding adjacent frames. To further localize text in the frames, we enhance the EAST text detection model by introducing deformable convolution layers and deconvolution layers, which better capture the widely varied appearances of video text. Experiments on a public scene text video dataset demonstrate the state-of-the-art performance of the proposed video text deblurring and detection model.
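The abstract does not specify the deblurring subnetwork's exact formulation, but the core idea of fusing adjacent frames with per-pixel spatial attention can be illustrated with a minimal NumPy sketch. Here the attention weights come from a softmax over a simple Laplacian sharpness proxy, so sharper frames contribute more at each pixel; the function names (`laplacian_sharpness`, `fuse_frames`) and the choice of sharpness measure are assumptions for illustration, not the paper's method:

```python
import numpy as np

def laplacian_sharpness(frame):
    # 4-neighbour Laplacian magnitude as a crude per-pixel sharpness proxy
    # (an assumption; the paper learns its attention, it does not hand-craft it)
    lap = (np.roll(frame, 1, 0) + np.roll(frame, -1, 0)
           + np.roll(frame, 1, 1) + np.roll(frame, -1, 1) - 4.0 * frame)
    return np.abs(lap)

def fuse_frames(frames, temperature=1.0):
    """Fuse a list of (H, W) adjacent frames into one sharper frame using
    a per-pixel softmax attention over each frame's sharpness score."""
    stack = np.stack(frames)                                  # (T, H, W)
    scores = np.stack([laplacian_sharpness(f) for f in frames]) / temperature
    scores -= scores.max(axis=0, keepdims=True)               # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)             # softmax over frames
    return (weights * stack).sum(axis=0)                      # (H, W) convex blend
```

Because the weights form a convex combination at every pixel, the fused frame always lies pointwise between the input frames while leaning toward the locally sharper one.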
