Journal: IEEE Transactions on Image Processing

FREE: A Fast and Robust End-to-End Video Text Spotter


Abstract

Current video text spotting methods usually follow a four-stage pipeline: detecting text regions in individual frames, recognizing the localized text regions frame by frame, tracking text streams, and post-processing to generate the final results. However, such pipelines can suffer from high computational cost and sub-optimal results, owing to interference from low-quality text and a pipeline that is not trainable end to end. In this article, we propose FREE, a fast and robust end-to-end video text spotting framework that recognizes each localized text stream only once instead of recognizing it in every frame. Specifically, FREE first employs a spatial-temporal detector that learns text locations across video frames. A novel text recommender then selects the highest-quality text from each text stream for recognition. The recommender is implemented by assembling text tracking, quality scoring and recognition into a single trainable module. It not only avoids interference from low-quality text but also dramatically speeds up video text spotting. FREE unites the detector and the recommender in a single framework, which helps achieve global optimization. In addition, we collect a large-scale video text dataset containing 100 videos from 21 real-life scenarios to support the video text spotting community. Extensive experiments on public benchmarks show that our method greatly speeds up the text spotting process and achieves state-of-the-art results.
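The "recognize once per text stream" idea can be illustrated with a minimal Python sketch. This is not the FREE implementation: in the paper, tracking, quality scoring and recognition are learned jointly in one trainable module, whereas here the track IDs and quality scores are assumed to be given, and `TextCrop`, `group_by_track`, `spot_text` and the `recognize` callable are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TextCrop:
    """One detected text region in one frame (hypothetical structure)."""
    track_id: int      # text stream this crop belongs to (assumed given by a tracker)
    frame_index: int   # frame in which the crop was detected
    quality: float     # quality score, higher is better (assumed given by a scorer)
    pixels: str        # stand-in for the cropped image data


def group_by_track(crops: List[TextCrop]) -> Dict[int, List[TextCrop]]:
    """Group per-frame detections into text streams keyed by track ID."""
    streams: Dict[int, List[TextCrop]] = {}
    for crop in crops:
        streams.setdefault(crop.track_id, []).append(crop)
    return streams


def spot_text(
    crops: List[TextCrop],
    recognize: Callable[[TextCrop], str],
) -> Dict[int, str]:
    """Run the recognizer once per stream, on its highest-quality crop.

    A frame-wise pipeline would call `recognize` len(crops) times;
    this calls it once per distinct track ID.
    """
    results: Dict[int, str] = {}
    for track_id, stream in group_by_track(crops).items():
        best = max(stream, key=lambda c: c.quality)
        results[track_id] = recognize(best)
    return results
```

With four detections spread over two streams, `spot_text` invokes the recognizer twice instead of four times, and each call sees the crop with the highest quality score in its stream. That saving is the intuition behind the speed-up the abstract describes, under the simplifying assumptions above.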
