FREE: A Fast and Robust End-to-End Video Text Spotter

Zhanzhan Cheng; Jing Lu; Baorui Zou; Liang Qiao; Yunlu Xu; Shiliang Pu; Yi Niu; Fei Wu; Shuigeng Zhou




Abstract

Current video text spotting methods usually follow a four-stage pipeline: detecting text regions in individual frames, recognizing the localized text regions frame by frame, tracking text streams, and post-processing to generate the final results. However, such pipelines may suffer from a huge computational cost as well as sub-optimal results, owing to interference from low-quality text and to the non-trainable pipeline strategy. In this article, we propose a fast and robust end-to-end video text spotting framework named FREE, which recognizes each localized text stream only once instead of recognizing it frame by frame. Specifically, FREE first employs a well-designed spatial-temporal detector that learns text locations across video frames. Then a novel text recommender is developed to select the highest-quality text from each text stream for recognition. The recommender is implemented by assembling text tracking, quality scoring and recognition into a single trainable module. It not only avoids interference from low-quality text but also dramatically speeds up video text spotting. FREE unites the detector and the recommender into a single framework, which helps achieve global optimization. In addition, to promote research in the video text spotting community, we collect a large-scale video text dataset containing 100 videos from 21 real-life scenarios. Extensive experiments on public benchmarks show that our method greatly speeds up the text spotting process and also achieves state-of-the-art results.
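
The "recognize once per text stream" idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: in FREE, tracking, quality scoring and recognition are learned jointly in a trainable module, whereas the sketch below simply assumes that each detected crop already carries a track identity and a quality score. The names `TextCrop`, `group_into_streams`, `spot_once_per_stream` and the `recognize` callable are all hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TextCrop:
    """One detected text region in one frame (hypothetical structure)."""
    frame_index: int
    track_id: int    # stream identity; assumed to come from a tracker
    quality: float   # quality score; assumed to come from a scorer
    pixels: str      # stand-in for the cropped image data


def group_into_streams(crops: List[TextCrop]) -> Dict[int, List[TextCrop]]:
    """Group per-frame crops into text streams keyed by track identity."""
    streams: Dict[int, List[TextCrop]] = {}
    for crop in crops:
        streams.setdefault(crop.track_id, []).append(crop)
    return streams


def spot_once_per_stream(
    crops: List[TextCrop],
    recognize: Callable[[TextCrop], str],
) -> Dict[int, str]:
    """Run the recognizer once per stream, on its highest-quality crop."""
    results: Dict[int, str] = {}
    for track_id, stream in group_into_streams(crops).items():
        best = max(stream, key=lambda crop: crop.quality)
        results[track_id] = recognize(best)
    return results
```

With this structure the number of recognizer calls equals the number of text streams rather than the number of detected crops, which is the source of the speed-up the abstract describes; picking the best-scored crop is also what keeps low-quality frames out of the recognizer.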

Bibliographic details

Source
IEEE Transactions on Image Processing, 2021, No. 1, pp. 822-837 (16 pages)
Authors
Zhanzhan Cheng; Jing Lu; Baorui Zou; Liang Qiao; Yunlu Xu; Shiliang Pu; Yi Niu; Fei Wu; Shuigeng Zhou
Author affiliations

College of Computer Science and Technology, Zhejiang University, Hangzhou, China;

Hikvision Research Institute, Hangzhou, China;

Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai, China;

Hikvision Research Institute, Hangzhou, China;

Hikvision Research Institute, Hangzhou, China;

Hikvision Research Institute, Hangzhou, China;

Hikvision Research Institute, Hangzhou, China;

College of Computer Science and Technology, Zhejiang University, Hangzhou, China;

Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai, China;

Original format: PDF
Language: English
Keywords
Text recognition; Streaming media; Detectors; Task analysis; Pipelines; Feature extraction; Licenses;

