Journal of Real-Time Image Processing

Real-time localization of multi-oriented text in natural scene images using a linear spatial filter


Abstract

This paper proposes a method for localizing multi-oriented text in natural images that is suitable for real-time processing of high-definition video on portable and mobile devices. Our method is based on the connected components (CC) approach: first, CCs are isolated by convolving a multi-scale pyramid with a specifically designed linear spatial filter, followed by hysteresis thresholding. Next, non-textual CCs are pruned using a local classifier consisting of a cascade of multilayer perceptrons (MLPs) fed with increasingly extended feature vectors. The stroke width feature is estimated in linear time by computing the maximal inscribed squares in the CCs. Candidate CCs and their neighbors are then checked using a more context-aware neural network classifier that takes into account the target CCs and their vicinity. Finally, text sequences are extracted at all pyramid levels and fused using dynamic programming. The main contribution of the work presented here is execution speed: the CPU-only parallel implementation of the proposed method is capable of processing 1080p HD video at nearly 30 frames per second on a standard laptop. Furthermore, when benchmarked on the ICDAR 2013 Robust Reading and the ICDAR 2015 Incidental Scene Text data sets, our system runs more than twice as fast as the state of the art, while still delivering competitive results in terms of precision and recall.
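
The abstract does not spell out how the maximal inscribed squares are computed, so the following is only a minimal sketch of one well-known way to do it: the classic dynamic-programming recurrence for the largest all-foreground square ending at each pixel, which visits every pixel once and is therefore linear in the number of pixels. The function names `max_inscribed_squares` and `estimate_stroke_width`, the binary-mask input format, and the choice of taking the largest square side as the width estimate are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List


def max_inscribed_squares(mask: List[List[int]]) -> List[List[int]]:
    """Return side[y][x]: the side length of the largest all-foreground
    square whose bottom-right corner is pixel (y, x).

    `mask` is a hypothetical binary image of one connected component
    (non-zero = foreground). Each pixel is visited exactly once.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    side = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue  # background pixels cannot end a square
            if y == 0 or x == 0:
                side[y][x] = 1  # border pixels can only hold a 1x1 square
            else:
                # A square ending here extends the smallest of the squares
                # ending at the top, left, and top-left neighbors.
                side[y][x] = 1 + min(
                    side[y - 1][x], side[y][x - 1], side[y - 1][x - 1]
                )
    return side


def estimate_stroke_width(mask: List[List[int]]) -> int:
    """Crude stroke-width estimate (an assumption of this sketch): the side
    of the largest square that fits anywhere inside the component."""
    side = max_inscribed_squares(mask)
    return max((max(row) for row in side if row), default=0)


# Usage: a horizontal bar 3 pixels thick and 7 pixels long.
bar = [[1 if 1 <= y <= 3 and 1 <= x <= 7 else 0 for x in range(9)]
       for y in range(5)]
bar_width = estimate_stroke_width(bar)
```

For the 3-pixel-thick bar above, the largest inscribed square has side 3, so the estimate matches the bar's thickness. A real system would likely aggregate the per-pixel values more robustly (for example over many locations along the stroke) rather than take a single maximum.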
