Multimedia Tools and Applications

Scene text detection using enhanced Extremal region and convolutional neural network



Abstract

Text in scene images usually carries significant information. Detecting and recognizing text in scenes is important for a variety of advanced machine vision applications, such as image and video retrieval, automotive assistance, and multilingual translation. In particular, most text recognition systems require text to be localized in the image beforehand, and this localization is itself a demanding task. The purpose of this study is to provide a method for detecting text in natural images. The proposed approach combines the advantages of extremal region (ER) methods with the classification power of a convolutional neural network (CNN), which significantly reduces false positives and increases detection accuracy. Sliding windows of different sizes are employed to determine text candidates. Enhanced ERs are extracted in three consecutive stages on the three color channels R, G, and B, and the results are then combined by an add operation. After grouping, the word candidates are classified into text and non-text classes by a CNN classifier. By applying the non-maximum suppression (NMS) algorithm to overlapping detections of the same word, the word with the highest probability is selected. On the ICDAR2013 database, the proposed text detection model achieves average accuracy, recall, precision, and F-measure of 0.893, 0.962, 0.948, and 0.955, respectively. The optimal cut point of the proposed method is 0.648, which yields the highest average accuracy, 91.93%. The AUCs of the ROC and PR curves for the proposed model are 0.851 and 0.718, respectively, an outstanding improvement over the best detection rates of previous methods. Experimental results on the ICDAR2011, ICDAR2013, and ICDAR2015 databases also demonstrate that our algorithm outperforms state-of-the-art scene text detection methods.
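Two details of the pipeline above can be illustrated concretely: the reported F-measure follows from the stated precision and recall via F = 2PR/(P+R), and the final step is a greedy NMS over scored word boxes. The sketch below is a minimal, hypothetical Python illustration; the `(x1, y1, x2, y2)` box format, the IoU threshold, and the greedy variant are assumptions, since the abstract does not specify the exact NMS implementation used:

```python
def f_measure(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)

def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    # Greedy NMS: repeatedly keep the highest-scoring box and drop
    # any remaining box that overlaps it by more than iou_thresh.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thresh]
    return keep
```

As a consistency check, `f_measure(0.948, 0.962)` rounds to 0.955, matching the F-measure reported for ICDAR2013; and given two heavily overlapping detections of the same word plus one distant box, `nms` keeps only the higher-scoring overlap and the distant box.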


