
Text Detection in Natural Scenes and Technical Diagrams with Convolutional Feature Learning and Cascaded Classification.



Abstract

An enormous number of digital images are generated and stored every day. Understanding the text in these images is an important challenge, with broad impact on academic, industrial, and domestic applications. Recent studies address the difficulty of separating text targets from noise and background, all of which vary greatly in natural scenes. To tackle this problem, we develop a text detection system that analyzes and utilizes visual information in a data-driven, automatic, and intelligent way.

The proposed method incorporates features learned from data, including patch-based coarse-to-fine detection (Text-Conv), connected component extraction using region growing, and graph-based word segmentation (Word-Graph). Text-Conv is a sliding-window-based detector, with convolution masks learned using the Convolutional k-means algorithm (Coates et al., 2011). Unlike convolutional neural networks (CNNs), a single vector/layer of convolution mask responses is used to classify patches. An initial coarse detection considers both local and neighboring patch responses, followed by refinement using varying aspect ratios and rotations for a smaller local detection window. Different levels of visual detail from the ground truth are utilized in each step, first using constraints on bounding box intersections, and then a combination of bounding box and pixel intersections. Combining masks from different Convolutional k-means initializations (e.g., seeded with random vectors and then with support vectors) improves performance. The Word-Graph algorithm uses contextual information to improve word segmentation and to prune false character detections based on visual features and spatial context.
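The Convolutional k-means step referenced above can be viewed as spherical k-means over contrast-normalized image patches. The following is a minimal sketch of that idea (function name and details are illustrative simplifications, e.g. no whitening and a fixed iteration count, not the thesis implementation):

```python
import numpy as np

def convolutional_kmeans(patches, k=64, iters=10, seed=0):
    """Learn k unit-norm convolution masks from flattened image patches
    via spherical k-means, in the spirit of Coates et al. (2011).

    patches: (n, d) array, each row one flattened greyscale patch.
    Returns: (k, d) array of learned masks (centroids)."""
    # Normalize patches to unit length so similarity is a dot product.
    patches = patches / (np.linalg.norm(patches, axis=1, keepdims=True) + 1e-8)
    rng = np.random.default_rng(seed)
    # Seed centroids with randomly chosen patches (random-vector seeding;
    # the thesis also explores support-vector seeding).
    centroids = patches[rng.choice(len(patches), size=k, replace=False)]
    for _ in range(iters):
        # Assign each patch to its most similar centroid.
        assign = (patches @ centroids.T).argmax(axis=1)
        for j in range(k):
            members = patches[assign == j]
            if len(members):
                # Update: renormalized sum of assigned patches.
                c = members.sum(axis=0)
                centroids[j] = c / (np.linalg.norm(c) + 1e-8)
    return centroids
```

In the detector sketched here, each learned mask would be reshaped back to a 2-D filter and convolved with the image, and the vector of mask responses at a patch feeds the patch classifier.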
Our system obtains pixel, character, and word detection f-measures of 93.14%, 90.26%, and 86.77%, respectively, on the ICDAR 2015 Robust Reading Focused Scene Text dataset, outperforming state-of-the-art systems and producing highly accurate text detection masks at the pixel level.

To investigate the utility of our feature learning approach for other image types, we perform tests on 8-bit greyscale USPTO patent drawing diagram images. An ensemble of AdaBoost classifiers with different convolutional features (MetaBoost) is used to classify patches as text or background. The Tesseract OCR system is used to recognize characters in detected labels and enhance performance. With appropriate pre-processing and post-processing, f-measures of 82% for part label locations and 73% for valid part label locations and strings are obtained, which are the best obtained to date for the USPTO patent diagram dataset used in our experiments.

To sum up, an intelligent refinement of Convolutional k-means-based feature learning and novel automatic classification methods are proposed for text detection, obtaining state-of-the-art results without the need for strong prior knowledge. Different ground truth representations, along with features including edges, color, shape, and spatial relationships, are used coherently to improve accuracy. Different variations of feature learning are explored, e.g., support-vector-seeded clustering and MetaBoost, with results suggesting that increased diversity in learned features benefits convolution-based text detectors.
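The reported f-measures combine precision and recall; at the pixel level they can be computed directly from binary detection and ground-truth masks. A minimal sketch (hypothetical helper, assuming boolean numpy masks of equal shape):

```python
import numpy as np

def pixel_f_measure(pred_mask, gt_mask):
    """F-measure (harmonic mean of precision and recall) between a binary
    detection mask and a binary ground-truth mask."""
    tp = np.logical_and(pred_mask, gt_mask).sum()       # true-positive pixels
    precision = tp / max(pred_mask.sum(), 1)            # fraction of detections correct
    recall = tp / max(gt_mask.sum(), 1)                 # fraction of ground truth found
    return 2 * precision * recall / max(precision + recall, 1e-8)
```

Character- and word-level scores follow the same precision/recall structure, but match detected regions to ground-truth boxes instead of individual pixels.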
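The MetaBoost patch classifier builds on AdaBoost over convolutional feature responses. A minimal sketch of AdaBoost with single-feature threshold stumps (a hypothetical simplification; the thesis combines full convolutional feature sets across ensemble members):

```python
import numpy as np

def adaboost_stumps(X, y, rounds=10):
    """Train AdaBoost with one-feature threshold stumps.
    X: (n, d) feature matrix (e.g., convolutional mask responses per patch).
    y: (n,) labels in {-1, +1} (text vs. background).
    Returns a list of (feature, threshold, polarity, alpha) weak learners."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)                      # sample weights
    ensemble = []
    for _ in range(rounds):
        best = None
        for j in range(d):                       # search stumps over features
            for thr in np.percentile(X[:, j], [25, 50, 75]):
                for pol in (1, -1):
                    pred = pol * np.where(X[:, j] > thr, 1, -1)
                    err = w[pred != y].sum()     # weighted error
                    if best is None or err < best[0]:
                        best = (err, j, thr, pol, pred)
        err, j, thr, pol, pred = best
        err = np.clip(err, 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)    # weak-learner weight
        w *= np.exp(-alpha * y * pred)           # upweight mistakes
        w /= w.sum()
        ensemble.append((j, thr, pol, alpha))
    return ensemble

def adaboost_predict(ensemble, X):
    """Weighted vote of the weak learners; returns labels in {-1, 0, +1}."""
    score = np.zeros(len(X))
    for j, thr, pol, alpha in ensemble:
        score += alpha * pol * np.where(X[:, j] > thr, 1, -1)
    return np.sign(score)
```

A MetaBoost-style ensemble would train several such boosted classifiers, each on a different set of convolutional features, and combine their scores.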

Bibliographic details

  • Author

    Zhu, Siyu

  • Author affiliation

    Rochester Institute of Technology

  • Degree-granting institution: Rochester Institute of Technology
  • Subjects: Computer science; Electrical engineering
  • Degree: Ph.D.
  • Year: 2016
  • Pages: 185 p.
  • Total pages: 185
  • Format: PDF
  • Language: eng
  • CLC classification: 公共建筑 (Public buildings)
  • Keywords:
  • Date added: 2022-08-17 11:40:46

