IAPR International Conference on Document Analysis and Recognition

Temporal Integration for Word-Wise Caption and Scene Text Identification

Abstract

Video generally contains two kinds of text: edited text (i.e., caption text) and natural text (i.e., scene text), which differ from one another in both nature and characteristics. These differences between caption and scene text lead to poor text recognition accuracy in video. In this paper, we explore wavelet decomposition and temporal coherency for classifying caption and scene text. We propose using the high-frequency wavelet sub-bands to separate text candidates, which are represented by high-frequency coefficients in an input word image. The proposed method studies the distribution of text candidates over word images, based on the observation that the standard deviation of text candidates is high in the first zone, low in the middle zone, and high in the third zone. This distribution is captured by mapping the standard deviation values into 8 equal-sized bins spanning the range of those values. The correlation among bins at the first and second wavelet levels is then used to differentiate caption from scene text and to determine the number of temporal frames to analyze. The properties of caption and scene text are validated over the chosen temporal frames to find a stable property for classification. Experimental results on three standard datasets (ICDAR 2015, YVT, and License Plate Video) show that the proposed method outperforms existing methods in classification rate and, based on the classification results, significantly improves recognition rate.
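
The abstract describes the feature pipeline only at a high level. The sketch below illustrates one plausible reading of it in Python with NumPy and PyWavelets, assuming grayscale word images as 2-D arrays: the high-frequency detail sub-bands serve as the text-candidate map, per-column standard deviations are mapped into 8 equal-width bins, and the correlation between the level-1 and level-2 bin histograms drives a per-frame vote. The function names, the Haar wavelet, the per-column standard deviation, and the correlation threshold in classify_word are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np
import pywt


def detail_response(word_img, level, wavelet="haar"):
    """Text-candidate map: combined magnitude of the high-frequency
    (LH, HL, HH) detail sub-bands at the given decomposition level."""
    coeffs = pywt.wavedec2(word_img.astype(float), wavelet, level=level)
    ch, cv, cd = coeffs[1]  # detail sub-bands at the coarsest level
    return np.abs(ch) + np.abs(cv) + np.abs(cd)


def std_bin_histogram(candidate_map, n_bins=8):
    """Map per-column standard deviations of the candidate map into
    n_bins equal-width bins spanning their range (an assumption about
    how the paper forms its 8 bins), normalized to sum to 1."""
    col_std = candidate_map.std(axis=0)
    hist, _ = np.histogram(col_std, bins=n_bins)
    return hist.astype(float) / max(hist.sum(), 1)


def level_correlation(word_img, n_bins=8):
    """Correlation between the bin histograms computed at wavelet
    levels 1 and 2; may be NaN if a histogram is constant."""
    h1 = std_bin_histogram(detail_response(word_img, level=1), n_bins)
    h2 = std_bin_histogram(detail_response(word_img, level=2), n_bins)
    return np.corrcoef(h1, h2)[0, 1]


def classify_word(track_imgs, corr_threshold=0.5):
    """Hypothetical decision rule: caption text tends to stay stable
    across frames, so vote per frame on the level-1/level-2 bin
    correlation and take the majority over the temporal track."""
    votes = [level_correlation(img) >= corr_threshold for img in track_imgs]
    return "caption" if sum(votes) > len(votes) / 2 else "scene"
```

In this sketch the length of track_imgs stands in for the number of temporal frames to analyze; per the abstract, the paper derives that count from the bin correlation itself rather than fixing it in advance.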