IAPR International Conference on Document Analysis and Recognition

Temporal Integration for Word-Wise Caption and Scene Text Identification

Abstract

Video generally contains two kinds of text: edited text (i.e., caption text) and natural text (i.e., scene text), which differ from one another in both nature and characteristics. These differences between caption and scene text lead to poor text recognition accuracy in video. In this paper, we explore wavelet decomposition and temporal coherency for classifying caption and scene text. We propose using the high-frequency wavelet sub-bands to separate text candidates, which are represented by high-frequency coefficients in an input word image. The proposed method studies the distribution of text candidates over word images, based on the observation that the standard deviation of text candidates is high in the first zone, low in the middle zone, and high in the third zone. This distribution is captured by mapping the standard deviation values into 8 equal-sized bins spanning the range of those values. The correlation among bins at the first and second wavelet levels is then used to differentiate caption from scene text and to determine the number of temporal frames to analyze. The properties of caption and scene text are validated over the chosen temporal frames to find a stable property for classification. Experimental results on three standard datasets (ICDAR 2015, YVT, and License Plate Video) show that the proposed method outperforms existing methods in classification rate and, based on the classification results, significantly improves recognition rate.
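
The abstract describes the feature pipeline only at a high level. The sketch below illustrates one plausible reading of it in Python with NumPy and PyWavelets, assuming grayscale word images as 2-D arrays: the high-frequency detail sub-bands serve as the text-candidate map, per-column standard deviations are mapped into 8 equal-width bins, and the correlation between the level-1 and level-2 bin histograms drives a per-frame vote. The function names, the Haar wavelet, the per-column standard deviation, and the correlation threshold in classify_word are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np
import pywt


def detail_response(word_img, level, wavelet="haar"):
    """Text-candidate map: combined magnitude of the high-frequency
    (LH, HL, HH) detail sub-bands at the given decomposition level."""
    coeffs = pywt.wavedec2(word_img.astype(float), wavelet, level=level)
    ch, cv, cd = coeffs[1]  # detail sub-bands at the coarsest level
    return np.abs(ch) + np.abs(cv) + np.abs(cd)


def std_bin_histogram(candidate_map, n_bins=8):
    """Map per-column standard deviations of the candidate map into
    n_bins equal-width bins spanning their range (an assumption about
    how the paper forms its 8 bins), normalized to sum to 1."""
    col_std = candidate_map.std(axis=0)
    hist, _ = np.histogram(col_std, bins=n_bins)
    return hist.astype(float) / max(hist.sum(), 1)


def level_correlation(word_img, n_bins=8):
    """Correlation between the bin histograms computed at wavelet
    levels 1 and 2; may be NaN if a histogram is constant."""
    h1 = std_bin_histogram(detail_response(word_img, level=1), n_bins)
    h2 = std_bin_histogram(detail_response(word_img, level=2), n_bins)
    return np.corrcoef(h1, h2)[0, 1]


def classify_word(track_imgs, corr_threshold=0.5):
    """Hypothetical decision rule: caption text tends to stay stable
    across frames, so vote per frame on the level-1/level-2 bin
    correlation and take the majority over the temporal track."""
    votes = [level_correlation(img) >= corr_threshold for img in track_imgs]
    return "caption" if sum(votes) > len(votes) / 2 else "scene"
```

In this sketch the length of track_imgs stands in for the number of temporal frames to analyze; per the abstract, the paper derives that count from the bin correlation itself rather than fixing it in advance.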