International Journal of Pattern Recognition and Artificial Intelligence

A New Hybrid Method for Caption and Scene Text Classification in Action Video Images



Abstract

Achieving a high recognition rate for text in action video images is challenging because multiple types of text appear against unpredictable actions in the background. In this paper, we propose a new method for classifying caption text (text edited into the video) and scene text (text that is part of the scene) in video images. This work considers five action classes, namely Yoga, Concert, Teleshopping, Craft, and Recipes, where both types of text are expected to play a vital role in understanding the video content. The proposed method introduces a new fusion criterion based on Discrete Cosine Transform (DCT) and Fourier coefficients to obtain reconstructed images for caption and scene text. The fusion criterion computes the variances of the DCT and Fourier coefficients at corresponding pixels and uses those variances as the respective fusion weights; this step yields Reconstructed image-1. Inspired by the special property of Chebyshev-Harmonic-Fourier-Moments (CHFM), which can reconstruct a redundancy-free image, we use CHFM to obtain Reconstructed image-2. The reconstructed images, along with the input image, are passed to a Deep Convolutional Neural Network (DCNN) for caption/scene text classification. Experimental results on the five action classes and a comparative study with existing methods demonstrate that the proposed method is effective. In addition, recognition results obtained with different methods before and after classification show that recognition performance improves significantly once the text has been classified.
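The variance-weighted fusion step that produces Reconstructed image-1 can be sketched as follows. The abstract does not spell out implementation details, so this is a minimal sketch under assumptions: global variances of the DCT and Fourier-magnitude coefficient images are used as the two fusion weights, and the fused coefficients are mapped back to the spatial domain with an inverse DCT.

```python
import numpy as np
from scipy.fft import dctn, idctn, fft2

def fused_reconstruction(img: np.ndarray) -> np.ndarray:
    """Sketch of the DCT/Fourier variance-weighted fusion (assumed details)."""
    # Coefficient images: 2-D DCT and Fourier magnitude spectrum
    d = dctn(img, norm="ortho")
    f = np.abs(fft2(img))
    # Variances of the coefficient images serve as fusion weights
    wd, wf = d.var(), f.var()
    fused = (wd * d + wf * f) / (wd + wf)
    # Inverse DCT of the fused coefficients gives Reconstructed image-1
    return idctn(fused, norm="ortho")

# Toy grayscale input standing in for a video frame
img = np.random.rand(64, 64)
rec1 = fused_reconstruction(img)
print(rec1.shape)
```

In the paper, Reconstructed image-1 is then stacked with the CHFM-based Reconstructed image-2 and the original frame before being fed to the DCNN classifier; the per-pixel versus global treatment of the variances is one of the details the full text would pin down.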
