Journal: SN Applied Sciences

A deep action‑oriented video image classification system for text detection and recognition

Abstract

Achieving accurate text detection and recognition in video images that contain complex actions is very challenging. This paper presents a hybrid model for classifying action-oriented video images; classifying the images first reduces the complexity of the problem and thereby improves text detection and recognition performance. We consider five genres: concert, cooking, craft, teleshopping and yoga. To classify action-oriented video images, we use ResNet50 to learn general pixel-distribution-level information, a VGG16 network to learn the features of Maximally Stable Extremal Regions (MSER), and a second VGG16 network to learn facial components obtained by a multitask cascaded convolutional network (MTCNN). A fully connected neural network integrates the outputs of these three models to classify images into the five action-oriented classes. We demonstrate the efficacy of the proposed method on our own dataset and on two standard datasets: the Scene Text Dataset, which contains 10 classes of scene images with text information, and the Stanford 40 Actions dataset, which contains 40 action classes without text information. Our method outperforms related existing work and significantly improves class-specific text detection and recognition performance.
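The late-fusion step described above can be illustrated with a minimal, framework-free sketch. It assumes that each of the three branches (ResNet50 on the raw image, VGG16 on MSER features, VGG16 on MTCNN face crops) has already been run and emits a five-dimensional score vector, and that the fusion network is a single fully connected layer followed by softmax. The abstract gives neither the real layer sizes nor the trained weights, so the branch outputs, the weight matrix, and every helper name here (`concat`, `dense`, `softmax`, `fuse_and_classify`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# The five action-oriented classes named in the abstract.
CLASSES = ["concert", "cooking", "craft", "teleshopping", "yoga"]


def concat(*branches: Sequence[float]) -> List[float]:
    """Concatenate the per-branch output vectors into one fused vector."""
    fused: List[float] = []
    for branch in branches:
        fused.extend(branch)
    return fused


def dense(x: Sequence[float], weights: Sequence[Sequence[float]],
          bias: Sequence[float]) -> List[float]:
    """Fully connected layer: one row of weights (plus a bias) per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(z: Sequence[float]) -> List[float]:
    """Turn logits into probabilities; subtracting the max keeps exp() stable."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(resnet_out: Sequence[float], mser_vgg_out: Sequence[float],
                      face_vgg_out: Sequence[float],
                      weights: Sequence[Sequence[float]],
                      bias: Sequence[float]) -> Tuple[str, List[float]]:
    """Hypothetical fusion head: concat -> fully connected layer -> softmax -> argmax."""
    fused = concat(resnet_out, mser_vgg_out, face_vgg_out)
    probs = softmax(dense(fused, weights, bias))
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs


# Hypothetical weights: output unit c simply sums the class-c score from each branch.
W = [[1.0 if j % 5 == c else 0.0 for j in range(15)] for c in range(5)]
b = [0.0] * 5

label, probs = fuse_and_classify(
    [0.1, 0.6, 0.1, 0.1, 0.1],  # ResNet50 branch (made-up scores)
    [0.2, 0.3, 0.3, 0.1, 0.1],  # VGG16 on MSER features (made-up scores)
    [0.1, 0.5, 0.2, 0.1, 0.1],  # VGG16 on MTCNN face crops (made-up scores)
    W, b,
)
print(label)  # cooking
```

In a trained model the weight matrix would be learned rather than hand-set, and it would typically sit on top of wider feature vectors rather than per-branch class scores. The predicted label could then be used to select class-specific text detection and recognition settings, which is where the abstract locates the benefit of classifying first.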