Conference: International Conference on Computer Vision and Graphics

Extracting Textual Overlays from Social Media Videos Using Neural Networks

Abstract

Textual overlays are often used in social media videos because viewers who watch them without sound would otherwise miss essential information conveyed in the audio stream. Extracting those overlays can therefore serve as an important metadata source, e.g. for content classification or retrieval tasks. In this work, we present a robust method for extracting textual overlays from videos that builds on multiple neural network architectures. The proposed solution relies on three processing steps: keyframe extraction, text detection and text recognition. The main component of our system, the text recognition module, is inspired by a convolutional recurrent neural network architecture, and we improve its performance using a synthetically generated dataset of over 600,000 images with text, prepared by the authors specifically for this task. We also develop a filtering method that uses Levenshtein distance to reduce the number of overlapping text phrases and further boosts the system's performance. The final accuracy of our solution exceeds 80% and is on par with state-of-the-art methods.
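
The abstract does not say how the Levenshtein-based filtering is configured, so the following is only a minimal sketch of one way such a filter could work, not the authors' implementation. It assumes that phrases recognized in successive keyframes arrive as a list of strings, and that a phrase counts as overlapping when its edit distance to an already kept phrase, divided by the length of the longer of the two, is at most a threshold. The function names and the `max_ratio` value of 0.2 are hypothetical choices made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb (or match)
            ))
        previous = current
    return previous[-1]


def filter_near_duplicates(phrases, max_ratio=0.2):
    """Drop phrases that nearly repeat an earlier kept phrase.

    max_ratio is an assumed threshold on the edit distance divided by
    the length of the longer phrase; it is not taken from the paper.
    """
    kept = []
    for phrase in phrases:
        is_duplicate = False
        for other in kept:
            longest = max(len(phrase), len(other))
            if longest == 0 or levenshtein(phrase, other) / longest <= max_ratio:
                is_duplicate = True
                break
        if not is_duplicate:
            kept.append(phrase)
    return kept


if __name__ == "__main__":
    recognized = ["BREAKING NEWS", "BREAKING NEVVS", "Subscribe now"]
    print(filter_near_duplicates(recognized))
```

In this sketch the first occurrence of a phrase is kept and later near-duplicates, such as the same overlay misread in a following keyframe, are discarded. Normalizing by phrase length means that a one-character recognition error does not cause two short, genuinely different phrases to be merged.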
