Venue: International Conference on Computer Vision and Graphics

Extracting Textual Overlays from Social Media Videos Using Neural Networks

Abstract

Textual overlays are often used in social media videos, as people who watch them without sound would otherwise miss essential information conveyed in the audio stream. Extracting those overlays can therefore serve as an important metadata source, e.g. for content classification or retrieval tasks. In this work, we present a robust method for extracting textual overlays from videos that builds on multiple neural network architectures. The proposed solution relies on several processing steps: keyframe extraction, text detection and text recognition. The main component of our system, i.e. the text recognition module, is inspired by a convolutional recurrent neural network architecture, and we improve its performance using a synthetically generated dataset of over 600,000 images with text, prepared by the authors specifically for this task. We also develop a filtering method that uses Levenshtein distance to reduce the number of overlapping text phrases and further boosts the system's performance. The final accuracy of our solution exceeds 80% and is on par with state-of-the-art methods.
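The abstract does not say how the Levenshtein-based filtering is implemented. As a rough illustration only, the sketch below drops a recognized phrase when its edit distance to an already kept phrase, normalized by the longer phrase's length, falls at or below a threshold. The function names, the normalization and the `max_ratio` value of 0.2 are assumptions made for this example, not details taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def filter_near_duplicates(phrases, max_ratio=0.2):
    """Keep the first occurrence of each phrase and drop near-duplicates.

    Hypothetical rule: a phrase is a duplicate when its edit distance to a
    kept phrase, divided by the longer length, is <= max_ratio (assumed).
    """
    kept = []
    for phrase in phrases:
        duplicate = False
        for existing in kept:
            longest = max(len(phrase), len(existing))
            if longest == 0 or levenshtein(phrase, existing) / longest <= max_ratio:
                duplicate = True
                break
        if not duplicate:
            kept.append(phrase)
    return kept


if __name__ == "__main__":
    # The same overlay read from consecutive keyframes, with one OCR slip.
    recognized = ["Breaking news today", "Breaking news todav", "Subscribe now"]
    print(filter_near_duplicates(recognized))
```

A normalized threshold is used here so that one misread character counts for less in a long phrase than in a short one. A real system would need to tune this value on its own data.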
