Extracting Textual Overlays from Social Media Videos Using Neural Networks

Abstract

Textual overlays are often used in social media videos because people who watch them without sound would otherwise miss essential information conveyed in the audio stream. This is why extracting those overlays can provide an important metadata source, e.g. for content classification or retrieval tasks. In this work, we present a robust method for extracting textual overlays from videos that builds on multiple neural network architectures. The proposed solution relies on several processing steps: keyframe extraction, text detection and text recognition. The main component of our system, i.e. the text recognition module, is inspired by a convolutional recurrent neural network architecture, and we improve its performance using a synthetically generated dataset of over 600,000 images with text, prepared by the authors specifically for this task. We also develop a filtering method that uses Levenshtein distance to reduce the number of overlapping text phrases, which further boosts the system's performance. The final accuracy of our solution exceeds 80% and is on par with state-of-the-art methods.
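
The abstract names Levenshtein distance as the basis of the filter that removes overlapping text phrases (presumably the same overlay recognized on several keyframes) but does not state the exact rule. The following is a minimal sketch of one way such a filter could work, not the authors' implementation. It assumes a case-insensitive comparison and a hypothetical threshold `max_ratio` applied to the edit distance divided by the length of the longer phrase. The names `levenshtein`, `filter_overlapping` and `max_ratio` are illustrative.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the shorter string in the inner loop
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def filter_overlapping(phrases: list[str], max_ratio: float = 0.3) -> list[str]:
    """Hypothetical filter: drop a recognized phrase if it is a near-duplicate
    of one already kept, i.e. its edit distance divided by the longer
    phrase's length is at most max_ratio. Comparison is case-insensitive."""
    kept: list[str] = []        # phrases returned to the caller
    normalized: list[str] = []  # lower-cased copies used for comparison
    for phrase in phrases:
        text = phrase.strip()
        key = text.lower()
        if not key:
            continue  # skip empty recognitions
        duplicate = any(
            levenshtein(key, other) / max(len(key), len(other)) <= max_ratio
            for other in normalized
        )
        if not duplicate:
            kept.append(text)
            normalized.append(key)
    return kept


# Example: the same overlays recognized on several keyframes with OCR noise.
frames = ["BREAKING NEWS", "BREAKlNG NEWS", "Stay tuned", "BREAKING NEW5", "Stay tuned!"]
print(filter_overlapping(frames))  # ['BREAKING NEWS', 'Stay tuned']
```

Normalizing by the longer phrase lets a single threshold serve both short and long overlays. Keeping the first reading is the simplest policy. A real system might instead keep the most frequent or most confident reading of each overlay.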

Bibliographic details

Source
International Conference on Computer Vision and Graphics | 2018 | 536p | 13 pages
Authors
Adam Slucki; Tomasz Trzcinski; Adam Bielski; Pawel Cyrta
Original format: PDF
CLC classification: TP391.4-53

Similar documents

1. Deep Convolution Neural Network Approach for Textual Prediction and Sentiment Analysis in Social Media Networks [J]. Kumar N. K. Senthil, Malarvizhi N. Journal of Computational and Theoretical Nanoscience. 2018, no. 9a10

2. The Impact Of Various Digitized Social Networking Media Through Text, Images And Videos On Language Usage [J]. Sampurnananda Mishra, Chandra Kanta Samal, Navneet Yadav. International Journal of Scientific & Technology Research. 2019, no. 10

3. Trust-Based Video Management Framework for Social Multimedia Networks [J]. Mada Badr Eddine, Bagaa Miloud, Taleb Tarik. IEEE Transactions on Multimedia. 2019, no. 3

4. Extracting Textual Overlays from Social Media Videos Using Neural Networks [C]. Adam Slucki, Tomasz Trzcinski, Adam Bielski. International Conference on Computer Vision and Graphics. 2018

5. A Unified Framework based on Convolutional Neural Networks for Interpreting Carotid Intima-Media Thickness Videos [D]. Shin, Jaeyul. 2016

6. Can pre-trained convolutional neural networks be directly used as a feature extractor for video-based neonatal sleep and wake classification? [O]. Muhammad Awais, Xi Long, Bin Yin. 2020

7. Multimodal and Crossmodal Representation Learning from Textual and Visual Features with Bidirectional Deep Neural Networks for Video Hyperlinking [O]. Vukotić, Vedran, Raymond, Christian, Gravier, Guillaume. 2016
