Extracting Textual Overlays from Social Media Videos Using Neural Networks


Abstract

Textual overlays are often used in social media videos because people who watch them without sound would otherwise miss essential information conveyed in the audio stream. Extracting those overlays can therefore serve as an important metadata source, e.g. for content classification or retrieval tasks. In this work, we present a robust method for extracting textual overlays from videos that builds on multiple neural network architectures. The proposed solution relies on three processing steps: keyframe extraction, text detection and text recognition. The main component of our system, the text recognition module, is inspired by a convolutional recurrent neural network architecture, and we improve its performance using a synthetically generated dataset of over 600,000 images with text, prepared by the authors specifically for this task. We also develop a filtering method that uses Levenshtein distance to reduce the number of overlapping text phrases and further boosts the system's performance. The final accuracy of our solution reaches over 80% and is on par with state-of-the-art methods.
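
The abstract does not specify how the Levenshtein-based filter is applied, so the following is only a minimal sketch of one plausible reading: phrases recognized on successive keyframes are compared by edit distance, and a phrase is dropped when it is nearly identical to one already kept. The function names `levenshtein` and `filter_near_duplicates`, the `max_ratio` threshold of 0.2, and the case-insensitive comparison are all assumptions made for illustration, not details taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # iterate over the longer string, keep rows short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb
            ))
        previous = current
    return previous[-1]


def filter_near_duplicates(phrases, max_ratio=0.2):
    """Keep a phrase only if it differs enough from every phrase already kept.

    max_ratio is a hypothetical threshold: two phrases count as the same
    overlay when their edit distance, divided by the longer length, is at
    most this value.
    """
    kept = []
    for phrase in phrases:
        is_duplicate = False
        for existing in kept:
            longest = max(len(phrase), len(existing), 1)
            distance = levenshtein(phrase.lower(), existing.lower())
            if distance / longest <= max_ratio:
                is_duplicate = True
                break
        if not is_duplicate:
            kept.append(phrase)
    return kept


if __name__ == "__main__":
    # The second phrase is an OCR variant of the first and is filtered out.
    print(filter_near_duplicates(["BREAKING NEWS", "BREAKING NEW5", "Subscribe now"]))
```

Normalizing the distance by phrase length keeps the threshold meaningful for both short and long overlays; an absolute cutoff would be too strict for long sentences and too loose for short ones.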


Bibliographic details

Source: International Conference on Computer Vision and Graphics, 2018, pp. 287-299 (13 pages)
Authors: Adam Slucki; Tomasz Trzcinski; Adam Bielski; Pawel Cyrta
Original format: PDF

Similar literature

1. Deep Convolution Neural Network Approach for Textual Prediction and Sentiment Analysis in Social Media Networks [J]. Kumar N. K. Senthil, Malarvizhi N, Journal of Computational and Theoretical Nanoscience. 2018, No. 9a10
2. The Impact Of Various Digitized Social Networking Media Through Text, Images And Videos On Language Usage [J]. Sampurnananda Mishra, Chandra Kanta Samal, Navneet Yadav, International Journal of Scientific & Technology Research. 2019, No. 10
3. Trust-Based Video Management Framework for Social Multimedia Networks [J]. Mada Badr Eddine, Bagaa Miloud, Taleb Tarik, IEEE Transactions on Multimedia. 2019, No. 3
4. Extracting Textual Overlays from Social Media Videos Using Neural Networks [C]. Adam Slucki, Tomasz Trzcinski, Adam Bielski, International Conference on Computer Vision and Graphics. 2018
5. A Unified Framework based on Convolutional Neural Networks for Interpreting Carotid Intima-Media Thickness Videos [D]. Shin, Jaeyul. 2016
6. Can pre-trained convolutional neural networks be directly used as a feature extractor for video-based neonatal sleep and wake classification? [O]. Muhammad Awais, Xi Long, Bin Yin, 2020
7. Multimodal and Crossmodal Representation Learning from Textual and Visual Features with Bidirectional Deep Neural Networks for Video Hyperlinking [O]. Vukotić, Vedran, Raymond, Christian, Gravier, Guillaume. 2016
