Spatiotemporal Convolutional Features for Lipreading

Abstract

We propose a visual parametrization method for lipreading and audiovisual speech recognition from frontal face videos. The presented features use learned spatiotemporal convolutions in a deep neural network that is trained to predict phonemes at the frame level. The network is trained on a manually transcribed, moderately sized dataset of Czech television broadcasts, but we show that the resulting features also generalize well to other languages. On the publicly available OuluVS dataset, vanilla convolutional features achieve 91% word accuracy, and 97.2% after fine-tuning, substantial improvements over the state of the art on this popular benchmark. Unlike most work on lipreading, we also demonstrate the usefulness of the proposed parametrization for continuous audiovisual speech recognition.
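
To illustrate what a spatiotemporal convolution computes, the following is a minimal pure-Python sketch. It is not the network described in the abstract: the clip size, the kernel size, and the hand-set kernel weights are all hypothetical, whereas the filters in the paper are learned. The function `conv3d_valid` is an illustrative helper, not part of any library. It slides one kernel across time and space at once, so each output value summarizes a small block of pixels over several consecutive frames.

```python
def conv3d_valid(clip, kernel):
    """Valid (no padding) 3D cross-correlation of a T x H x W clip
    with a t x h x w kernel, both given as nested lists."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    t, h, w = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for i in range(T - t + 1):          # slide along time
        plane = []
        for j in range(H - h + 1):      # slide along image rows
            row = []
            for k in range(W - w + 1):  # slide along image columns
                acc = 0.0
                for a in range(t):
                    for b in range(h):
                        for c in range(w):
                            acc += clip[i + a][j + b][k + c] * kernel[a][b][c]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy clip (assumption): 4 frames of 3x3 pixels. Every pixel equals the
# frame index, so the clip changes over time but is constant in space.
clip = [[[float(f)] * 3 for _ in range(3)] for f in range(4)]

# Hypothetical hand-set kernel spanning 2 frames and 2x2 pixels. It subtracts
# the mean of the earlier frame patch from the mean of the later one, so it
# responds to temporal change. A trained network would learn such weights.
kernel = [
    [[-0.25, -0.25], [-0.25, -0.25]],  # earlier frame
    [[0.25, 0.25], [0.25, 0.25]],      # later frame
]

features = conv3d_valid(clip, kernel)
```

With these toy inputs the output has 3 time steps of 2x2 values, and every value is 1.0 because the intensity rises by one per frame. In a frame-level phoneme classifier, stacks of such learned filters would feed later layers that output one phoneme prediction per frame.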


Bibliographic details

Source
International Conference on Text, Speech and Dialogue | 2017 | pp. 438-446 | 9 pages
Author
Karel Paleček
Original format: PDF
Keywords
Audiovisual speech recognition; Deep learning; Spatiotemporal convolutional network; Lipreading

Similar literature

1. Spatiotemporal fuzzy-graph convolutional network model with dynamic feature encoding for traffic forecasting [J]. Zhang Shuai, Chen Yong, Zhang Wenyu. Knowledge-Based Systems. 2021, issue Nova14
2. Learning 3D spatiotemporal gait feature by convolutional network for person identification [J]. Huynh-The Thien, Hua Cam-Hao, Nguyen Anh Tu. Neurocomputing. 2020, issue Jul15
3. Smoke Vehicle Detection Based on Spatiotemporal Bag-Of-Features and Professional Convolutional Neural Network [J]. Tao Huanjie, Lu Xiaobo. IEEE Transactions on Circuits and Systems for Video Technology. 2020, issue 10
4. Spatiotemporal Convolutional Features for Lipreading [C]. Karel Paleček. International Conference on Text, Speech and Dialogue. 2017
5. DeepFakes Detection in Videos Using Feature Engineering Techniques in Deep Learning Convolution Neural Network Frameworks [D]. Burroughs, Sonya. 2021
6. Violence Detection Using Spatiotemporal Features with 3D Convolutional Neural Network [O]. Fath U Min Ullah, Amin Ullah, Khan Muhammad. 2019
7. Action Recognition Based on Two-Stream Convolutional Networks With Long-Short-Term Spatiotemporal Features [O]. Yanqin Wan, Zujun Yu, Yao Wang. 2020
