Using a High-Speed Video Camera for Robust Audio-Visual Speech Recognition in Acoustically Noisy Conditions

Abstract

The purpose of this study is to develop a robust audio-visual speech recognition system and to investigate the influence of high-speed video data on the recognition accuracy of continuous Russian speech under different noisy conditions. The experimental setup we developed and the multimodal database we collected allow us to explore the impact of video recordings at various frame rates, from the standard 25 frames per second (fps) up to the high-speed 200 fps. At present, no research objectively reflects the dependence of speech recognition accuracy on the video frame rate, and no relevant audio-visual databases are available for model training. In this paper, we try to fill this gap for continuous Russian speech. Our evaluation experiments show an absolute increase in recognition accuracy of up to 3% and demonstrate that using the JAI Pulnix high-speed camera at 200 fps yields better recognition results under different acoustically noisy conditions.
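
To make the frame-rate range concrete: the interval between video frames is 1000/fps milliseconds, i.e. 40 ms at 25 fps versus 5 ms at 200 fps, so a faster camera samples each lip movement more densely. The Python sketch below is an illustration only, not the authors' method; the 60 ms gesture duration (GESTURE_MS) and the helper names are hypothetical choices made for this example.

```python
# Illustrative sketch (not from the paper): how the video frame rate bounds the
# temporal resolution of lip-movement features. The 25 and 200 fps rates come
# from the abstract; GESTURE_MS is a hypothetical duration chosen for illustration.

GESTURE_MS = 60.0  # hypothetical duration of a short lip gesture, in ms


def frame_interval_ms(fps: float) -> float:
    """Time between consecutive video frames, in milliseconds."""
    return 1000.0 / fps


def min_frames_in_window(fps: float, duration_ms: float) -> int:
    """Minimum number of frames captured inside a half-open window of
    duration_ms, whatever the window's alignment with the frame clock."""
    return int(duration_ms // frame_interval_ms(fps))


if __name__ == "__main__":
    for fps in (25, 200):
        print(
            f"{fps:>3} fps: {frame_interval_ms(fps):.1f} ms between frames, "
            f"at least {min_frames_in_window(fps, GESTURE_MS)} frame(s) "
            f"per {GESTURE_MS:.0f} ms gesture"
        )
```

Under these assumptions, a 60 ms gesture is covered by only one or two frames at 25 fps but by twelve frames at 200 fps, which offers one intuition for why higher frame rates might help visual feature extraction.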

Bibliographic Details

Source: International Conference on Speech and Computer, 2017, pp. 757-766 (10 pages)
Authors: Denis Ivanko; Alexey Karpov; Dmitry Ryumin; Irina Kipyatkova; Anton Saveliev; Victor Budkov; Dmitriy Ivanko; Milos Zelezny
Original format: PDF
Keywords: Audio-visual speech recognition; High-speed video camera; Noisy conditions; Russian speech; Visemes; Multimodal communication

Similar Documents

1. Robust Audio-Visual Speech Recognition Under Noisy Audio-Video Conditions [J]. Stewart D., Seymour R., Pass A. IEEE Transactions on Cybernetics, 2014, No. 2.

2. Multiple cameras for audio-visual speech recognition in an automotive environment [J]. Rajitha Navarathna, David Dean, Sridha Sridharan. Computer Speech and Language, 2013, No. 4.

3. Robust Speech Recognition System Using Conventional and Hybrid Features of MFCC, LPCC, PLP, RASTA-PLP and Hidden Markov Model Classifier in Noisy Conditions [J]. Veton Z. Këpuska, Hussien A. Elharati. Journal of Computer and Communications, 2015, No. 6.

4. Using a High-Speed Video Camera for Robust Audio-Visual Speech Recognition in Acoustically Noisy Conditions [C]. Denis Ivanko, Alexey Karpov, Dmitry Ryumin. International Conference on Speech and Computer, 2017.

5. Robust speech processing based on microphone array, audio-visual, and frame selection for in-vehicle speech recognition and in-set speaker recognition [D]. Zhang, Xianxian. 2005.

6. Robust EEG-Based Decoding of Auditory Attention With High-RMS-Level Speech Segments in Noisy Conditions [O]. Lei Wang, Ed X. Wu, Fei Chen. 2020.

7. Robust Audio-Visual Speech Recognition under Noisy Audio-Video Conditions [O]. Stewart D., Seymour R., Pass A. 2014.
