Multimodal systems for speech recognition

Orken Zh. Mamyrbayev; Keylan Alimhan; Beibut Amirgaliyev; Bagashar Zhumazhanov; Dinara Mussayeva; Farida Gusmanova

Abstract

In this article, we implement a multimodal recognition system for Kazakh speech based on speech and lip recognition. The feature extraction phase uses several methods: voice activity detection (VAD), mel-frequency cepstral coefficients, perceptual linear prediction, relative perceptual linear prediction, and their first-order time derivatives. The article considers the main problems of Kazakh speech recognition, VAD algorithms and speech segmentation, and lip movement recognition. It describes probabilistic modelling of audiovisual speech based on coupled hidden Markov models (HMMs), information fusion methods with weight coefficients for the audio and video speech modalities, and parametric representation of signals. Quantitative results for multimodal recognition of continuous Kazakh speech indicate high accuracy and reliability of the automatic system. The approach is also evaluated and compared in terms of computational time and recognition speed.
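
The abstract does not give the fusion rule or the derivative formula, so the following is only a minimal sketch of two ideas it names, under stated assumptions. It assumes that "weight coefficients" means a convex combination of per-class log-likelihoods from the audio and video streams, and that first-order time derivatives are approximated by a central difference over neighbouring frames. The names `fuse_log_likelihoods`, `delta`, and `audio_weight` are hypothetical and are not taken from the article.

```python
import math


def fuse_log_likelihoods(audio_ll, video_ll, audio_weight):
    """Combine per-class log-likelihoods from two modalities.

    Assumption: the weights are non-negative and sum to 1, so the video
    weight is 1 - audio_weight. The article may use a different rule.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    if audio_ll.keys() != video_ll.keys():
        raise ValueError("both modalities must score the same classes")
    video_weight = 1.0 - audio_weight
    return {
        label: audio_weight * audio_ll[label] + video_weight * video_ll[label]
        for label in audio_ll
    }


def delta(frames):
    """First-order time derivative of a feature sequence.

    Assumption: a central difference (next - previous) / 2, with the first
    and last frames replicated at the edges. `frames` is a list of
    equal-length feature vectors, one per time step.
    """
    n = len(frames)
    result = []
    for t in range(n):
        prev_frame = frames[max(t - 1, 0)]
        next_frame = frames[min(t + 1, n - 1)]
        result.append([(b - a) / 2.0 for a, b in zip(prev_frame, next_frame)])
    return result


def best_class(scores):
    """Return the label with the highest fused score."""
    return max(scores, key=scores.get)
```

For example, `best_class(fuse_log_likelihoods(audio, video, 0.7))` picks the class favoured by the weighted combination. Lowering `audio_weight` shifts trust toward the lip-reading stream, which is the usual motivation for stream weights when the audio is noisy.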

Bibliographic record

Source
International Journal of Mobile Communications, 2020, Issue 3, pp. 314-326 (13 pages)
Authors
Orken Zh. Mamyrbayev; Keylan Alimhan; Beibut Amirgaliyev; Bagashar Zhumazhanov; Dinara Mussayeva; Farida Gusmanova
Author affiliations

Institute of Information and Computational Technologies CS MES RK;

Tokyo Denki University;

Institute of Information and Computational Technologies CS MES RK;

Institute of Information and Computational Technologies CS MES RK;

Institute of Economy CS MES RK;

Al-Farabi Kazakh National University;

Original format: PDF
Language: English
Keywords
voice activity detection; VAD; speech segmentation; multimodal systems; speech recognition; information systems

Similar literature

1. Noise robust speech recognition system using multimodal audio-visual approach using different deep learning classification techniques [J]. Eslam E. El Maghraby, Amr M. Gody, Mohamed Hesham Farouk. International Journal of Advanced Computer Research, 2020, Issue 47.
2. An automatic multimodal speech recognition system with audio and video information [J]. Karpov A. A. Automation and Remote Control, 2014, Issue 12.
3. Statistical speech translation system based on voice recognition optimization using multimodal sources of knowledge and characteristics vectors [J]. Alejandro Canovas, Jesus Tomas, Jaime Lloret. Computer Standards & Interfaces, 2013, Issue 5.
4. Enhancing quality and accuracy of speech recognition system by using multimodal audio-visual speech signal [C]. Eslam E. El Maghraby, Amr M. Gody, M. Hesham Farouk. International Computer Engineering Conference, 2016.
5. A multimodal fusion approach for automatic postal address recognition system using Optical Character Recognition (OCR) and Automatic Speech Recognition (ASR) techniques [D]. Singh, Amriteshwar. 2011.
6. A Multimodal Speech Capture System for Speech Rehabilitation and Learning [O]. Nordine Sebkhi, Dhyey Desai, Mohammad Islam.
7. Noise-Robust Speech Recognition System based on Multimodal Audio-Visual Approach Using Different Deep Learning Classification Techniques [O]. Eslam ElMaghraby, Amr Gody, Mohamed Farouk. 2020.
