Creation and Selection of the Visual Front End Features and the Audio-Visual Feature Fusion for Audio-Visual Speech Recognition

Abstract

This contribution covers the creation and selection of visual front-end speech features. It describes the use of shape-based and appearance-based visual features, which can be used for visual or audio-visual speech recognition. Before use, the features have to be normalized and selected so that the recognition rate is sufficiently high. The second task was the fusion of different kinds of visual and acoustic speech features. The work concludes with experiments on audio-visual recognition of isolated words.
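
The abstract does not say which normalization or fusion scheme the paper uses. As a rough illustration only, the sketch below assumes per-dimension z-score normalization followed by feature-level (early) fusion, where the acoustic and visual vectors of each frame are concatenated. The function names, the toy data, and the assumption that both streams are already aligned to the same frame rate are hypothetical and not taken from the paper.

```python
from math import sqrt


def zscore_normalize(frames):
    """Scale each feature dimension to zero mean and unit variance."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((f[d] - means[d]) ** 2 for f in frames) / n
        # A constant dimension has zero variance; use 1.0 to avoid dividing by zero.
        stds.append(sqrt(var) or 1.0)
    return [[(f[d] - means[d]) / stds[d] for d in range(dims)] for f in frames]


def fuse_features(audio_frames, visual_frames):
    """Early fusion (assumed scheme): concatenate normalized per-frame vectors."""
    if len(audio_frames) != len(visual_frames):
        raise ValueError("audio and visual streams must have the same frame count")
    audio = zscore_normalize(audio_frames)
    visual = zscore_normalize(visual_frames)
    return [a + v for a, v in zip(audio, visual)]


# Made-up toy data: 3 frames, 2 acoustic values and 1 visual value per frame.
audio = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
visual = [[0.5], [0.5], [0.8]]
fused = fuse_features(audio, visual)
```

Each fused frame here has three dimensions (two acoustic, one visual). In practice the visual stream is usually sampled at a lower rate than the acoustic one, so it would need to be interpolated to the acoustic frame rate before a concatenation like this is possible.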

Bibliographic details

Source: International Conference on Speech and Computer, 2006, 4 pages
Author: Josef Chaloupka
Original format: PDF
Chinese Library Classification: TN911-53

Similar literature

1. Audio-visual feature fusion via deep neural networks for automatic speech recognition [J]. Mohammad Hasan Rahmani, Farshad Almasganj, Seyyed Ali Seyyedsalehi. Digital Signal Processing, 2018.
2. Omnidirectional Audio-Visual Talker Localization Based on Dynamic Fusion of Audio-Visual Features Using Validity and Reliability Criteria [J]. Yuki DENDA, Takanobu NISHIURA, Yoichi YAMASHITA. IEICE Transactions on Information and Systems, 2008, No. 3.
3. Creation and Selection of the Visual Front End Features and the Audio-Visual Feature Fusion for Audio-Visual Speech Recognition [C]. Josef Chaloupka. International Conference on Speech and Computer, 2006.
4. Robust speech processing based on microphone array, audio-visual, and frame selection for in-vehicle speech recognition and in-set speaker recognition [D]. Zhang, Xianxian. 2005.
5. Particle Swarm Optimization Based Feature Enhancement and Feature Selection for Improved Emotion Recognition in Speech and Glottal Signals [O]. Hariharan Muthusamy, Kemal Polat, Sazali Yaacob.
6. Feature-Fusion based Audio-Visual Speech Recognition using Lip Geometry Features in Noisy Environment [O]. M. Z. Ibrahim, D. J. Mulvaney, M. F. Abas. 2015.
