Audio-Video Automatic Speech Recognition: An Example of Improved Performance through Multimodal Sensor Input

Abstract

One advantage of multimodal HCI technology is the performance improvement that can be gained over conventional single-modality technology by employing complementary sensors in different modalities. Such complementary information is particularly useful in practical, real-world applications, where performance must be robust against all kinds of noise. Automatic speech recognition (ASR) is one example. Traditionally, ASR systems use information from the audio modality only, and their performance drops quickly in the presence of acoustic noise. However, it has been shown that incorporating additional visual speech information from the video modality improves performance significantly, so that AV ASR systems can be employed where audio-only ASR systems would fail, opening new application areas for ASR technology. In this paper, a non-intrusive (no artificial markers), real-time 3D lip tracking system is presented, together with its application to AV ASR. The multivariate statistical technique 'co-inertia analysis' is also presented; it offers improved numerical stability over other multivariate analyses, even for small sample sizes.

Bibliographic record

Source: NICTA-HCSNet Multimodal User Interaction Workshop, 2005, 8 pages
Author: Roland Goecke
Original format: PDF
Chinese Library Classification: Computer applications
Keywords: audio-video speech processing; 3D stereo lip tracking

Similar literature

1. Harmonicity Based Dereverberation for Improving Automatic Speech Recognition Performance and Speech Intelligibility [J]. Keisuke KINOSHITA, Tomohiro NAKATANI, Masato MIYOSHI. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences. 2005, No. 7
2. Using nonlinear modeling of reconstructed phase space and frequency domain analysis to improve automatic speech recognition performance [J]. Jafari A., Almasganj F. International Journal of Bifurcation and Chaos in Applied Sciences and Engineering. 2012, No. 3
3. An analysis of the effect of combining standard and alternate sensor signals on recognition of syllabic units for multimodal speech recognition [J]. Radha N., Shahina A., Prabha P. Pattern Recognition Letters. 2018, No. NOVa1
4. Audio-Video Automatic Speech Recognition: An Example of Improved Performance through Multimodal Sensor Input [C]. Roland Goecke. NICTA-HCSNet Multimodal User Interaction Workshop 2005 (MMUI2005); 2005-11; Sydney (AU). 2005
5. A multimodal fusion approach for automatic postal address recognition system using Optical Character Recognition (OCR) and Automatic Speech Recognition (ASR) techniques [D]. Singh, Amriteshwar. 2011
6. The relationship between perceptual disturbances in dysarthric speech and automatic speech recognition performance [O]. Ming Tu, Alan Wisler, Visar Berisha
7. Improvement of multimodal gesture and speech recognition performance using time intervals between gestures and accompanying speech [O]. 2014
