Conference: NICTA-HCSNet Multimodal User Interaction Workshop

Audio-Video Automatic Speech Recognition: An Example of Improved Performance through Multimodal Sensor Input

Abstract

One advantage of multimodal HCI technology is the performance improvement that can be gained over conventional single-modality technology by employing complementary sensors in different modalities. Such complementary information is particularly useful in practical, real-world applications, where performance must be robust against all kinds of noise. One example is automatic speech recognition (ASR). Traditionally, ASR systems use information from the audio modality only, and their performance drops quickly in the presence of acoustic noise. However, it has been shown that incorporating additional visual speech information from the video modality improves performance significantly, so that audio-video (AV) ASR systems can be employed where audio-only ASR systems would fail, thus opening new application areas for ASR technology. In this paper, a non-intrusive (no artificial markers), real-time 3D lip tracking system is presented, together with its application to AV ASR. Co-inertia analysis, a multivariate statistical method, is also presented; it offers improved numerical stability over other multivariate analyses, even for small sample sizes.
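
As a rough illustration of the co-inertia analysis mentioned in the abstract (not the authors' implementation), the sketch below finds the first pair of co-inertia axes for two small feature tables. It assumes the standard formulation with uniform weights, in which the axes are the leading singular vectors of the cross-covariance matrix between the centred tables. The feature values, the names `audio` and `lips`, and all function names are hypothetical. The computation needs no covariance-matrix inversion, unlike canonical correlation analysis, which is one commonly cited reason the method stays well behaved when there are few samples.

```python
import math


def center(rows):
    """Subtract each column's mean from a list-of-rows matrix."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def cross_covariance(X, Y):
    """Return C = Xc^T Yc / n for the centred tables Xc (n x p) and Yc (n x q)."""
    Xc, Yc = center(X), center(Y)
    n, p, q = len(X), len(X[0]), len(Y[0])
    return [[sum(Xc[k][i] * Yc[k][j] for k in range(n)) / n
             for j in range(q)] for i in range(p)]


def normalize(v):
    """Return the unit-length version of v together with its original norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("zero vector: the tables share no covariance along the start direction")
    return [x / norm for x in v], norm


def first_coinertia_axes(X, Y, iters=200):
    """Leading co-inertia axes (u, v) and the covariance sigma they capture.

    Uses power iteration (u <- C v, v <- C^T u) instead of a full SVD so the
    sketch needs only the standard library. Assumes the all-ones start vector
    is not orthogonal to the leading axis.
    """
    C = cross_covariance(X, Y)
    p, q = len(C), len(C[0])
    v, _ = normalize([1.0] * q)
    sigma = 0.0
    for _ in range(iters):
        u, _ = normalize([sum(C[i][j] * v[j] for j in range(q)) for i in range(p)])
        v, sigma = normalize([sum(C[i][j] * u[i] for i in range(p)) for j in range(q)])
    return u, v, sigma


# Hypothetical data: 4 frames, 3 acoustic features and 2 lip-shape features.
audio = [[1.0, 0.2, 0.0], [2.0, 0.1, 0.5], [3.0, 0.4, 0.9], [4.0, 0.3, 1.6]]
lips = [[0.5, 1.0], [1.1, 0.8], [1.4, 0.7], [2.1, 0.2]]

u, v, sigma = first_coinertia_axes(audio, lips)
print("audio axis:", u)
print("lip axis:", v)
print("covariance captured:", sigma)
```

Here `sigma` is the covariance between the audio frames projected onto `u` and the lip frames projected onto `v`. A fuller implementation would take a complete singular value decomposition to obtain all axis pairs, not just the first.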
