Feature-Fusion based Audio-Visual Speech Recognition using Lip Geometry Features in Noisy Environment

Abstract

Humans are often able to compensate for noise degradation and uncertainty in speech information by augmenting the received audio with visual information. Such bimodal perception yields a rich combination of information that can be used in speech recognition. However, because the lip movements involved in articulation vary widely, not all speech can be substantially improved by audio-visual integration. This paper describes a feature-fusion audio-visual speech recognition (AVSR) system. The system extracts lip geometry from the mouth region using a combination of a skin-color filter, border following and a convex hull, and it performs classification with a Hidden Markov Model (HMM). The new approach is compared with a conventional audio-only system under simulated ambient noise conditions that affect the spoken phrases. The experimental results demonstrate that, in the presence of audio noise, the audio-visual approach significantly improves speech recognition accuracy compared with the audio-only approach.
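
The lip-geometry front end summarized above can be illustrated with a short standard-library Python sketch. It is not the authors' implementation, and it makes several assumptions:

- The input image is a cropped mouth region given as rows of (R, G, B) tuples.
- The color rule in `is_lip_pixel` and its thresholds are hypothetical placeholders.
- The convex hull is computed with Andrew's monotone-chain algorithm.
- The hull's width, height and area stand in for a per-frame geometric feature vector.

The border-following step is omitted. For a single connected lip region, the convex hull of all masked pixels equals the hull of the region's outer border. In OpenCV, that stage would typically be `cv2.findContours` (which implements Suzuki and Abe's border-following algorithm) followed by `cv2.convexHull`.

```python
from typing import List, Sequence, Tuple

Point = Tuple[int, int]
RGB = Tuple[int, int, int]


def is_lip_pixel(r: int, g: int, b: int) -> bool:
    # Hypothetical color rule (illustrative thresholds, NOT from the paper):
    # lip pixels are fairly bright in red and carry much less green than red,
    # while facial skin keeps a higher green-to-red ratio. `b` is unused here.
    return r > 80 and g < 0.6 * r


def cross(o: Point, a: Point, b: Point) -> int:
    # z-component of (a - o) x (b - o); > 0 means a counter-clockwise turn.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone-chain convex hull; collinear boundary points are dropped."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]


def hull_geometry(hull: Sequence[Point]) -> Tuple[int, int, float]:
    """Width, height and shoelace area of a polygon whose vertices are given in order."""
    xs = [x for x, _ in hull]
    ys = [y for _, y in hull]
    twice_area = 0
    for i, (x1, y1) in enumerate(hull):
        x2, y2 = hull[(i + 1) % len(hull)]
        twice_area += x1 * y2 - x2 * y1
    return max(xs) - min(xs), max(ys) - min(ys), abs(twice_area) / 2.0


def extract_lip_features(image: Sequence[Sequence[RGB]]) -> Tuple[int, int, float]:
    """Color-filter a mouth-region image, take the hull of lip pixels, return its geometry."""
    points = [
        (x, y)
        for y, row in enumerate(image)
        for x, (r, g, b) in enumerate(row)
        if is_lip_pixel(r, g, b)
    ]
    if len(points) < 3:
        return 0, 0, 0.0  # too few lip pixels to form a polygon
    return hull_geometry(convex_hull(points))
```

In a real pipeline, border following also helps keep only the largest lip-colored region and discard small blobs that pass the color filter. This sketch skips that filtering.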
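
The feature-fusion step and the noisy test condition can also be illustrated with a small standard-library sketch. It relies on assumptions the abstract does not settle:

- The audio stream is a sequence of per-frame feature vectors (for example MFCCs) at a higher frame rate than the video.
- The visual stream is a per-frame lip-geometry vector.
- Fusion is plain per-frame concatenation after linearly interpolating the visual features onto the audio frame grid.

`mix_at_snr` shows one common way to simulate ambient noise. It scales a noise recording so that SNR_dB = 10 * log10(P_speech / P_noise) equals a chosen target. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def mean_power(samples: Sequence[float]) -> float:
    """Average power (mean squared amplitude) of a signal."""
    return sum(s * s for s in samples) / len(samples)


def mix_at_snr(speech: Sequence[float], noise: Sequence[float], snr_db: float) -> List[float]:
    """Scale `noise` so that P_speech / P_scaled_noise = 10**(snr_db / 10), then add it."""
    noise = noise[: len(speech)]
    if len(noise) < len(speech):
        raise ValueError("noise clip is shorter than the speech signal")
    gain = math.sqrt(mean_power(speech) / (mean_power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def upsample(frames: Sequence[Vector], target_len: int) -> List[Vector]:
    """Linearly interpolate a sequence of feature vectors to `target_len` frames."""
    n = len(frames)
    if n == 1 or target_len == 1:
        return [list(frames[0]) for _ in range(target_len)]
    out: List[Vector] = []
    for i in range(target_len):
        pos = i * (n - 1) / (target_len - 1)  # fractional index into `frames`
        lo = int(pos)
        hi = min(lo + 1, n - 1)
        t = pos - lo
        out.append([a + t * (b - a) for a, b in zip(frames[lo], frames[hi])])
    return out


def early_fusion(audio: Sequence[Vector], visual: Sequence[Vector]) -> List[Vector]:
    """Feature-level fusion: align visual frames to the audio frame rate, then concatenate."""
    aligned = upsample(visual, len(audio))
    return [list(a) + v for a, v in zip(audio, aligned)]
```

Concatenation after interpolation is the simplest form of early (feature-level) fusion. The fused vectors would then serve as the observation sequence for the HMM recognizer. Repeating each video frame instead of interpolating is an equally simple alternative. The abstract does not say how the paper aligns the two streams.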
