Conference: International Conference on Intelligent Autonomous Systems
Fuzzy Neural Network with Audio-Visual Data for Voice Activity Detection in Noisy Environments

Abstract

Voice activity detection is a fundamental problem in speech processing that has been studied for decades. Determining speech boundaries in noisy environments remains challenging, however, because the corrupted speech signal is uncertain. To handle this uncertainty in noisy data, this study adopts a fuzzy neural network (FNN). Human speech perception is also bimodal: we lip-read in noisy environments to improve intelligibility. This observation motivates incorporating visual information into the voice activity detection system. Faces and mouths are located in images using skin color segmentation, and the speaker's lip contour feature is extracted by analyzing geometric shapes. The proposed fuzzy neural network therefore considers both audio and visual information. Compared with other voice activity detection methods, the proposed method is more robust at low signal-to-noise ratio (SNR).
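
The idea of fusing an audio cue with a lip cue under uncertainty can be illustrated with a few hand-written fuzzy rules. The sketch below is not the paper's fuzzy neural network: the membership functions, the rule base, the function names (`high_membership`, `frame_energy_db`, `speech_score`), and every threshold are assumptions chosen for illustration, and nothing is learned from data. It only shows why a lip-opening feature can separate loud noise from speech when frame energy alone cannot.

```python
import math

def high_membership(x, center, slope):
    """Degree in [0, 1] to which x is 'high' (smooth sigmoid fuzzy set)."""
    return 1.0 / (1.0 + math.exp(-slope * (x - center)))

def frame_energy_db(samples):
    """Mean energy of one audio frame in dB, for samples scaled to [-1, 1]."""
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_sq + 1e-12)  # small offset avoids log(0)

def speech_score(energy_db, mouth_opening):
    """Fuzzy audio-visual speech score in [0, 1].

    energy_db     : frame energy in dB (assumed audio feature)
    mouth_opening : lip height / lip width ratio (assumed visual feature)
    All centers and slopes below are illustrative assumptions.
    """
    audio_high = high_membership(energy_db, center=-30.0, slope=0.5)
    lips_open = high_membership(mouth_opening, center=0.3, slope=15.0)

    # Rule 1: IF energy is high AND lips are open   THEN speech
    # Rule 2: IF energy is high AND lips are closed THEN noise
    # Rule 3: IF energy is low                      THEN silence
    speech = min(audio_high, lips_open)        # min acts as fuzzy AND
    noise = min(audio_high, 1.0 - lips_open)
    silence = 1.0 - audio_high

    # Normalise the rule activations to obtain a single speech score.
    return speech / (speech + noise + silence)

# Example frames: a loud frame and a near-silent frame.
loud_frame = [0.5, -0.5] * 80
quiet_frame = [0.001, -0.001] * 80

loud_open = speech_score(frame_energy_db(loud_frame), mouth_opening=0.6)
loud_closed = speech_score(frame_energy_db(loud_frame), mouth_opening=0.05)
quiet_open = speech_score(frame_energy_db(quiet_frame), mouth_opening=0.6)
```

With these assumed settings, a loud frame scores high only when the mouth is also open; a loud frame with a closed mouth is treated as noise, and a quiet frame scores low regardless of the lips. A trained fuzzy neural network would learn the membership parameters and rule weights instead of fixing them by hand.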