International Conference on Multimedia Modeling

An Attention Based Speaker-Independent Audio-Visual Deep Learning Model for Speech Enhancement

Abstract

Speech enhancement aims to improve speech quality in noisy environments. While most speech enhancement methods use only audio data as input, incorporating video information can achieve better results. In this paper, we present an attention-based, speaker-independent audio-visual deep learning model for single-channel speech enhancement. We apply both time-wise attention and spatial attention in the video feature extraction module to focus on the more important features. Audio features and video features are then concatenated along the time dimension to form the audio-visual features. The proposed video feature extraction module can be spliced onto an audio-only model without extensive modifications. The results show that the proposed method achieves better results than recent audio-visual speech enhancement methods.
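The abstract describes the architecture only at a high level. Below is a minimal PyTorch-style sketch of that idea, assuming grayscale mouth-crop frames, illustrative layer sizes, and simple sigmoid/softmax attention formulations; the module names, dimensions, and attention details are placeholders, not the authors' implementation.

```python
# Minimal sketch, assuming PyTorch and illustrative shapes; all names and
# formulations here are assumptions, not the paper's actual design.
import torch
import torch.nn as nn


class SpatialAttention(nn.Module):
    """Re-weights spatial positions of each frame's feature map."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):                         # x: (B*T, C, H, W)
        return x * torch.sigmoid(self.score(x))   # broadcast over channels


class TimeWiseAttention(nn.Module):
    """Re-weights frames along the time axis."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                         # x: (B, T, D)
        return x * torch.softmax(self.score(x), dim=1)


class VideoFeatureExtractor(nn.Module):
    """CNN frame encoder followed by spatial and time-wise attention."""
    def __init__(self, out_dim=256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.spatial_attn = SpatialAttention(64)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.proj = nn.Linear(64, out_dim)
        self.time_attn = TimeWiseAttention(out_dim)

    def forward(self, frames):                    # frames: (B, T, 1, H, W)
        b, t = frames.shape[:2]
        x = self.cnn(frames.flatten(0, 1))        # (B*T, 64, H', W')
        x = self.spatial_attn(x)
        x = self.pool(x).flatten(1)               # (B*T, 64)
        x = self.proj(x).view(b, t, -1)           # (B, T, D)
        return self.time_attn(x)


def fuse(audio_feats, video_feats):
    """Concatenate the two streams along the time dimension (dim=1), reading
    the abstract literally; this assumes both streams share the same feature
    width. A per-frame, feature-wise fusion would use dim=-1 instead."""
    # audio_feats: (B, Ta, D), video_feats: (B, Tv, D)
    return torch.cat([audio_feats, video_feats], dim=1)
```

Under these assumptions, the fused audio-visual features would replace the audio-only features at the input of the enhancement network, which is why the abstract notes the video branch can be spliced onto an audio-only model without extensive modifications.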
