International Conference on Multimedia Modeling

An Attention Based Speaker-Independent Audio-Visual Deep Learning Model for Speech Enhancement

Abstract

Speech enhancement aims to improve speech quality in noisy environments. While most speech enhancement methods use only audio data as input, incorporating video information can achieve better results. In this paper, we present an attention-based, speaker-independent audio-visual deep learning model for single-channel speech enhancement. We apply both time-wise attention and spatial attention in the video feature extraction module to focus on the more important features. Audio features and video features are then concatenated along the time dimension to form the audio-visual features. The proposed video feature extraction module can be spliced onto an audio-only model without extensive modifications. The results show that the proposed method achieves better results than recent audio-visual speech enhancement methods.
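The abstract describes the architecture only at a high level. Below is a minimal PyTorch-style sketch of that idea, assuming grayscale mouth-crop frames, illustrative layer sizes, and simple sigmoid/softmax attention formulations; the module names, dimensions, and attention details are placeholders, not the authors' implementation.

```python
# Minimal sketch, assuming PyTorch and illustrative shapes; all names and
# formulations here are assumptions, not the paper's actual design.
import torch
import torch.nn as nn


class SpatialAttention(nn.Module):
    """Re-weights spatial positions of each frame's feature map."""
    def __init__(self, channels):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):                         # x: (B*T, C, H, W)
        return x * torch.sigmoid(self.score(x))   # broadcast over channels


class TimeWiseAttention(nn.Module):
    """Re-weights frames along the time axis."""
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, x):                         # x: (B, T, D)
        return x * torch.softmax(self.score(x), dim=1)


class VideoFeatureExtractor(nn.Module):
    """CNN frame encoder followed by spatial and time-wise attention."""
    def __init__(self, out_dim=256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.spatial_attn = SpatialAttention(64)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.proj = nn.Linear(64, out_dim)
        self.time_attn = TimeWiseAttention(out_dim)

    def forward(self, frames):                    # frames: (B, T, 1, H, W)
        b, t = frames.shape[:2]
        x = self.cnn(frames.flatten(0, 1))        # (B*T, 64, H', W')
        x = self.spatial_attn(x)
        x = self.pool(x).flatten(1)               # (B*T, 64)
        x = self.proj(x).view(b, t, -1)           # (B, T, D)
        return self.time_attn(x)


def fuse(audio_feats, video_feats):
    """Concatenate the two streams along the time dimension (dim=1), reading
    the abstract literally; this assumes both streams share the same feature
    width. A per-frame, feature-wise fusion would use dim=-1 instead."""
    # audio_feats: (B, Ta, D), video_feats: (B, Tv, D)
    return torch.cat([audio_feats, video_feats], dim=1)
```

Under these assumptions, the fused audio-visual features would replace the audio-only features at the input of the enhancement network, which is why the abstract notes the video branch can be spliced onto an audio-only model without extensive modifications.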
