Speech Enhancement Based on Bayesian Low-Rank and Sparse Decomposition of Multichannel Magnitude Spectrograms

Yoshiaki Bando; Katsutoshi Itoyama; Masashi Konyo; Satoshi Tadokoro; Kazuhiro Nakadai; Kazuyoshi Yoshii; Tatsuya Kawahara; Hiroshi G. Okuno


Abstract

This paper presents a blind multichannel speech enhancement method that can cope with a time-varying layout of microphones and sound sources. Because nonnegative tensor factorization (NTF) separates a multichannel magnitude (or power) spectrogram into source spectrograms without using phase information, it is robust against a time-varying mixing system. NTF, however, requires prior information such as the spectral bases (templates) of each source spectrogram. To solve this problem, we develop a Bayesian model called Bayesian robust NTF (Bayesian RNTF) that decomposes a multichannel magnitude spectrogram into target speech and noise spectrograms based on their sparseness and low-rankness, respectively. Bayesian RNTF is applied to the challenging task of speech enhancement for a microphone array distributed on a hose-shaped rescue robot. When the robot searches for victims under collapsed buildings, the layout of the microphones changes over time and some of them often fail to capture the target speech. Our method works robustly in such situations because it tolerates a time-varying mixing system. Experiments using a 3-m hose-shaped rescue robot with eight microphones show that the proposed method outperforms conventional blind methods by 1.03 dB in signal-to-noise ratio.
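
As a rough illustration of the low-rank-plus-sparse idea described in the abstract, the sketch below splits a multichannel magnitude spectrogram into a low-rank part (standing in for noise) and a nonnegative sparse part (standing in for speech). It is not the Bayesian RNTF inference of the paper, whose details are not given on this page. It swaps in a simple non-Bayesian heuristic that alternates a truncated SVD with one-sided soft-thresholding. The function name `lowrank_sparse_split`, the parameters `rank`, `lam`, and `n_iter`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def lowrank_sparse_split(X, rank=2, lam=0.5, n_iter=30):
    """Split a nonnegative (freq, time, channel) tensor X into low-rank L and sparse S.

    Illustrative heuristic only (truncated SVD + soft threshold),
    not the Bayesian RNTF model from the paper.
    """
    F, T, M = X.shape
    # Stack the channels along the time axis so that all channels
    # share the same spectral (column-space) basis.
    Y = X.reshape(F, T * M)
    S = np.zeros_like(Y)
    for _ in range(n_iter):
        # Low-rank step: best rank-`rank` approximation of Y - S.
        U, s, Vt = np.linalg.svd(Y - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Sparse step: keep only residuals above lam (one-sided, so S >= 0).
        S = np.maximum(Y - L - lam, 0.0)
    return L.reshape(F, T, M), S.reshape(F, T, M)

# Synthetic example (all values are made up): rank-1 "noise" with
# channel-dependent gains plus a few large, sparse "speech" entries.
rng = np.random.default_rng(0)
F, T, M = 64, 50, 4
w = rng.uniform(0.5, 1.5, size=F)   # spectral shape of the noise
h = rng.uniform(0.5, 1.5, size=T)   # temporal activation of the noise
g = rng.uniform(0.5, 1.5, size=M)   # per-channel gain
noise = w[:, None, None] * h[None, :, None] * g[None, None, :]
speech = 10.0 * (rng.random((F, T, M)) < 0.02)
X = noise + speech

L_hat, S_hat = lowrank_sparse_split(X, rank=2, lam=0.5)
print("fraction of nonzero entries in S:", np.mean(S_hat > 0))
```

Because the sketch works on magnitudes only, it uses no phase or array-geometry information, which is the property the abstract credits for robustness to a changing microphone layout. The threshold `lam` trades off how much energy is assigned to the sparse part and would need tuning on real data.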


Bibliographic details

Source
Audio, Speech, and Language Processing, IEEE/ACM Transactions on | 2018, Issue 2 | pp. 215-230 | 16 pages
Authors
Yoshiaki Bando; Katsutoshi Itoyama; Masashi Konyo; Satoshi Tadokoro; Kazuhiro Nakadai; Kazuyoshi Yoshii; Tatsuya Kawahara; Hiroshi G. Okuno
Original format: PDF
Language: English
Keywords
Spectrogram; Bayes methods; Speech enhancement; Speech; Microphones; Robustness; Robots

