A Multi-Phase Gammatone Filterbank for Speech Separation via TasNet

Abstract

In this work, we investigate whether the learned encoder of the end-to-end convolutional time-domain audio separation network (Conv-TasNet) is the key to its recent success, or whether the encoder can just as well be replaced by a deterministic hand-crafted filterbank. Motivated by the resemblance of the trained encoder of Conv-TasNet to auditory filterbanks, we propose to employ a deterministic gammatone filterbank. In contrast to a common gammatone filterbank, our filters are restricted to a length of 2 ms to allow for low-latency processing. Inspired by the encoder learned by Conv-TasNet, in addition to the logarithmically spaced filters, the proposed filterbank holds multiple gammatone filters at the same center frequency with varying phase shifts. We show that replacing the learned encoder with our proposed multi-phase gammatone filterbank (MP-GTF) even leads to a scale-invariant source-to-noise ratio (SI-SNR) improvement of 0.7 dB. Furthermore, in contrast to using the learned encoder, we show that the number of filters can be reduced from 512 to 128 without loss of performance.
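The filterbank described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The abstract only fixes the 2 ms filter length, the logarithmic spacing, the use of several phase shifts per center frequency, and the total of 128 filters; everything else here is an assumption made for illustration: the 8 kHz sample rate, the textbook gammatone impulse response with order 4 and an ERB-based bandwidth, the 100 to 3600 Hz frequency range, the split into 32 center frequencies times 4 phases, the hop of 8 samples, and the function names themselves.

```python
import numpy as np


def gammatone_filter(fc, phase, fs=8000, length_ms=2.0, order=4):
    """One truncated gammatone impulse response, normalized to unit energy.

    Assumed textbook form: t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*fc*t + phase).
    """
    n_taps = int(round(fs * length_ms / 1000.0))  # 2 ms at 8 kHz gives 16 taps
    t = np.arange(1, n_taps + 1) / fs             # sample times, starting after t = 0
    # Assumption: bandwidth from the Glasberg-Moore ERB approximation.
    bandwidth = 1.019 * (24.7 + fc / 9.265)
    envelope = t ** (order - 1) * np.exp(-2 * np.pi * bandwidth * t)
    g = envelope * np.cos(2 * np.pi * fc * t + phase)
    return g / np.linalg.norm(g)


def multi_phase_gammatone_filterbank(n_center=32, n_phases=4, fs=8000,
                                     f_min=100.0, f_max=3600.0):
    """Stack filters for every (center frequency, phase) pair.

    Returns an array of shape (n_center * n_phases, n_taps).
    """
    # Logarithmically spaced center frequencies (range is an assumption).
    center_freqs = np.geomspace(f_min, f_max, n_center)
    # Evenly spaced phase shifts in [0, 2*pi) (an assumption).
    phases = np.linspace(0.0, 2 * np.pi, n_phases, endpoint=False)
    return np.stack([gammatone_filter(fc, ph, fs)
                     for fc in center_freqs for ph in phases])


def encode(signal, filterbank, hop=8):
    """Fixed encoder: frame the signal, correlate with each filter, apply ReLU."""
    n_taps = filterbank.shape[1]
    starts = range(0, len(signal) - n_taps + 1, hop)
    frames = np.stack([signal[s:s + n_taps] for s in starts])
    return np.maximum(frames @ filterbank.T, 0.0)


filterbank = multi_phase_gammatone_filterbank()
rng = np.random.default_rng(0)
features = encode(rng.standard_normal(8000), filterbank)  # 1 s of noise at 8 kHz
print(filterbank.shape, features.shape)  # (128, 16) (999, 128)
```

In a Conv-TasNet-style model, a fixed array like `filterbank` would take the place of the learned encoder weights and stay frozen during training, while the separation network is still learned.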

Bibliographic details

Source: IEEE International Conference on Acoustics, Speech and Signal Processing, 2020, pp. 36-40 (5 pages)
Authors: David Ditter; Timo Gerkmann
Original format: PDF
Keywords: Speech Separation; Auditory Filterbank; End-To-End Learning; TasNet

