SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures

Zmolikova Katerina; Delcroix Marc; Kinoshita Keisuke; Ochiai Tsubasa; Nakatani Tomohiro; Burget Lukas; Cernocky Jan


Abstract

Processing speech corrupted by interfering, overlapping speakers is one of the challenging problems for today's automatic speech recognition systems. Recently, approaches based on deep learning have made great progress toward solving this problem. Most of these approaches treat the problem as speech separation, i.e., they blindly recover all the speakers from the mixture. In some scenarios, such as smart personal devices, we may however be interested in recovering only one target speaker from a mixture. In this paper, we introduce SpeakerBeam, a method for extracting a target speaker from the mixture based on an adaptation utterance spoken by that speaker. Formulating the problem as speaker extraction avoids certain issues such as label permutation and the need to determine the number of speakers in the mixture. With SpeakerBeam, we jointly learn to extract a representation characterizing the target speaker from the adaptation utterance and to use this representation to extract the speaker. We explore several ways to do this, mostly inspired by speaker adaptation in acoustic models for automatic speech recognition. We evaluate the performance on the widely used WSJ0-2mix and WSJ0-3mix datasets, and on versions of these datasets modified with more noise or more realistic overlapping patterns. We further analyze the learned behavior by exploring the speaker representations and assessing the effect of the length of the adaptation data. The results show the benefit of including speaker information in the processing and the effectiveness of the proposed method.
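The abstract does not specify the network architecture, so the following is only a minimal sketch of the general idea it describes: one network summarizes the adaptation utterance into a speaker representation, and that representation conditions a second network that estimates a mask for the target speaker. Everything concrete here is an assumption made for illustration: the function names, the layer sizes, the mean-pooled embedding, the element-wise scaling of hidden activations, and the random (untrained) weights.

```python
import numpy as np

def speaker_embedding(adapt_mag, w_aux):
    """Summarize an adaptation utterance as one vector (assumed: ReLU projection, then time average)."""
    # adapt_mag: (frames, freq_bins), w_aux: (freq_bins, hidden)
    return np.maximum(adapt_mag @ w_aux, 0.0).mean(axis=0)

def extract_target(mix_mag, adapt_mag, w_aux, w_in, w_out):
    """Estimate the target speaker's magnitude spectrogram from a mixture."""
    emb = speaker_embedding(adapt_mag, w_aux)        # (hidden,)
    hidden = np.maximum(mix_mag @ w_in, 0.0)         # (frames, hidden)
    hidden = hidden * emb                            # assumed conditioning: scale activations per speaker
    mask = 1.0 / (1.0 + np.exp(-(hidden @ w_out)))   # sigmoid mask, (frames, freq_bins)
    return mask * mix_mag                            # masked mixture = target estimate

# Hypothetical sizes and random weights; a real system would learn all three matrices jointly.
rng = np.random.default_rng(0)
n_freq, n_hidden = 257, 64
w_aux = rng.normal(scale=0.1, size=(n_freq, n_hidden))
w_in = rng.normal(scale=0.1, size=(n_freq, n_hidden))
w_out = rng.normal(scale=0.1, size=(n_hidden, n_freq))

mix_mag = np.abs(rng.normal(size=(100, n_freq)))     # stand-in for a mixture spectrogram
adapt_mag = np.abs(rng.normal(size=(50, n_freq)))    # stand-in for the adaptation utterance
target_est = extract_target(mix_mag, adapt_mag, w_aux, w_in, w_out)
```

The function returns exactly one output signal whatever the number of speakers in the mixture, which is why an extraction formulation of this kind needs neither a permutation-invariant loss nor a speaker count.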


Bibliographic details

Source
IEEE Journal of Selected Topics in Signal Processing | 2019, Issue 4 | pp. 800-814 | 15 pages
Authors
Zmolikova Katerina; Delcroix Marc; Kinoshita Keisuke; Ochiai Tsubasa; Nakatani Tomohiro; Burget Lukas; Cernocky Jan
Author affiliations

Brno Univ Technol Speech FIT Brno 60190 Czech Republic;

NTT Corp NTT Commun Sci Labs Kyoto 6190237 Japan;

Original format: PDF
Language: English
Keywords
Speaker extraction; speaker-aware neural network; multi-speaker speech recognition;


Similar literature

1. SpeakerBeam: A New Deep Learning Technology for Extracting Speech of a Target Speaker Based on the Speaker's Voice Characteristics [J]. Marc Delcroix, Katerina Zmolikova, Keisuke Kinoshita. NTT Technical Review, 2018, Issue 11.
2. TEnet: target speaker extraction network with accumulated speaker embedding for automatic speech recognition [J]. Li Wenjie, Zhang Pengyuan, Yan Yonghong. Electronics Letters, 2019, Issue 14.
3. Speaker/Style-Dependent Neural Network Speech Synthesis Based on Speaker/Style Embedding [J]. Milan Sečujski, Darko Pekar, Siniša Suzić. Journal of Universal Computer Science, 2020, Issue 4.
4. Improving Speaker Discrimination of Target Speech Extraction With Time-Domain SpeakerBeam [C]. Marc Delcroix, Tsubasa Ochiai, Katerina Zmolikova. IEEE International Conference on Acoustics, Speech and Signal Processing, 2020.
5. Convolutional Neural Networks for Speaker-Independent Speech Recognition [D]. Belilovsky, Eugene. 2011.
6. Speaker-Independent Silent Speech Recognition from Flesh-Point Articulatory Movements Using an LSTM Neural Network [O]. Myungjong Kim, Beiming Cao, Ted Mau.
7. Speaker-Aware Training of Attention-Based End-to-End Speech Recognition Using Neural Speaker Embeddings [O]. Aku Rouhe, Tuomas Kaseva, Mikko Kurimo. 2020.
