SpeakerBeam: A New Deep Learning Technology for Extracting Speech of a Target Speaker Based on the Speaker’s Voice Characteristics

Marc Delcroix; Katerina Zmolikova; Keisuke Kinoshita; Shoko Araki; Atsunori Ogawa; Tomohiro Nakatani

Abstract

In a noisy environment such as a cocktail party, humans can focus on listening to a desired speaker, an ability known as selective hearing. Current approaches to computational selective hearing require knowing the position of the target speaker, which limits their practical use. This article introduces SpeakerBeam, a deep-learning-based approach to computational selective hearing that relies on the characteristics of the target speaker’s voice. SpeakerBeam requires only a small amount of speech data from the target speaker to compute his/her voice characteristics. It can then extract the speech of that speaker regardless of his/her position or the number of speakers talking in the background.

Bibliographic information

Source: NTT Technical Review, 2018, No. 11, 6 pages
Authors: Marc Delcroix; Katerina Zmolikova; Keisuke Kinoshita; Shoko Araki; Atsunori Ogawa; Tomohiro Nakatani
Original format: PDF
Classification (Chinese Library Classification): Communications
Keywords: deep learning; target speaker extraction; SpeakerBeam

Similar literature

1. SpeakerBeam: Speaker Aware Neural Network for Target Speaker Extraction in Speech Mixtures [J]. Zmolikova Katerina, Delcroix Marc, Kinoshita Keisuke. IEEE Journal of Selected Topics in Signal Processing, 2019, No. 4.
2. An objective evaluation of the effects of recording conditions and speaker characteristics in multi-speaker deep neural speech synthesis [J]. Beáta Lőrincz, Adriana Stan, Mircea Giurgiu. Procedia Computer Science, 2021.
3. Speech utterance clustering based on the maximization of within-cluster homogeneity of speaker voice characteristics [J]. Tsai WH, Wang HM. The Journal of the Acoustical Society of America, 2006, No. 3.
4. Speakerfilter: Deep Learning-Based Target Speaker Extraction Using Anchor Speech [C]. Shulin He, Hao Li, Xueliang Zhang. IEEE International Conference on Acoustics, Speech and Signal Processing, 2020.
5. Deep learning for speech classification and speaker recognition [D]. Saleem, Muhammad Muneeb. 2014.
6. Attractiveness and distinctiveness between speakers' voices in naturalistic speech and their faces are uncorrelated [O]. Romi Zäske, Verena Gabriele Skuk, Stefan R. Schweinberger. 2020.
7. Speech utterance clustering based on the maximization of within-cluster homogeneity of speaker voice characteristics [O]. Wei-ho Tsai, Hsin-min Wang. 2015.
