Pre-training of Speaker Embeddings for Low-latency Speaker Change Detection in Broadcast News


Abstract

In this work, we investigate pre-training of neural-network-based speaker embeddings for low-latency speaker change detection. Our proposed system takes two speech segments, generates embeddings using shared Siamese layers, and then classifies the concatenated embeddings according to whether the two segments are spoken by the same speaker. We investigate gender classification, contrastive loss, and triplet loss based pre-training of the embedding layers, as well as joint training of the embedding layers along with a same/different classifier. Training is performed on 2-second single-speaker segments based on ground-truth speaker segmentation of broadcast news data. During testing, however, we use the detection system in a practical low-latency setting for annotating automatic closed captions. In contrast to training, test pairs are created around automatic speech recognition (ASR) based segmentation boundaries. The ASR segments are often shorter than 2 seconds, causing a duration mismatch during testing. In our experiments, although the baseline i-vector based classifier performs well, the proposed triplet loss based pre-training followed by joint training provides a 7-50% relative F-measure improvement in matched and mismatched conditions. In addition, the performance degradation in the variable-duration test conditions is less severe for network-based embeddings than for i-vectors.
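The pieces named in the abstract (a shared Siamese embedding, contrastive and triplet losses, a same/different classifier on concatenated embeddings, and relative F-measure improvement) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the single tanh layer, the squared-distance form of the losses, the margin of 1.0, and the toy numbers are hypothetical and are not taken from the paper, which does not specify these details in its abstract.

```python
import math


def embed(features, weights):
    """Shared (Siamese) embedding: the same weights are applied to every segment.
    A single tanh layer stands in for the real embedding network (assumption)."""
    return [math.tanh(sum(w * x for w, x in zip(row, features))) for row in weights]


def sq_dist(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss: the same-speaker pair should be closer than the
    different-speaker pair by at least `margin` (squared distances)."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def contrastive_loss(a, b, same_speaker, margin=1.0):
    """Pull same-speaker pairs together; push different-speaker pairs
    apart until they are at least `margin` away."""
    d = math.sqrt(sq_dist(a, b))
    if same_speaker:
        return d ** 2
    return max(0.0, margin - d) ** 2


def change_probability(left, right, clf_weights, bias=0.0):
    """Same/different classifier: logistic regression on the concatenated
    embeddings. Returns a score in (0, 1) for 'different speaker'."""
    concat = left + right
    z = sum(w * x for w, x in zip(clf_weights, concat)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_improvement(baseline, proposed):
    """Relative gain of `proposed` over `baseline`, in percent."""
    return (proposed - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Toy shared weights and feature vectors, purely illustrative.
    shared = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
    seg_a, seg_b, seg_c = [1.0, 0.0, 2.0], [0.9, 0.1, 2.1], [-1.0, 2.0, 0.0]
    e_a, e_b, e_c = (embed(s, shared) for s in (seg_a, seg_b, seg_c))
    print("triplet loss:", triplet_loss(e_a, e_b, e_c))
    print("change score:", change_probability(e_a, e_c, [0.0] * 4))
```

In this sketch, pre-training would correspond to minimising one of the two losses over the weights used by `embed`, and joint training to updating those weights together with the classifier weights; no optimiser is shown here.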


Bibliographic details

Source: IEEE International Conference on Acoustics, Speech and Signal Processing, 2019, pp. 5996-6664, 5 pages
Authors: Leda Sari; Samuel Thomas; Mark Hasegawa-Johnson; Michael Picheny
Original format: PDF
Chinese Library Classification: TN912-53
Keywords: Speaker change detection; sequence embedding; Siamese networks


