Pre-training of Speaker Embeddings for Low-latency Speaker Change Detection in Broadcast News

Abstract

In this work, we investigate pre-training of neural network based speaker embeddings for low-latency speaker change detection. Our proposed system takes two speech segments, generates embeddings using shared Siamese layers and then classifies the concatenated embeddings depending on whether they are spoken by the same speaker. We investigate gender classification, contrastive loss and triplet loss based pre-training of the embedding layers and also joint training of the embedding layers along with a same/different classifier. Training is performed on 2-second single speaker segments based on ground truth speaker segmentation of broadcast news data. However, during test, we use the detection system in a practical low-latency setting for annotating automatic closed captions. In contrast to training, test pairs are now created around automatic speech recognition (ASR) based segmentation boundaries. The ASR segments are often shorter than 2 seconds causing duration mismatch during testing. In our experiments, although the baseline i-vector based classifier performs well, the proposed triplet loss based pre-training followed by joint training provides 7-50% relative F-measure improvement in matched and mismatched conditions. In addition, the degradation in performance is less severe for network based embeddings as compared to using i-vectors in the variable duration test conditions.
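The two metric-learning objectives named in the abstract, and the concatenation step that feeds the same/different classifier, can be sketched in a few lines. This is a minimal illustration only, not the authors' implementation: the function names, the Euclidean distance, and the margin values are all assumptions, since the abstract does not specify them.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(emb_a, emb_b, same_speaker, margin=1.0):
    """Pairwise loss: same-speaker pairs are pulled together, while
    different-speaker pairs are pushed at least `margin` apart.
    The margin value is an assumption, not taken from the paper."""
    d = euclidean(emb_a, emb_b)
    if same_speaker:
        return d ** 2
    return max(0.0, margin - d) ** 2


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss that is zero once the negative is at least `margin`
    farther from the anchor than the positive is."""
    gap = euclidean(anchor, positive) - euclidean(anchor, negative)
    return max(0.0, gap + margin)


def pair_features(emb_left, emb_right):
    """Concatenate the two segment embeddings (both produced by the same
    shared layers) into the input of a same/different classifier."""
    return list(emb_left) + list(emb_right)


if __name__ == "__main__":
    # Toy 2-dimensional embeddings, invented for illustration.
    anchor, positive, negative = [0.0, 0.0], [3.0, 4.0], [6.0, 8.0]
    print(triplet_loss(anchor, positive, negative))
    print(contrastive_loss(anchor, positive, same_speaker=True))
    print(pair_features(anchor, positive))
```

In a trainable system these losses would be applied to the outputs of the shared embedding layers and minimized by gradient descent; the pure-Python version above only shows what each objective computes for a single pair or triplet.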

Bibliographic details

Source
IEEE International Conference on Acoustics, Speech and Signal Processing | 2019 | pp. 6286-6290 | 5 pages
Authors
Leda Sarı; Samuel Thomas; Mark Hasegawa-Johnson; Michael Picheny
Original format: PDF
Keywords
feature extraction; gender issues; information resources; learning (artificial intelligence); neural nets; signal classification; speaker recognition; speech processing;

Similar literature

1. Hybridization DE with K-means for speaker clustering in speaker diarization of broadcasts news [J]. Dabbabi Karim, Hajji Salah, Cherif Adnen. International Journal of Speech Technology. 2019, No. 4
2. Development of a Speaker Diarization System for Speaker Tracking in Audio Broadcast News: a Case Study [J]. Mihelic France, Vesnicer Bostjan, Zibert Janez. Journal of Computing and Information Technology. 2008, No. 3
3. Development Of A Speaker Diarization System For Speaker Tracking In Audio Broadcast News: A Case Study [J]. Janez Zibert, Bostjan Vesnicer, France Mihelic. Journal of Computing and Information Technology. 2008, No. 3
4. Pre-training of Speaker Embeddings for Low-latency Speaker Change Detection in Broadcast News [C]. Leda Sari, Samuel Thomas, Mark Hasegawa-Johnson. IEEE International Conference on Acoustics, Speech and Signal Processing. 2019
5. How local television news has changed over the past thirty years: An examination of the influences of money, technology and personnel choices in the daily broadcast of local TV news [D]. Conti, Paul. 2005
6. Context-dependent role of selective attention for change detection in multi-speaker scenes [O]. Christian Starzynski, Alexander Gutschalk. 2018
7. Speaker Change Detection and Speaker Clustering Using VQ Distortion for Broadcast News Speech Recognition [O]. Kazumasa Mori, Seiichi Nakagawa. 2001