Speaker Diarisation Using 2D Self-attentive Combination of Embeddings

Abstract

Speaker diarisation systems often cluster audio segments using speaker embeddings such as i-vectors and d-vectors. Since different types of embeddings are often complementary, this paper proposes a generic framework to improve performance by combining them into a single embedding, referred to as a c-vector. This combination uses a 2-dimensional (2D) self-attentive structure, which extends the standard self-attentive layer by averaging not only across time but also across different types of embeddings. Two types of 2D self-attentive structure are studied in this paper: simultaneous combination and consecutive combination, which adopt single and multiple self-attentive layers respectively. The penalty term in the original self-attentive layer, which is jointly minimised with the objective function to encourage diversity of annotation vectors, is also modified to obtain not only different local peaks but also the overall trends in the multiple annotation vectors. Experiments on the AMI meeting corpus show that our modified penalty term gives relative speaker error rate (SER) reductions of 6% and 21% for d-vector systems, and that a further 10% relative SER reduction can be obtained using the c-vector from our best 2D self-attentive structure.
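
As a rough illustration of the mechanism the abstract describes, the following pure-Python sketch pools a sequence of vectors with a self-attentive layer and computes a diversity penalty on the annotation vectors. Several things here are assumptions rather than details taken from the paper: the parameter names (`w1`, `w2`), the toy sizes, the random stand-in inputs, and the penalty form, which is the squared Frobenius norm of A Aᵀ − I commonly associated with the original self-attentive layer. The paper's modified penalty is not specified in the abstract and is not shown. Stacking two embedding types along the time axis is only one possible reading of "simultaneous combination", and it assumes both types share the same dimension.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attentive_layer(h, w1, w2):
    """Pool T input vectors (rows of h) into R weighted averages.

    h: T x D inputs, w1: Da x D, w2: R x Da (hypothetical parameter names).
    Returns (a, e): a is the R x T annotation matrix, e the R x D outputs.
    """
    hidden = [[math.tanh(v) for v in row] for row in matmul(w1, transpose(h))]  # Da x T
    a = [softmax(row) for row in matmul(w2, hidden)]  # each row sums to 1 over T
    return a, matmul(a, h)

def diversity_penalty(a):
    """Squared Frobenius norm of (A A^T - I); assumed form of the original penalty."""
    gram = matmul(a, transpose(a))
    size = len(gram)
    return sum((gram[i][j] - (1.0 if i == j else 0.0)) ** 2
               for i in range(size) for j in range(size))

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
T, D, DA, R = 6, 4, 5, 2  # toy sizes, chosen arbitrarily
w1, w2 = random_matrix(DA, D, rng), random_matrix(R, DA, rng)

# Two toy embedding types for the same T frames (stand-ins, not real i-/d-vectors).
type_a, type_b = random_matrix(T, D, rng), random_matrix(T, D, rng)

# 1D case: weighted average across time only.
a_1d, e_1d = self_attentive_layer(type_a, w1, w2)

# Assumed 2D case: stack both types so each softmax spans time and type jointly.
a_2d, e_2d = self_attentive_layer(type_a + type_b, w1, w2)

print(len(a_1d[0]), len(a_2d[0]))         # number of weights per annotation vector
print(round(diversity_penalty(a_2d), 4))  # value that would be added to the training loss
```

In this sketch each of the R annotation vectors is a distribution over the inputs, so the penalty is smallest when different annotation vectors put their weight on different positions. A consecutive variant would instead chain separate layers (for example, one over time and one over embedding types); that ordering is likewise an assumption and is left out here.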


Bibliographic details

Source: IEEE International Conference on Acoustics, Speech and Signal Processing, 2019, pp. 5801-5805 (5 pages)
Authors: G. Sun; C. Zhang; P. C. Woodland
Original format: PDF
Keywords: speaker recognition; speech processing


Similar literature

1. Speaker overlap detection with prosodic features for speaker diarisation [J]. Zelenak M., Hernando J. Signal Processing, IET. 2012, No. 8
2. SAGRNN: Self-Attentive Gated RNN For Binaural Speaker Separation With Interaural Cue Preservation [J]. Ke Tan, Buye Xu, Anurag Kumar. IEEE Signal Processing Letters. 2021, No. 1
3. State-of-the-art speaker recognition with neural network embeddings in NIST SRE18 and Speakers in the Wild evaluations [J]. Jesus Villalba, Nanxin Chen, David Snyder. Computer Speech and Language. 2020, No. Mara
4. Speaker Diarisation Using 2D Self-attentive Combination of Embeddings [C]. G. Sun, C. Zhang, P. C. Woodland. IEEE International Conference on Acoustics, Speech and Signal Processing. 2019
5. Revealing Structure-Property Correlations in 2D Layered Materials Using Synergistic Combination of Electron Microscopy and Atomic-Scale Calculations [D]. Lin, Junhao. 2015
6. Adversarially Learned Total Variability Embedding for Speaker Recognition with Random Digit Strings [O]. Woo Hyun Kang, Nam Soo Kim. 2019
7. Combination of deep speaker embeddings for diarisation [O]. Guangzhi Sun, Chao Zhang, Philip C. Woodland. 2021
