Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

Speaker Diarisation Using 2D Self-attentive Combination of Embeddings



Abstract

Speaker diarisation systems often cluster audio segments using speaker embeddings such as i-vectors and d-vectors. Since different types of embeddings are often complementary, this paper proposes a generic framework that improves performance by combining them into a single embedding, referred to as a c-vector. The combination uses a 2-dimensional (2D) self-attentive structure, which extends the standard self-attentive layer by averaging not only across time but also across different types of embeddings. Two types of 2D self-attentive structure are studied: simultaneous combination, which uses a single self-attentive layer, and consecutive combination, which uses multiple self-attentive layers. The penalty term in the original self-attentive layer, which is minimised jointly with the objective function to encourage diversity among the annotation vectors, is also modified so that the multiple annotation vectors capture not only different local peaks but also the overall trends. Experiments on the AMI meeting corpus show that our modified penalty term gives relative speaker error rate (SER) reductions of 6% and 21% for d-vector systems, and that a further 10% relative SER reduction can be obtained using the c-vector from our best 2D self-attentive structure.
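The abstract gives no equations, so the following is only a minimal NumPy sketch of the general idea, not the paper's implementation. It assumes the commonly used structured self-attentive formulation, in which an annotation matrix A = softmax(W2 tanh(W1 Hᵀ)) weights the inputs and the original penalty is ‖AAᵀ − I‖²_F. It also assumes that "simultaneous combination" can be read as one layer attending over the flattened time and embedding-type axes, and that all embedding types share one dimension. The function names, shapes, and random weights are hypothetical. The modified penalty term is not reproduced here because the abstract does not define it.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_weights(H, W1, W2):
    # H: (N, D) inputs. Returns the annotation matrix A of shape (heads, N);
    # each row is one annotation vector and sums to 1.
    scores = W2 @ np.tanh(W1 @ H.T)  # (heads, N)
    return softmax(scores, axis=1)

def diversity_penalty(A):
    # Original (unmodified) penalty ||A A^T - I||_F^2, which pushes the
    # annotation vectors towards attending to different inputs.
    heads = A.shape[0]
    return np.linalg.norm(A @ A.T - np.eye(heads), ord="fro") ** 2

def combine_simultaneously(embeddings, W1, W2):
    # embeddings: (K, T, D) = K embedding types, T time steps, dimension D
    # (assumed equal across types). Flattening K and T lets a single
    # self-attentive layer average across time and across types at once.
    K, T, D = embeddings.shape
    H = embeddings.reshape(K * T, D)
    A = attention_weights(H, W1, W2)  # (heads, K*T)
    c_vector = (A @ H).reshape(-1)    # (heads * D,)
    return c_vector, A

# Toy usage with random data and untrained weights (illustration only).
rng = np.random.default_rng(0)
K, T, D, d_a, heads = 2, 50, 16, 8, 3
embeddings = rng.standard_normal((K, T, D))
W1 = rng.standard_normal((d_a, D)) * 0.1
W2 = rng.standard_normal((heads, d_a)) * 0.1
c_vector, A = combine_simultaneously(embeddings, W1, W2)
penalty = diversity_penalty(A)
```

In a trained system the penalty would be added to the training objective with a weighting factor. A consecutive combination would presumably apply separate layers in sequence, for example first across time and then across embedding types, but that reading is also an assumption.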
