A study of speaker clustering for speaker attribution in large telephone conversation datasets

Houman Ghaemmaghami; David Dean; Sridha Sridharan; David A. van Leeuwen

Abstract

This paper proposes the task of speaker attribution as speaker diarization followed by speaker linking. The aim of attribution is to identify and label common speakers across multiple recordings. To do this, it is necessary to first carry out diarization to obtain speaker-homogeneous segments from each recording. Speaker linking can then be conducted to link common speaker identities across multiple inter-session recordings. This process can be extremely inefficient using the traditional agglomerative cluster merging and retraining commonly employed in diarization. We thus propose an attribution system using complete-linkage clustering (CLC) without model retraining. We show that on top of the efficiency gained through elimination of the retraining phase, greater accuracy is achieved by utilizing the farthest-neighbor criterion inherent to CLC for both diarization and linking. We first evaluate the use of CLC against an agglomerative clustering (AC) without retraining approach, traditional agglomerative clustering with retraining (ACR) and single-linkage clustering (SLC) for speaker linking. We show that CLC provides a relative improvement of 20%, 29% and 39% in attribution error rate (AER) over the three said approaches, respectively. We then propose a diarization system using CLC and show that it outperforms AC, ACR and SLC with relative improvements of 32%, 50% and 70% in diarization error rate (DER), respectively. In our work, we employ the cross-likelihood ratio (CLR) as the model comparison metric for clustering and investigate its robustness as a stopping criterion for attribution.
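The farthest-neighbor criterion and threshold-based stopping described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a precomputed symmetric dissimilarity matrix (the paper compares models with the cross-likelihood ratio, which would first need to be turned into a dissimilarity), and the names `complete_linkage`, `dist`, and `threshold` are hypothetical. Because cluster distances are derived only from the initial pairwise scores, no model is retrained after a merge.

```python
from itertools import combinations


def complete_linkage(dist, threshold):
    """Cluster items 0..n-1 from a symmetric pairwise dissimilarity matrix.

    The distance between two clusters is the largest pairwise dissimilarity
    between their members (farthest-neighbor criterion). Merging stops once
    the closest pair of clusters is farther apart than `threshold`.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Complete linkage: the worst-case (maximum) member-to-member distance.
        return max(dist[i][j] for i in a for j in b)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest complete-linkage distance.
        (x, y), best = min(
            (((p, q), linkage(clusters[p], clusters[q]))
             for p, q in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best > threshold:
            break  # stopping criterion: no remaining pair is close enough
        # Merge by pooling members; pairwise scores are reused, not recomputed.
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical scores for four segments: 0 and 1 are close, as are 2 and 3.
example_dist = [
    [0.0, 0.2, 0.9, 1.0],
    [0.2, 0.0, 0.8, 0.9],
    [0.9, 0.8, 0.0, 0.1],
    [1.0, 0.9, 0.1, 0.0],
]
print(complete_linkage(example_dist, threshold=0.5))
```

The threshold plays the role of the stopping criterion whose robustness the abstract says is investigated; a value that is too low leaves speakers split across clusters, and one that is too high merges distinct speakers.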

Bibliographic details

Source
Computer Speech and Language | 2016, Issue 11 | pp. 23-45 | 23 pages
Authors
Houman Ghaemmaghami; David Dean; Sridha Sridharan; David A. van Leeuwen
Author affiliations

Speech and Audio Research Laboratory, Queensland University of Technology, Brisbane, Australia;

Speech and Audio Research Laboratory, Queensland University of Technology, Brisbane, Australia;

Speech and Audio Research Laboratory, Queensland University of Technology, Brisbane, Australia;

Center for Language and Speech Technology, Radboud University Nijmegen, Netherlands;

Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Chinese Library Classification: not listed
Keywords
Speaker attribution; Linking; Diarization; Complete-linkage clustering; Joint factor analysis; Cross-likelihood ratio;


