Using Class Purity as Criterion for Speaker Clustering in Multi-Speaker Detection Tasks



Abstract

Speaker clustering is an important step in multi-speaker detection tasks, and its performance directly affects speaker detection performance. It has been observed that the shorter the average length of the single-speaker speech segments produced by segmentation, the worse the subsequent speaker recognition performs. A reasonable way to improve multi-speaker detection is therefore to increase the average length of the single-speaker segments after segmentation, which is equivalent to clustering as many true same-speaker segments into one class as possible. In other words, the average class purity of the clustered speaker segments should be as high as possible. Accordingly, a speaker-clustering algorithm based on the class purity criterion is proposed, in which a Reference Speaker Model (RSM) scheme is adopted to calculate the distance between speech segments, and the maximal class purity, or equivalently the minimal within-class dispersion, is taken as the criterion. Experiments on the NIST SRE 2006 database showed that, compared with the conventional Hierarchical Agglomerative Clustering (HAC) algorithm, for speech segments with average lengths of 2 seconds, 5 seconds and 8 seconds, the proposed algorithm increased the valid class speech length by 2.7%, 3.8% and 4.6%, respectively, and increased the target speaker detection recall rate by 7.6%, 6.2% and 5.1%, respectively.
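The abstract does not define class purity formally. As a rough illustration only, the Python sketch below uses one common definition, assumed here rather than taken from the paper: the purity of a class is the share of its total speech duration that belongs to its dominant speaker, and the average purity is the unweighted mean over classes. The function names, the `(cluster_id, speaker_id, duration)` tuple layout, and the sample data are all hypothetical.

```python
from collections import defaultdict


def cluster_purity(segments):
    """Return {cluster_id: purity} for (cluster_id, speaker_id, duration) tuples.

    Assumed definition (not from the paper): purity is the fraction of a
    cluster's total duration contributed by its dominant speaker.
    """
    per_cluster = defaultdict(lambda: defaultdict(float))
    for cluster_id, speaker_id, duration in segments:
        per_cluster[cluster_id][speaker_id] += duration
    return {
        cid: max(by_speaker.values()) / sum(by_speaker.values())
        for cid, by_speaker in per_cluster.items()
    }


def average_purity(segments):
    """Unweighted mean of the per-cluster purities."""
    purities = cluster_purity(segments)
    if not purities:
        raise ValueError("no segments given")
    return sum(purities.values()) / len(purities)


# Hypothetical clustering result: cluster c1 mixes two speakers, c2 is pure.
segments = [
    ("c1", "alice", 4.0),
    ("c1", "alice", 2.0),
    ("c1", "bob", 2.0),
    ("c2", "bob", 5.0),
]

print(cluster_purity(segments))  # {'c1': 0.75, 'c2': 1.0}
print(average_purity(segments))  # 0.875
```

Under this assumed definition, a clustering criterion that maximizes purity would prefer merges that keep each class dominated by a single speaker. How the proposed algorithm evaluates purity without ground-truth speaker labels (through RSM-based distances and within-class dispersion) is not shown here.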


Bibliographic details

Source: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2011, pp. 1-4 (4 pages)
Authors: Gang Wang; Xiaojun Wu; Thomas Fang Zheng; Linlin Wang; Chenhao Zhang
Original format: PDF
Chinese Library Classification: Computer applications; Signal processing

