Journal: IEEE Transactions on Audio, Speech, and Language Processing

Initialization of Iterative-Based Speaker Diarization Systems for Telephone Conversations



Abstract

Speaker diarization systems attempt to assign temporal segments from a conversation between R speakers to an appropriate speaker r. This task is generally performed when no prior information is given regarding the speakers. The number of speakers is usually unknown and needs to be estimated. However, there are applications where the number of speakers is known in advance. The diarization process generally consists of change detection, clustering, and labeling of a given audio stream. Speaker diarization can be performed using an iterative approach that is optimized by the selection of appropriate initial conditions. This study examines the influence of several common initialization algorithms, including two variants of a recently proposed K-means-based initialization algorithm, on the performance of an iterative-based speaker diarization system applied to two-speaker telephone conversations. The suggested speaker diarization system employs either self-organizing maps or Gaussian mixture models to model the speakers and non-speech in the conversation. The diarization system and initialization algorithms are tuned on a development set of 108 telephone conversations taken from the LDC CallHome corpus. The evaluation subset is composed of 2048 telephone conversations extracted from the NIST 2005 Rich Transcription corpus. The results show that initializing the speaker diarization system with the K-means-based algorithms provides a relative improvement of 10.4% on the LDC development set and 12.2% on the NIST evaluation subset, compared to random initialization after 12 iterations, the number required for the diarization process to converge with random initialization. With the K-means-based initialization, however, only five iterations are required for the system to converge. Thus, the new initialization improves performance in terms of both diarization error rate and speed of convergence.
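
The general idea of seeding an iterative clustering loop with K-means labels instead of random labels can be illustrated with a minimal sketch. It is not the system described in the abstract: a single diagonal Gaussian per cluster stands in for the self-organizing maps or Gaussian mixture models, the frames are synthetic 2-D points rather than speech features, and three clusters are assumed (two speakers plus non-speech). All function names are hypothetical, and the diarization error rate values passed to `relative_improvement` are made up for illustration, not taken from the paper.

```python
import math
import random

def mean_vec(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(v[d] for v in vectors) / n for d in range(len(vectors[0])))

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_labels(frames, k, iters=10, seed=0):
    """Plain K-means; returns one cluster label per frame."""
    rng = random.Random(seed)
    centers = rng.sample(frames, k)
    labels = [0] * len(frames)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(f, centers[c])) for f in frames]
        for c in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == c]
            if members:  # keep the old center if a cluster emptied out
                centers[c] = mean_vec(members)
    return labels

def random_labels(n, k, seed=0):
    """Baseline initialization: a uniformly random label per frame."""
    rng = random.Random(seed)
    return [rng.randrange(k) for _ in range(n)]

def fit_gaussian(members):
    """Diagonal Gaussian (mean, floored variance) for one cluster."""
    mu = mean_vec(members)
    var = tuple(
        max(sum((m[d] - mu[d]) ** 2 for m in members) / len(members), 1e-3)
        for d in range(len(mu))
    )
    return mu, var

def log_lik(frame, model):
    """Log-likelihood of a frame under a diagonal Gaussian."""
    mu, var = model
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for x, m, v in zip(frame, mu, var)
    )

def resegment(frames, labels, k, max_iters=50):
    """Retrain one model per cluster and relabel frames until labels stop changing.

    Returns the final labels and the number of iterations performed.
    """
    for iteration in range(1, max_iters + 1):
        models = {}
        for c in range(k):
            members = [f for f, lab in zip(frames, labels) if lab == c]
            if members:
                models[c] = fit_gaussian(members)
        new_labels = [max(models, key=lambda c: log_lik(f, models[c])) for f in frames]
        if new_labels == labels:
            return labels, iteration
        labels = new_labels
    return labels, max_iters

def relative_improvement(baseline_der, new_der):
    """Relative reduction in diarization error rate, in percent."""
    return 100.0 * (baseline_der - new_der) / baseline_der

def make_frames(n_per_class=100, seed=1):
    """Synthetic 2-D 'frames' drawn around three assumed cluster centers."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0)]
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            for cx, cy in centers for _ in range(n_per_class)]

frames = make_frames()
for name, init in (("random", random_labels(len(frames), 3)),
                   ("k-means", kmeans_labels(frames, 3))):
    _, n_iter = resegment(frames, init, 3)
    print(name, "initialization: stopped after", n_iter, "iterations")

# Hypothetical DER values, chosen only to show the formula.
print(relative_improvement(20.0, 18.0))
```

The loop stops as soon as an iteration leaves every frame label unchanged, so the iteration count gives a rough way to compare how quickly each initialization settles on this toy data. Nothing here reproduces the iteration counts or error rates reported in the abstract.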
