Journal: Computer Speech and Language

Generalized Viterbi-based models for time-series segmentation and clustering applied to speaker diarization



Abstract

Speaker diarization is the problem of segmenting a conversation among unknown speakers into speaker-homogeneous parts. State-of-the-art diarization systems are based on i-vector methodologies. However, these approaches require large quantities of training data, which must be obtained from an environment similar to that of the conversation being diarized. In this paper we present a diarization system that does not require such training data and needs only some development data for parameter tuning. This system is a generalization of the well-known hidden Markov model (HMM), a popular clustering algorithm trained by Viterbi statistics. Our proposed model, referred to as a hidden distortion model (HDM), is based on state distortion models and transition costs, for which, in contrast to the HMM, probabilistic calculations are not mandatory. We provide a mathematical basis for our approach and demonstrate that the Viterbi-based HMM can be seen as a special case of HDM. This proximity allows us to apply similar approaches for state-model training when the new paradigm is used to learn sequence dependencies. We evaluate the performance of HDM by diarizing two-speaker telephone conversations. When applied to conversations from the LDC CALLHOME database, HDM improves on the performance of a baseline HMM system by about 26% (relative improvement). Moreover, when applied to the NIST 2005 database, it yields a small improvement over the HMM system.
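The relationship between costs and probabilities described in the abstract can be illustrated with a small sketch. It is not the paper's HDM implementation and omits state-model training entirely. It only shows a generic minimum-cost Viterbi decoder over per-frame state distortions and transition costs, and checks that feeding it negative log-probabilities reproduces ordinary HMM Viterbi decoding. The function name `min_cost_path`, the helper `neg_log`, and all numbers are hypothetical and chosen for illustration.

```python
import math

def min_cost_path(distortion, transition, initial=None):
    """Minimum-cost state sequence by dynamic programming (min-sum Viterbi).

    distortion[t][s]: cost of assigning frame t to state s (need not be a log-probability).
    transition[r][s]: cost of moving from state r to state s.
    initial[s]: optional cost of starting in state s (defaults to zero).
    Returns (total_cost, path).
    """
    n_states = len(distortion[0])
    if initial is None:
        initial = [0.0] * n_states
    cost = [initial[s] + distortion[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at frame t
    for t in range(1, len(distortion)):
        new_cost, pointers = [], []
        for s in range(n_states):
            best_prev = min(range(n_states), key=lambda r: cost[r] + transition[r][s])
            pointers.append(best_prev)
            new_cost.append(cost[best_prev] + transition[best_prev][s] + distortion[t][s])
        cost = new_cost
        back.append(pointers)
    # Trace back from the cheapest final state.
    state = min(range(n_states), key=lambda s: cost[s])
    total = cost[state]
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return total, path

def neg_log(p):
    """Map a probability to a cost; zero probability becomes infinite cost."""
    return -math.log(p) if p > 0 else math.inf

# HMM special case (toy numbers): costs are negative log-probabilities.
emission = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]]  # per frame, per speaker state
trans_prob = [[0.9, 0.1], [0.1, 0.9]]
distortion = [[neg_log(p) for p in row] for row in emission]
transition = [[neg_log(p) for p in row] for row in trans_prob]
initial = [neg_log(0.5), neg_log(0.5)]

total, path = min_cost_path(distortion, transition, initial)
print(path, math.exp(-total))
```

With these toy values the decoder assigns the first two frames to one state and the last two to the other, and `exp(-total)` equals the probability of that path under the corresponding HMM. Because the decoder only adds and compares costs, the same routine accepts distortions and transition costs that have no probabilistic interpretation.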
