Journal: IEEE Transactions on Audio, Speech, and Language Processing

Probabilistic Speaker Diarization With Bag-of-Words Representations of Speaker Angle Information


Abstract

Speaker diarization determines “who spoke when” from the recorded conversations of an unknown number of people. In general, we have no a priori information about the number, the locations, or even the characteristics of the speakers. Additionally, speakers' speech utterances vary dynamically because of turn-taking during the conversations. These conditions make the speaker-clustering task extremely difficult. The problem becomes even harder if online (incremental) processing is required. In this paper, we formulate the speaker-clustering problem as the clustering of the sequential audio features generated by an unknown number of latent mixture components (speakers). We employ a probabilistic model that assumes time-sensitive speaker mixtures at every time frame, which, surprisingly, suits the diarization scenario. We combine the time-varying probabilistic model with direction of arrival (DOA) information calculated from a microphone array in a bag-of-words (BoW)-style feature representation. The proposed system effectively estimates the number and locations of the speakers in an online manner based on the standard Bayes inference scheme. Experiments confirm that the proposed model can successfully infer the number and features of speakers and yield better or comparable speaker diarization results compared with conventional methods in several datasets.
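
The abstract does not spell out how the DOA estimates become a bag-of-words feature, so the following is only a minimal sketch of one plausible reading: each DOA estimate observed within a time frame is quantized into a discrete angle "word", and the frame is represented by the counts of those words. The function name `doa_bag_of_words`, the 360-degree angle range, and the default of 36 bins are assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter
from typing import Iterable, List


def doa_bag_of_words(angles_deg: Iterable[float], n_bins: int = 36) -> List[int]:
    """Count DOA estimates (in degrees) per discretized angle "word".

    Hypothetical helper: the bin layout is an assumption, not the paper's setup.
    """
    bin_width = 360.0 / n_bins
    counts = Counter()
    for angle in angles_deg:
        wrapped = angle % 360.0  # map any angle into [0, 360)
        # The final modulo guards against a float edge case where
        # wrapped rounds up to exactly 360.0.
        word = int(wrapped // bin_width) % n_bins
        counts[word] += 1
    # Dense histogram: one count per angle word, in bin order.
    return [counts[w] for w in range(n_bins)]


# Illustrative frame: made-up DOA estimates, roughly three directions.
frame_angles = [12.0, 14.5, 17.9, 95.0, 98.2, -170.0]
histogram = doa_bag_of_words(frame_angles, n_bins=36)
print(histogram)
```

A count vector of this kind is the sort of input a mixture model over discrete observations could consume frame by frame; how the paper's time-varying model actually uses it is not stated in the abstract.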
