Conference: IEEE International Symposium on Multimedia

Multimodal Speaker Segmentation in Presence of Overlapped Speech Segments


Abstract

We propose a multimodal speaker segmentation algorithm with two main contributions. First, we suggest a hidden Markov model architecture that fuses three modalities: a multi-camera system for participant localization, a microphone array for speaker localization, and a speaker identification system. Second, we present a novel method for dealing with overlapped speech segments through a likelihood model of the microphone array observations that uses multiple local maxima of the Steered Power Response Generalized Cross Correlation Phase Transform (SPR-GCC-PHAT) function in the Joint Probabilistic Data Association (JPDA) framework. Results show that, on datasets with a significant portion (27.4%) of overlapped speech, the proposed method outperforms standard speaker segmentation systems based on (a) speaker identification and (b) microphone array processing, and scores as high as 94.4% on the F-measure.
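The abstract does not give the HMM topology or the exact fusion rule, so the following is only a minimal sketch of how per-frame evidence from three modalities could be fused and decoded with an HMM. It assumes one hidden state per speaker, conditionally independent modalities (log-likelihoods are simply summed), and a single hypothetical `stay_prob` transition parameter. The names `fuse_and_decode` and `to_log` and all numbers are illustrative and do not come from the paper. It does not model overlapped speech or the JPDA step.

```python
import math


def to_log(frames):
    """Convert per-frame, per-speaker probabilities to log-likelihoods."""
    return [[math.log(p) for p in frame] for frame in frames]


def fuse_and_decode(video_ll, array_ll, speaker_id_ll, stay_prob=0.9):
    """Viterbi decoding over speakers using fused per-frame log-likelihoods.

    Each *_ll argument is a list of frames; each frame holds one
    log-likelihood per speaker. Fusion is a plain sum of log-likelihoods,
    which is an independence assumption made for this sketch only.
    Requires at least two speakers.
    """
    n_frames = len(video_ll)
    n_speakers = len(video_ll[0])

    # Fused emission score for every frame and speaker.
    emit = [
        [video_ll[t][s] + array_ll[t][s] + speaker_id_ll[t][s]
         for s in range(n_speakers)]
        for t in range(n_frames)
    ]

    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (n_speakers - 1))

    def trans(prev, cur):
        return log_stay if prev == cur else log_switch

    # Uniform prior over speakers at the first frame.
    score = [emit[0][s] - math.log(n_speakers) for s in range(n_speakers)]
    back = []  # back[t - 1][s] = best predecessor of speaker s at frame t

    for t in range(1, n_frames):
        new_score, pointers = [], []
        for s in range(n_speakers):
            best_prev = max(range(n_speakers),
                            key=lambda p: score[p] + trans(p, s))
            new_score.append(score[best_prev] + trans(best_prev, s) + emit[t][s])
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(n_speakers), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy data: speaker 0 talks in frames 0-1, speaker 1 in frames 2-4.
# The microphone array is deliberately misleading at frame 3.
video = to_log([[0.8, 0.2], [0.8, 0.2], [0.2, 0.8], [0.2, 0.8], [0.2, 0.8]])
array = to_log([[0.7, 0.3], [0.7, 0.3], [0.3, 0.7], [0.6, 0.4], [0.3, 0.7]])
spkid = to_log([[0.6, 0.4], [0.6, 0.4], [0.4, 0.6], [0.4, 0.6], [0.4, 0.6]])

print(fuse_and_decode(video, array, spkid))  # [0, 0, 1, 1, 1]
```

In this toy run the microphone array alone would favor speaker 0 at frame 3, but the summed evidence from the other two modalities keeps the decoded path on speaker 1, which is the kind of benefit a fusion model is meant to provide.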