Multimodal Speaker Segmentation in Presence of Overlapped Speech Segments

Abstract

We propose a multimodal speaker segmentation algorithm with two main contributions. First, we suggest a hidden Markov model architecture that fuses three modalities: a multi-camera system for participant localization, a microphone array for speaker localization, and a speaker identification system. Second, we present a novel method for dealing with overlapped speech segments through a likelihood model of the microphone array observations that uses multiple local maxima of the Steered Power Response Generalized Cross Correlation Phase Transform (SPR-GCC-PHAT) function in the Joint Probabilistic Data Association (JPDA) framework. Results show that the proposed method outperforms standard speaker segmentation systems based on (a) speaker identification and (b) microphone array processing on datasets with a significant portion (27.4%) of overlapped speech, and scores as high as 94.4% on the F-measure scale.
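
The idea of using several local maxima of a GCC-PHAT function, rather than only the global peak, can be illustrated with a small sketch. This is not the authors' implementation: it uses a single microphone pair instead of a steered array response, synthetic white-noise sources, and hypothetical helper names (gcc_phat, top_local_maxima). With two simultaneously active sources, each one contributes its own peak at its own time delay, which is the kind of observation a multi-hypothesis likelihood model could consume.

```python
import numpy as np

def gcc_phat(x, y, max_lag):
    """GCC-PHAT of y relative to x for integer lags in [-max_lag, max_lag].

    A positive lag means y is delayed with respect to x.
    """
    n = len(x) + len(y)                  # zero-pad to avoid circular aliasing
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)               # cross-power spectrum
    cross /= np.abs(cross) + 1e-12       # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    # Reorder so the array runs from -max_lag to +max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    lags = np.arange(-max_lag, max_lag + 1)
    return lags, cc

def top_local_maxima(lags, cc, k):
    """Return the lags of the k largest strict local maxima of cc."""
    interior = np.arange(1, len(cc) - 1)
    is_peak = (cc[interior] > cc[interior - 1]) & (cc[interior] > cc[interior + 1])
    peaks = interior[is_peak]
    best = peaks[np.argsort(cc[peaks])[::-1][:k]]
    return [int(lags[i]) for i in best]

# Synthetic overlapped speech stand-in: two white-noise sources (an assumption),
# reaching the second microphone with delays of +5 and -3 samples.
rng = np.random.default_rng(0)
s1 = rng.standard_normal(4096)
s2 = rng.standard_normal(4096)
mic_a = s1 + s2
mic_b = np.roll(s1, 5) + np.roll(s2, -3)

lags, cc = gcc_phat(mic_a, mic_b, max_lag=20)
peaks = top_local_maxima(lags, cc, k=2)
print(sorted(peaks))
```

Taking only the global maximum here would report a single delay and miss the second active source; keeping the k strongest local maxima retains one delay candidate per overlapping speaker.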

Bibliographic details

Source: IEEE International Symposium on Multimedia, 2008, 6 pages
Authors: Rozgic Viktor; Han Kyu Jeong; Georgiou Panayiotis G.; Narayanan Shrikanth
Original format: PDF
Chinese Library Classification: TP37-53
Keywords: joint probabilistic data association; microphone array; multimodal fusion; speaker identification; speaker segmentation

Similar documents

1. A study of speaker segmentation of dialogue speech with speech overlapped section [J] . Masahumi Kobayashi, Tatsuya Kitamura, Shigeyoshi Kitazawa 電子情報通信学会技術研究報告. 音声. Speech . 2001, No. 522
2. A study of speaker segmentation of dialogue speech with speech overlapped section [J] . Masahumi Kobayashi, Tatsuya Kitamura, Shigeyoshi Kitazawa 電子情報通信学会技術研究報告. 音声. Speech . 2001, No. 522
3. A study of speaker segmentation of dialogue speech with speech overlapped section [J] . Masahumi Kobayashi, Tatsuya Kitamura, Shigeyoshi Kitazawa 電子情報通信学会技術研究報告. 言語理解とコミュニケーション. Natural Language Understanding and Models of Communication . 2001, No. 520
4. Multimodal Speaker Segmentation in Presence of Overlapped Speech Segments [C] . Rozgic Viktor, Han Kyu Jeong, Georgiou Panayiotis G., IEEE International Symposium on Multimedia . 2008
5. Robust speaker recognition in the presence of speech coding distortion [D] . Mudrowsky, Robert W. 2016
6. Automatic initial and final segmentation in cleft palate speech of Mandarin speakers [O] . Ling He, Yin Liu, Heng Yin, 2011
7. Multimodal speaker segmentation in presence of overlapped speech segments [O] . Kyu Jeong Han, Panayiotis G. Georgiou, Shrikanth Narayanan 2014
8. Part Ⅰ: Segmentation Techniques in Speech Synthesis; Part Ⅱ: A Segment Inventory for Speech Synthesis [R] . Gordon E. Peterson, William S-Y Wang 1958