Probabilistic Speaker Diarization With Bag-of-Words Representations of Speaker Angle Information

Ishiguro K.; Yamada T.; Araki S.; Nakatani T.; Sawada H.

Abstract

Speaker diarization determines “who spoke when” from recorded conversations among an unknown number of people. In general, we have no a priori information about the number, the locations, or even the characteristics of the speakers. Additionally, the speakers' utterances vary dynamically because of turn-taking during the conversation. These conditions make the speaker-clustering task extremely difficult, and the problem becomes even harder if online (incremental) processing is required. In this paper, we formulate the speaker-clustering problem as the clustering of sequential audio features generated by an unknown number of latent mixture components (speakers). We employ a probabilistic model that assumes time-sensitive speaker mixtures at every time frame, which, surprisingly, suits the diarization scenario. We combine this time-varying probabilistic model with direction of arrival (DOA) information, calculated from a microphone array and expressed in a bag-of-words (BoW)-style feature representation. The proposed system effectively estimates the number and locations of the speakers in an online manner using a standard Bayesian inference scheme. Experiments confirm that the proposed model successfully infers the number and features of the speakers and yields speaker diarization results that are better than or comparable to those of conventional methods on several datasets.

Bibliographic record

Source
IEEE Transactions on Audio, Speech, and Language Processing, 2012, No. 2, pp. 447-460 (14 pages)
Authors
Ishiguro K.; Yamada T.; Araki S.; Nakatani T.; Sawada H.
Affiliation

NTT Commun. Sci. Labs., NTT Corp., Kyoto, Japan

Original format: PDF
Language: English
Keywords
Bag-of-words (BOW); clustering; direction of arrival (DOA); latent Dirichlet allocation (LDA); microphone arrays; speaker diarization; variational Bayes inference