Adaptive speaker diarization of broadcast news based on factor analysis

Desplanques Brecht; Demuynck Kris; Martens Jean Pierre


Abstract

The introduction of factor analysis techniques in a speaker diarization system enhances its performance by facilitating the use of speaker-specific information, by improving the suppression of nuisance factors such as phonetic content, and by facilitating various forms of adaptation. This paper describes a state-of-the-art iVector-based diarization system that employs factor analysis and adaptation on all levels. The diarization modules relevant to this work are speaker segmentation, which searches for speaker boundaries, and speaker clustering, which groups speech segments of the same speaker. The speaker segmentation relies on speaker factors that are extracted on a frame-by-frame basis using eigenvoices. We incorporate soft voice activity detection in this extraction process: speaker change detection should be based on speaker information only, so speech posteriors are applied to make it disregard non-speech frames. Potential speaker boundaries are inserted at positions where rapid changes in speaker factors are observed. Employing Mahalanobis distances further reduces the effect of the phonetic content, which results in more accurate speaker boundaries. This iVector-based segmentation significantly outperforms more common segmentation methods based on the Bayesian Information Criterion (BIC) or speech activity marks. The speaker clustering employs two-step Agglomerative Hierarchical Clustering (AHC): after initial BIC clustering, the second clustering stage is realized by either an iVector Probabilistic Linear Discriminant Analysis (PLDA) system or Cosine Distance Scoring (CDS) of extracted speaker factors. The segmentation system is made adaptive on a file-by-file basis by iterating the diarization process using eigenvoice matrices adapted (unsupervised) on the output of the previous iteration.
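The boundary search described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that per-frame speaker factors have already been extracted, that the speech posteriors are applied as weights when averaging the frames of a window (the abstract applies them during extraction instead), and that a diagonal covariance estimated over the whole file stands in for the covariance model behind the Mahalanobis distance. The names `change_scores`, `weighted_mean`, and `win` are hypothetical.

```python
import math
import random

def weighted_mean(frames, weights):
    """Speech-posterior-weighted mean of a list of factor vectors."""
    total = sum(weights)
    dim = len(frames[0])
    return [sum(w * f[k] for f, w in zip(frames, weights)) / total
            for k in range(dim)]

def change_scores(factors, speech_post, win=50):
    """Diagonal-covariance Mahalanobis distance between the mean speaker
    factors of the windows to the left and to the right of each frame."""
    n, dim = len(factors), len(factors[0])
    mu = weighted_mean(factors, speech_post)
    total = sum(speech_post)
    # Per-dimension variance over the whole file (assumed covariance model).
    var = [sum(w * (f[k] - mu[k]) ** 2 for f, w in zip(factors, speech_post))
           / total + 1e-9 for k in range(dim)]
    scores = [0.0] * n
    for t in range(win, n - win + 1):
        wl, wr = speech_post[t - win:t], speech_post[t:t + win]
        if sum(wl) < 1e-6 or sum(wr) < 1e-6:
            continue  # a window without speech carries no speaker evidence
        left = weighted_mean(factors[t - win:t], wl)
        right = weighted_mean(factors[t:t + win], wr)
        scores[t] = math.sqrt(sum((l - r) ** 2 / v
                                  for l, r, v in zip(left, right, var)))
    return scores

# Synthetic data: two "speakers" with different factor means, change at frame 100.
rng = random.Random(0)
factors = [[rng.gauss(0.0 if t < 100 else 5.0, 1.0) for _ in range(4)]
           for t in range(200)]
speech_post = [1.0] * 200  # pretend every frame is speech
scores = change_scores(factors, speech_post)
boundary = max(range(len(scores)), key=scores.__getitem__)
```

In this toy setup the score curve peaks close to the simulated change point, so `boundary` is a candidate speaker boundary. A real system would pick several local maxima rather than one global maximum.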
Assuming that for most use cases material similar to the recording in question is readily available, unsupervised domain adaptation of the speaker clustering is possible as well. We achieve this by expanding the eigenvoice matrix used during speaker factor extraction for the CDS clustering stage with a small set of new eigenvoices that, in combination with the initial generic eigenvoices, models the recurring speakers and acoustic conditions more accurately. Experiments on the COST278 multilingual broadcast news database show that adaptive speaker segmentation generates significantly more accurate speaker boundaries, which in turn results in more accurate clustering. Domain adaptation of the CDS clustering reduces the obtained speaker error rate (SER) by a further 13% relative, to 7.4%.
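The CDS clustering stage mentioned in the abstract can be sketched as agglomerative clustering on cosine distances between speaker-factor vectors. The Python below is only an illustration under stated assumptions: each segment is already represented by one speaker-factor vector, a cluster is represented by the mean of its members, and merging stops at a fixed distance threshold. The function names, the threshold value, and the toy vectors are hypothetical. The eigenvoice expansion used for domain adaptation is not shown.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def ahc_cosine(vectors, threshold=0.5):
    """Repeatedly merge the closest pair of clusters until the closest
    pair is further apart than `threshold` (an assumed stopping rule)."""
    clusters = [[i] for i in range(len(vectors))]
    dim = len(vectors[0])

    def centroid(members):
        return [sum(vectors[i][k] for i in members) / len(members)
                for k in range(dim)]

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cosine_distance(centroid(clusters[i]),
                                    centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > threshold:
            break  # remaining clusters are treated as different speakers
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Toy speaker factors: segments 0, 1, 4 point one way, segments 2, 3 another.
segments = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9], [1.0, 0.0]]
clusters = ahc_cosine(segments)
```

With these toy vectors the procedure ends with two clusters, one per simulated speaker. Cosine distance ignores vector length, so only the direction of the speaker factors matters in this sketch.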

Bibliographic details

Source
Computer Speech and Language, 2017, No. 11, pp. 72-93 (22 pages)
Authors
Desplanques Brecht; Demuynck Kris; Martens Jean Pierre
Author affiliations

Department of Electronics and Information Systems, Ghent University - imec, Sint-Pietersnieuwstraat 41, IDLab, Ghent, Belgium (all three authors)

Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language: English
Keywords
Domain adaptation; Factor analysis; iVector; Speaker diarization; Speaker segmentation;

Similar literature

1. Hybridization DE with K-means for speaker clustering in speaker diarization of broadcasts news [J]. Dabbabi Karim, Hajji Salah, Cherif Adnen. International Journal of Speech Technology, 2019, No. 4.
2. Development of a Speaker Diarization System for Speaker Tracking in Audio Broadcast News: a Case Study [J]. Mihelic France, Vesnicer Bostjan, Zibert Janez. Journal of Computing and Information Technology, 2008, No. 3.
3. Development Of A Speaker Diarization System For Speaker Tracking In Audio Broadcast News: A Case Study [J]. Janez Zibert, Bostjan Vesnicer, France Mihelic. Journal of Computing and Information Technology, 2008, No. 3.
4. Speaker Diarization in Broadcast News Using Sub Glottal Resonances [C]. Homa Afaghi Kadijani, Farbod Razzazi. Iranian Conference on Signal Processing and Intelligent Systems, 2019.
5. Speaker adaptation in joint factor analysis based text independent speaker verification [D]. Shou-Chun, Yin. 2007.
6. Supervised Speaker Diarization Using Random Forests: A Tool for Psychotherapy Process Research [O]. Lukas Fürer, Nathalie Schenk, Volker Roth. 2020.
7. Adaptive speaker diarization of broadcast news based on factor analysis [O]. Desplanques Brecht, Demuynck Kris, Martens Jean-Pierre. 2017.
8. Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment [R]. Hansen, J. H. 2015.
