
Adaptive speaker diarization of broadcast news based on factor analysis



Abstract

The introduction of factor analysis techniques in a speaker diarization system enhances its performance by facilitating the use of speaker-specific information, by improving the suppression of nuisance factors such as phonetic content, and by facilitating various forms of adaptation. This paper describes a state-of-the-art iVector-based diarization system which employs factor analysis and adaptation on all levels. The diarization modules relevant for this work are speaker segmentation, which searches for speaker boundaries, and speaker clustering, which aims at grouping speech segments of the same speaker. The speaker segmentation relies on speaker factors which are extracted on a frame-by-frame basis using eigenvoices. Because speaker change detection should be based on speaker information only, we incorporate soft voice activity detection in this extraction process: speech posteriors are applied so that non-speech frames are disregarded. Potential speaker boundaries are inserted at positions where rapid changes in the speaker factors are observed. By employing Mahalanobis distances, the effect of the phonetic content can be further reduced, which results in more accurate speaker boundaries. This iVector-based segmentation significantly outperforms more common segmentation methods based on the Bayesian Information Criterion (BIC) or speech activity marks. The speaker clustering employs two-step Agglomerative Hierarchical Clustering (AHC): after initial BIC clustering, the second clustering stage is realized by either an iVector Probabilistic Linear Discriminant Analysis (PLDA) system or Cosine Distance Scoring (CDS) of extracted speaker factors. The segmentation system is made adaptive on a file-by-file basis by iterating the diarization process using eigenvoice matrices adapted (unsupervised) on the output of the previous iteration.
Assuming that material similar to the recording in question is readily available for most use cases, unsupervised domain adaptation of the speaker clustering is possible as well. We achieve this by expanding the eigenvoice matrix used during speaker factor extraction for the CDS clustering stage with a small set of new eigenvoices that, in combination with the initial generic eigenvoices, models the recurring speakers and acoustic conditions more accurately. Experiments on the COST278 multilingual broadcast news database show that adaptive speaker segmentation generates significantly more accurate speaker boundaries, which also results in more accurate clustering. Domain adaptation of the CDS clustering reduces the obtained speaker error rate (SER) by a further 13% relative, to 7.4%.
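
The two distance-based steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that frame-wise speaker factors are already available as a NumPy array, that boundaries are scored by comparing the mean factors of the windows to the left and right of each frame, and that a cluster is represented by the mean of its members' factors. All function names, the window length, the covariance matrix and the thresholds are hypothetical choices made for illustration.

```python
import numpy as np


def mahalanobis_change_curve(factors, cov, half_win):
    """Score each frame by the Mahalanobis distance between the mean
    speaker factors of the windows to its left and right.

    `cov` is assumed to model within-speaker (e.g. phonetic) variability.
    """
    inv_cov = np.linalg.inv(cov)
    n = len(factors)
    curve = np.zeros(n)
    for t in range(half_win, n - half_win):
        left = factors[t - half_win:t].mean(axis=0)
        right = factors[t:t + half_win].mean(axis=0)
        diff = right - left
        curve[t] = np.sqrt(diff @ inv_cov @ diff)
    return curve


def pick_boundaries(curve, threshold):
    """Return frame indices that are local maxima above `threshold`."""
    return [
        t for t in range(1, len(curve) - 1)
        if curve[t] > threshold
        and curve[t] >= curve[t - 1]
        and curve[t] > curve[t + 1]
    ]


def cosine_score(a, b):
    """Cosine similarity between two speaker-factor vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def ahc_cosine(vectors, stop_threshold):
    """Greedy agglomerative clustering: repeatedly merge the two clusters
    whose mean vectors have the highest cosine score, and stop once the
    best score falls below `stop_threshold`."""
    clusters = [[i] for i in range(len(vectors))]
    means = [np.asarray(v, dtype=float) for v in vectors]
    while len(clusters) > 1:
        best, pair = -np.inf, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = cosine_score(means[i], means[j])
                if score > best:
                    best, pair = score, (i, j)
        if best < stop_threshold:
            break
        i, j = pair
        ni, nj = len(clusters[i]), len(clusters[j])
        # Size-weighted mean of the two merged clusters.
        means[i] = (ni * means[i] + nj * means[j]) / (ni + nj)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
        del means[j]
    return clusters


# Synthetic example: the speaker changes at frame 50.
factors = np.vstack([np.tile([1.0, 0.0], (50, 1)),
                     np.tile([-1.0, 0.0], (50, 1))])
curve = mahalanobis_change_curve(factors, np.eye(2), half_win=10)
boundaries = pick_boundaries(curve, threshold=1.0)

# Synthetic segment-level vectors: two segments per speaker.
segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clusters = ahc_cosine(segments, stop_threshold=0.5)
```

On this noise-free toy input the change curve has a single peak at the true change point, and the clustering groups the two pairs of similar vectors. Real speaker factors are noisy, so the window length and both thresholds would need tuning on held-out data.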

Bibliographic details

  • Source
    Computer speech and language | 2017, Issue 11 | pp. 72-93 | 22 pages
  • Author affiliations

    Department of Electronics and Information Systems, Ghent University - imec, Sint-Pietersnieuwstraat 41, IDLab, Ghent, Belgium;

    Department of Electronics and Information Systems, Ghent University - imec, Sint-Pietersnieuwstraat 41, IDLab, Ghent, Belgium;

    Department of Electronics and Information Systems, Ghent University - imec, Sint-Pietersnieuwstraat 41, IDLab, Ghent, Belgium;

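  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification:
  • Keywords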

    Domain adaptation; Factor analysis; iVector; Speaker diarization; Speaker segmentation;

