Improving Short Utterance Speaker Recognition by Modeling Speech Unit Classes

Lantian Li; Dong Wang; Chenhao Zhang; Thomas Fang Zheng

Abstract

Short utterance speaker recognition (SUSR) is highly challenging due to the limited enrollment and/or test data. We argue that the difficulty can be largely attributed to the mismatch between the prior distribution of the speech data used to train the universal background model (UBM) and the distributions of the enrollment and test data. This paper presents a novel solution that distributes speech signals into a multitude of acoustic subregions defined by speech units and models speakers within these subregions. To avoid data sparsity, a data-driven approach is proposed to cluster speech units into speech unit classes, on which robust subregion models can be built. Furthermore, we propose a model synthesis approach based on maximum likelihood linear regression (MLLR) to handle speech unit classes that have no enrollment data. Experiments were conducted on the publicly available SUD12 database. On a text-independent speaker recognition task where test utterances are no longer than 2 seconds and mostly shorter than 0.5 seconds, the proposed subregion modeling achieved a 21.51% relative reduction in equal error rate (EER) compared with the standard GMM-UBM baseline. In addition, the model synthesis approach substantially improves performance in scenarios where some speech unit classes have no enrollment data.
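A minimal sketch of the scoring idea described in the abstract, assuming frames arrive already labeled with a speech unit class index (the phone recognition and the paper's data-driven clustering of units into classes are not shown) and that each class has its own diagonal-covariance UBM and speaker GMM. Instead of scoring every frame against one global GMM-UBM pair, each frame is scored only against the models of its own class, and the per-frame log-likelihood ratios are averaged. All function and type names are hypothetical, and this is not the authors' implementation. The last function shows only the standard MLLR mean transform, mu' = A mu + b; how the paper estimates the transform when synthesizing a model for a class without enrollment data is not reproduced here.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical type: a diagonal-covariance GMM as (weight, mean, variance) triples.
Gmm = List[Tuple[float, List[float], List[float]]]


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_loglik(x: Sequence[float], gmm: Gmm) -> float:
    """Frame log-likelihood under a GMM, computed with log-sum-exp for stability."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def subregion_llr(
    frames: Sequence[Tuple[int, Sequence[float]]],
    speaker_models: Dict[int, Gmm],
    ubms: Dict[int, Gmm],
) -> float:
    """Average frame log-likelihood ratio, where each frame (class_id, features)
    is scored only against the speaker GMM and UBM of its own class."""
    if not frames:
        raise ValueError("need at least one frame")
    total = 0.0
    for cls, x in frames:
        total += gmm_loglik(x, speaker_models[cls]) - gmm_loglik(x, ubms[cls])
    return total / len(frames)


def apply_mllr_mean_transform(
    means: List[List[float]], A: List[List[float]], b: List[float]
) -> List[List[float]]:
    """Standard MLLR mean transform mu' = A mu + b, applied to every Gaussian mean.
    Estimating A and b, and choosing the source model, is not shown here."""
    return [
        [sum(a_ij * mu_j for a_ij, mu_j in zip(row, mu)) + b_i for row, b_i in zip(A, b)]
        for mu in means
    ]
```

Dividing by the number of frames is the usual length normalization for GMM-UBM scores, so utterances of different durations yield comparable scores. Because the sketch looks up models per class, a frame is never scored against Gaussians fitted to other speech unit classes.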

Bibliographic Record

Source
IEEE/ACM Transactions on Audio, Speech, and Language Processing | 2016, No. 6 | pp. 1129-1139 | 11 pages
Authors
Lantian Li; Dong Wang; Chenhao Zhang; Thomas Fang Zheng
Affiliation
Center for Speech and Language Technologies, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing, China
Original format: PDF
Language: English
Keywords
Model Synthesis; Short Utterance; Speaker Recognition; Subregion Model

Similar Documents

1. Speech Unit Category based Short Utterance Speaker Recognition [J]. Nakhat Fatima, Xiaojun Wu, Thomas Fang Zheng. Computer Science and Information Systems, 2012, No. 4.
2. Towards understanding speaker discrimination abilities in humans and machines for text-independent short utterances of different speech styles [J]. Soo Jin Park, Gary Yeung, Neda Vesselinova. The Journal of the Acoustical Society of America, 2018, No. 1.
3. Improving Short Utterance based I-vector Speaker Recognition using Source and Utterance-Duration Normalization Techniques [C]. A. Kanagasundaram, D. Dean, J. Gonzalez-Dominguez. Conference of the International Speech Communication Association, 2013.
4. Speech repairs, intonational boundaries and discourse markers: Modeling speakers' utterances in spoken dialog [D]. Peter Anthony Heeman, 1997.
5. Towards understanding speaker discrimination abilities in humans and machines for text-independent short utterances of different speech styles [O]. Soo Jin Park, Gary Yeung, Neda Vesselinova.
6. Improving short utterance based I-vector speaker recognition using source and utterance-duration normalization techniques [O]. Ahilan Kanagasundaram, David Dean, Javier Gonzalez-Dominguez, 2013.
7. Speaker Recognition from an Unknown Utterance and Speaker-Speech Interaction [R]. R. L. Kashyap, 1976.
