Generalized I-vector Representation with Phonetic Tokenizations and Tandem Features for both Text Independent and Text Dependent Speaker Verification

Li Ming; Liu Lun; Cai Weicheng; Liu Wenbo

Abstract

This paper presents a generalized i-vector representation framework with phonetic tokenization and tandem features for text independent as well as text dependent speaker verification. In the conventional i-vector framework, the tokens for calculating the zero-order and first-order Baum-Welch statistics are Gaussian Mixture Model (GMM) components trained on acoustic level MFCC features. Beyond MFCC, we believe that phonetic information offers another direction that can benefit system performance. Our contribution in this paper lies in integrating phonetic information into the i-vector representation through several extensions, forming a more generalized i-vector framework. First, the tokens for calculating the zero-order statistics are extended from the MFCC trained GMM components to phonemes, trigrams and tandem feature trained GMM components, using phoneme posterior probabilities. Second, given the zero-order statistics (posterior probabilities on tokens), the feature used to calculate the first-order statistics is also extended from MFCC to tandem features, and is not necessarily the same feature employed by the tokenizer. Third, the zero-order and first-order statistics vectors are concatenated and represented by the simplified supervised i-vector approach, followed by the standard Probabilistic Linear Discriminant Analysis (PLDA) back-end. We study different token and feature combinations, and we show that the feature level fusion of acoustic level MFCC features and phonetic level tandem features with GMM based i-vector representation achieves the best performance for text independent speaker verification. Furthermore, we demonstrate that the phonetic level phoneme constraints introduced by the tandem features help the text dependent speaker verification system reject wrong password trials and improve performance dramatically.
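The decoupling of tokenizer and feature described in the abstract can be illustrated with a minimal sketch. It assumes that per-frame token posteriors are already available from some tokenizer (GMM components, phonemes, or trigrams) and that the first-order statistics are accumulated over a possibly different feature stream. The function name `baum_welch_stats` and the toy data are hypothetical, and the statistics are left uncentered for brevity; the paper's actual implementation is not shown in this record.

```python
from typing import List, Sequence, Tuple


def baum_welch_stats(
    posteriors: Sequence[Sequence[float]],
    features: Sequence[Sequence[float]],
) -> Tuple[List[float], List[List[float]]]:
    """Accumulate zero- and first-order statistics for one utterance.

    posteriors[t][c]: posterior of token c at frame t, from any tokenizer.
    features[t][d]:   feature vector at frame t; it need not be the
                      feature the tokenizer itself was trained on.
    """
    if not posteriors or len(posteriors) != len(features):
        raise ValueError("posteriors and features must cover the same frames")
    num_tokens = len(posteriors[0])
    dim = len(features[0])
    zero = [0.0] * num_tokens                          # N_c = sum_t gamma_t(c)
    first = [[0.0] * dim for _ in range(num_tokens)]   # F_c = sum_t gamma_t(c) * x_t
    for gamma_t, x_t in zip(posteriors, features):
        for c, g in enumerate(gamma_t):
            zero[c] += g
            for d, x in enumerate(x_t):
                first[c][d] += g * x
    return zero, first


# Toy example (made-up numbers): 3 frames, 2 tokens, 2-dimensional features.
posteriors = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
zero_order, first_order = baum_welch_stats(posteriors, features)
print(zero_order)   # [1.5, 1.5]
print(first_order)  # [[2.5, 4.0], [6.5, 8.0]]
```

Because the posteriors and the features enter only as two frame-aligned inputs, swapping the tokenizer or the feature stream changes the inputs but not the accumulation itself.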
Experimental results are reported on the NIST SRE 2010 common condition 5 female part task and the RSR 2015 part 1 female part task for text independent and text dependent speaker verification, respectively. For the text independent speaker verification task, the proposed generalized i-vector representation outperforms the i-vector baseline by a relative 53% in terms of equal error rate (EER) and normalized minDCF values. For the text dependent speaker verification task, the proposed approach also reduces the EER significantly, by a relative 23% to 90% across different types of trials.
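For readers unfamiliar with the metrics, the following sketch shows one simple way to estimate an EER from trial scores and to express an improvement as a relative reduction. The function names, the threshold sweep, and all numbers are illustrative assumptions; none of the values come from the paper, and evaluation toolkits typically interpolate the operating point more carefully.

```python
from typing import Sequence


def equal_error_rate(
    target_scores: Sequence[float], nontarget_scores: Sequence[float]
) -> float:
    """Estimate the EER by sweeping thresholds over the observed scores.

    A trial is accepted when its score is >= the threshold. The EER is
    taken at the threshold where the false acceptance and false rejection
    rates are closest, as the average of the two.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for th in thresholds:
        far = sum(s >= th for s in nontarget_scores) / len(nontarget_scores)
        frr = sum(s < th for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline: float, proposed: float) -> float:
    """Relative improvement of `proposed` over `baseline` (lower is better)."""
    return (baseline - proposed) / baseline


# Made-up scores for illustration only.
targets = [0.9, 0.8, 0.7, 0.4]
nontargets = [0.6, 0.3, 0.2, 0.1]
eer = equal_error_rate(targets, nontargets)
print(eer)  # 0.25

# Hypothetical EERs: a drop from 10% to 4.7% is a 53% relative reduction.
rel = relative_reduction(0.10, 0.047)
print(round(rel, 2))  # 0.53
```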

Bibliographic details

Source
Journal of Signal Processing Systems for Signal, Image, and Video Technology | 2016, Issue 2 | pp. 207-215 | 9 pages
Authors
Li Ming; Liu Lun; Cai Weicheng; Liu Wenbo
Author affiliations

Sun Yat Sen Univ, SYSU CMU Joint Inst Engn, Guangzhou, Guangdong, Peoples R China|SYSU CMU Shunde Int Joint Res Inst, Guangzhou, Guangdong, Peoples R China;

Sun Yat Sen Univ, Sch Mobile Informat Engn, Guangzhou, Guangdong, Peoples R China;

Sun Yat Sen Univ, Sch Informat Sci & Technol, Guangzhou, Guangdong, Peoples R China;

Sun Yat Sen Univ, SYSU CMU Joint Inst Engn, Guangzhou, Guangdong, Peoples R China|Carnegie Mellon Univ, Dept ECE, Pittsburgh, PA 15213 USA;

Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords
Text independent speaker verification; Text dependent speaker verification; I-vector; Tandem feature; Language identification;

Similar literature

1. Speaker-Phonetic I-Vector Modeling for Text-Dependent Speaker Verification with Random Digit Strings [J]. Shengyu YAO, Ruohua ZHOU, Pengyuan ZHANG. IEICE Transactions on Information and Systems, 2019, Issue 2.
2. A fuzzy-clustering-based hierarchical i-vector/probabilistic linear discriminant analysis system for text-dependent speaker verification [J]. Laskar Mohammad Azharuddin, Laskar Rabul Hussain. Expert Systems, 2020, Issue 3.
3. Text-dependent speaker verification based on i-vectors, Neural Networks and Hidden Markov Models [J]. Zeinali Hossein, Sameti Hossein, Burget Lukáš. Computer Speech and Language, 2017.
4. Deep bottleneck features for i-vector based text-independent speaker verification [C]. Sina Hamidi Ghalehjegh, Richard C. Rose. IEEE Workshop on Automatic Speech Recognition and Understanding, 2015.
5. Speaker adaptation in joint factor analysis based text independent speaker verification [D]. Shou-Chun, Yin. 2007.
6. Bidirectional Attention for Text-Dependent Speaker Verification [O]. Xin Fang, Tian Gao, Liang Zou. 2020.
7. The Role of Dynamic Features in Text-Dependent and -Independent Speaker Verification [O]. Ying Liu, Martin Russell, Michael Carey. 2014.
