Multi-resolution time frequency feature and complementary combination for short utterance speaker recognition

Zhi-Yi Li; Wei-Qiang Zhang; Jia Liu


Abstract

A human speaker recognition expert often examines the speech spectrogram at several different scales, especially when utterances are short. Inspired by this practice, this paper proposes a novel multi-resolution time frequency feature (MRTF) extraction method: a 2-dimensional discrete cosine transform (DCT) is applied at multiple scales to the time frequency spectrogram matrix, and the resulting multi-scale transformed elements are then selected and combined into the final feature. Compared with traditional Mel-Frequency Cepstral Coefficient (MFCC) feature extraction, the proposed method makes better use of multi-resolution time frequency information. Beyond this, we also propose three complementary combination strategies for MFCC and MRTF: at the feature level, at the i-vector level and at the score level. Comparing their performance, we found that combination at the i-vector level gives the best results. On three NIST 2008 Speaker Recognition Evaluation datasets, the proposed method is more effective at improving performance under short-utterance conditions than under long-utterance conditions. After combination, we achieve an EER of 11.32% and a MinDCF of 0.054 on the 10sec-10sec trials of the male dataset, an absolute EER improvement of 3% over the best previously reported result in this field.
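The abstract describes MRTF only at a high level: a 2-D DCT applied at several scales to the spectrogram, followed by selection and combination of the transformed elements. The sketch below is one possible reading under stated assumptions, not the authors' implementation. It assumes the input is a (frequency x frames) log spectrogram, that each "scale" is a context window of `widths` frames centred on the current frame, and that "selection" keeps the low-order `keep` corner of each 2-D DCT block. The function names `mrtf_features`, `dct_matrix` and `dct2`, and all default values, are hypothetical.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis: row k holds the k-th cosine over n samples."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] *= np.sqrt(1.0 / n)
    c[1:, :] *= np.sqrt(2.0 / n)
    return c


def dct2(block: np.ndarray) -> np.ndarray:
    """Separable 2-D DCT-II: transform the frequency axis, then the time axis."""
    n_f, n_t = block.shape
    return dct_matrix(n_f) @ block @ dct_matrix(n_t).T


def mrtf_features(log_spec: np.ndarray, widths=(3, 5, 9), keep=(12, 3)) -> np.ndarray:
    """Hypothetical multi-resolution time-frequency (MRTF) feature.

    log_spec: (F, T) log spectrogram, frequency bins x frames.
    widths:   odd context widths in frames, one per scale (assumed values).
    keep:     (k_f, k_t) low-order DCT coefficients retained per scale.
    Returns a (T, len(widths) * k_f * k_t) matrix, one row per frame.
    """
    n_f, n_frames = log_spec.shape
    k_f, k_t = keep
    if k_f > n_f or k_t > min(widths):
        raise ValueError("keep must fit inside every DCT block")
    c_f = dct_matrix(n_f)                          # reused for every window
    per_scale = []
    for w in widths:
        half = w // 2
        c_t = dct_matrix(w)
        # Repeat the edge frames so every frame gets a full w-frame window.
        padded = np.pad(log_spec, ((0, 0), (half, half)), mode="edge")
        feats = np.empty((n_frames, k_f * k_t))
        for t in range(n_frames):
            block = padded[:, t:t + w]             # window centred on frame t
            coeffs = c_f @ block @ c_t.T           # same as dct2(block)
            feats[t] = coeffs[:k_f, :k_t].ravel()  # "selection": low-order corner
        per_scale.append(feats)
    # "Combination": concatenate the selected coefficients of all scales.
    return np.concatenate(per_scale, axis=1)


rng = np.random.default_rng(0)
spec = np.log(rng.random((40, 100)) + 1e-3)        # stand-in for a 40-band log spectrogram
feats = mrtf_features(spec)
print(feats.shape)                                  # (100, 108)
```

Keeping only the low-order corner parallels how MFCC truncates a 1-D DCT of the log filterbank energies; widening the window trades time resolution for a broader view of spectro-temporal structure, which is presumably why several scales are combined.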

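The abstract names three places where MFCC and MRTF can be combined (feature, i-vector and score level) and reports results as EER and MinDCF. The sketch below illustrates each fusion point and both metrics under assumptions not taken from the paper: i-vector extraction and the scoring backend are out of scope, i-vectors are assumed to be length-normalised before concatenation, score fusion is assumed to be a simple weighted sum with a hypothetical `weight`, and MinDCF uses the NIST SRE 2008 cost parameters (C_miss = 10, C_fa = 1, P_target = 0.01). All function names are hypothetical, and the scores in the usage part are synthetic.

```python
import numpy as np


def feature_level(mfcc: np.ndarray, mrtf: np.ndarray) -> np.ndarray:
    """Feature-level combination: frame-wise concatenation of (T, d1) and (T, d2)."""
    return np.concatenate([mfcc, mrtf], axis=1)


def ivector_level(iv_mfcc: np.ndarray, iv_mrtf: np.ndarray) -> np.ndarray:
    """i-vector-level combination: length-normalise each i-vector, then concatenate."""
    a = iv_mfcc / np.linalg.norm(iv_mfcc, axis=-1, keepdims=True)
    b = iv_mrtf / np.linalg.norm(iv_mrtf, axis=-1, keepdims=True)
    return np.concatenate([a, b], axis=-1)


def score_level(s_mfcc, s_mrtf, weight: float = 0.5) -> np.ndarray:
    """Score-level combination: weighted sum of two subsystems' trial scores."""
    return weight * np.asarray(s_mfcc, dtype=float) + (1.0 - weight) * np.asarray(s_mrtf, dtype=float)


def _error_rates(tgt, non):
    """Miss and false-alarm rates at every observed score plus +inf (reject all)."""
    tgt = np.asarray(tgt, dtype=float)
    non = np.asarray(non, dtype=float)
    thresholds = np.append(np.sort(np.concatenate([tgt, non])), np.inf)
    p_miss = np.array([(tgt < th).mean() for th in thresholds])
    p_fa = np.array([(non >= th).mean() for th in thresholds])
    return p_miss, p_fa


def eer(tgt, non) -> float:
    """Approximate EER: average of P_miss and P_fa where the two are closest."""
    p_miss, p_fa = _error_rates(tgt, non)
    i = np.argmin(np.abs(p_miss - p_fa))
    return float((p_miss[i] + p_fa[i]) / 2)


def min_dcf(tgt, non, p_target=0.01, c_miss=10.0, c_fa=1.0) -> float:
    """Unnormalised minimum detection cost over all thresholds."""
    p_miss, p_fa = _error_rates(tgt, non)
    cost = c_miss * p_miss * p_target + c_fa * p_fa * (1.0 - p_target)
    return float(cost.min())


# Usage with synthetic scores standing in for two subsystems on the same trials.
rng = np.random.default_rng(0)
labels = np.r_[np.ones(200, bool), np.zeros(2000, bool)]   # True = target trial
s_mfcc = rng.normal(labels * 2.0, 1.0)
s_mrtf = rng.normal(labels * 2.0, 1.0)
fused = score_level(s_mfcc, s_mrtf)
for name, s in [("MFCC", s_mfcc), ("MRTF", s_mrtf), ("fused", fused)]:
    print(name, eer(s[labels], s[~labels]), min_dcf(s[labels], s[~labels]))
```

The returned MinDCF is unnormalised; dividing it by min(C_miss * P_target, C_fa * (1 - P_target)) = 0.1 gives the normalised form that some papers report instead. Which convention the reported 0.054 follows is not stated in the abstract.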

Bibliographic details

Source
Multimedia Tools and Applications, 2015, No. 3, pp. 937-953 (17 pages)
Authors
Zhi-Yi Li; Wei-Qiang Zhang; Jia Liu
Affiliations

Tsinghua National Laboratory for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing 100084, China (all three authors)
Original format: PDF
Language: English
Keywords
Multi-resolution time frequency feature; i-vector; Complementary combination; Speaker recognition; Short utterance


Similar documents

1. Robust features for text-independent speaker recognition with short utterances [J]. Chakroun Rania, Frikha Mondher. Neural Computing & Applications. 2020, No. 17

2. Time-Frequency Feature Representation Using Multi-Resolution Texture Analysis and Acoustic Activity Detector for Real-Life Speech Emotion Recognition [J]. Kun-Ching Wang. Sensors. 2015, No. 1

3. Time Frequency Features for Automatic Speaker Recognition [J]. Hossein Marvi. WSEAS Transactions on Communications. 2006, No. 12

4. System combination for short utterance speaker recognition [C]. Lantian Li, Dong Wang, Xiaodong Zhang. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference. 2016

5. Speaker recognition using complementary information from vocal source and vocal tract [D]. Zheng, Nengheng. 2006

6. Time-Frequency Feature Representation Using Multi-Resolution Texture Analysis and Acoustic Activity Detector for Real-Life Speech Emotion Recognition [O]. Kun-Ching Wang. 2015

7. System Combination for Short Utterance Speaker Recognition [O]. Li, Lantian, Wang, Dong, Zhang, Xiaodong. 2016

8. Speaker Recognition from an Unknown Utterance and Speaker-Speech Interaction [R]. Kashyap, R. L. 1976

Multi-resolution time frequency feature and complementary combination for short utterance speaker recognition

摘要

著录项

相似文献

相关主题

期刊订阅