Rank-based frame classification for usable speech detection in speaker identification systems

Abstract

The performance of a speaker identification (SID) system degrades substantially when there is a mismatch between the training and testing conditions. Discriminating between temporal sections of a speech signal that are speech-like (SID usable) and noise-like (SID unusable), and retaining only the frames labeled SID usable, can improve SID performance substantially. In this paper, a novel labeling system for SID usable and SID unusable frames is presented for a GMM-based SID system. This is motivated by a control experiment demonstrating that very high SID accuracies are theoretically achievable by removing frames that contribute more to the scores of competing speakers than to the score of the true speaker. To blindly identify these SID usable and unusable frames, a Mahalanobis distance classifier and an ensemble of decision tree classifiers (with boosting) were trained on a dataset different from the enrollment database of the SID system. The classifier-based techniques yielded improvements over the baseline speaker identification system (all frames used) in all cases in which the speech signal was corrupted with additive white or additive pink noise.
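
The abstract does not spell out the labeling rule, so the following is only a minimal sketch of what an oracle, rank-based control experiment could look like. It assumes a matrix of per-frame log-likelihoods under each enrolled speaker's GMM is already available. The names `label_usable_frames`, `identify`, and `max_rank`, and the toy numbers, are hypothetical and do not come from the paper.

```python
import numpy as np

def label_usable_frames(frame_loglik, true_speaker, max_rank=1):
    """Oracle labeling (assumed rule): a frame is 'usable' when the true
    speaker's per-frame score ranks within the top `max_rank` speakers.

    frame_loglik: array of shape (n_frames, n_speakers).
    """
    true_scores = frame_loglik[:, [true_speaker]]
    # Rank of the true speaker = 1 + number of speakers scoring strictly higher.
    rank = 1 + np.sum(frame_loglik > true_scores, axis=1)
    return rank <= max_rank

def identify(frame_loglik, mask=None):
    """Pick the speaker with the highest log-likelihood summed over kept frames."""
    if mask is None:
        mask = np.ones(frame_loglik.shape[0], dtype=bool)
    return int(np.argmax(frame_loglik[mask].sum(axis=0)))

# Toy scores: 4 frames, 3 speakers; speaker 0 is the true speaker.
frame_loglik = np.array([
    [-1.0, -2.0, -3.0],   # true speaker scores best
    [-5.0, -1.0, -4.0],   # a competing speaker scores best
    [-2.0, -2.5, -3.0],   # true speaker scores best
    [-6.0, -1.0, -5.0],   # a competing speaker scores best
])

mask = label_usable_frames(frame_loglik, true_speaker=0)
print(identify(frame_loglik))        # all frames used
print(identify(frame_loglik, mask))  # only frames labeled usable
```

In this toy case the decision flips from speaker 1 to speaker 0 once the two frames that favor a competitor are dropped. The oracle labels need the true speaker identity, so at test time they would have to be predicted blindly by a trained classifier, as the abstract describes.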

Bibliographic details

Source: International Conference on Digital Signal Processing, 2015, pp. 292-296 (5 pages)
Authors: Ethridge James; Ramachandran Ravi P.
Original format: PDF
Keywords: Gaussian mixture model; Mahalanobis distance; additive noise; boosting; decision tree; speaker identification; usable frames

Similar literature

1. On Usable Speech Detection by Linear Multi-Scale Decomposition for Speaker Identification [J]. Wajdi Ghezaiel, Amel Ben Slimane, Ezzedine Ben Braiek. International Journal of Electrical and Computer Engineering. 2016, no. 6
2. A Speech-and-Speaker Identification System: Feature Extraction, Description, and Classification of Speech-Signal Image [J]. Khalid Saeed, Mohammad Kheir Nammous. IEEE Transactions on Industrial Electronics. 2007
3. An Investigation on the Accuracy of Truncated DKLT Representation for Speaker Identification With Short Sequences of Speech Frames [J]. Giorgio Biagetti, Paolo Crippa, Laura Falaschetti. IEEE Transactions on Cybernetics. 2017, no. 12
4. Rank-based frame classification for usable speech detection in speaker identification systems [C]. Ethridge James, Ramachandran Ravi P. International Conference on Digital Signal Processing. 2015
5. Usable speech processing: A filterless approach to speaker identification in the presence of non-stationary interference [D]. Smolenski, Brett Y. 2005
6. Intoxicated Speech Detection: A Fusion Framework with Speaker-Normalized Hierarchical Functionals and GMM Supervectors [O]. Daniel Bone, Ming Li, Matthew P. Black
7. On Usable Speech Detection by Linear Multi-Scale Decomposition for Speaker Identification [O]. Wajdi Ghezaiel, Amel Ben Slimane, Ezzedine Ben Braiek. 2016
8. Effect of Reference Set Selection on Speaker Dependent Speech Recognition. Frame Compression in Isolated Word Recognition [R]. Li, Z., Alleva, F., Reddy, R. 1981
