IEEE/ACM Transactions on Audio, Speech, and Language Processing

Time-Contrastive Learning Based Deep Bottleneck Features for Text-Dependent Speaker Verification



Abstract

There are a number of studies on extracting bottleneck (BN) features from deep neural networks (DNNs) trained to discriminate speakers, pass-phrases, and triphone states in order to improve the performance of text-dependent speaker verification (TD-SV). However, only moderate success has been achieved. A recent study presented a time-contrastive learning (TCL) concept that exploits the non-stationarity of brain signals for classification of brain states. Speech signals have a similar non-stationarity property, and TCL has the further advantage of requiring no labeled data. We therefore present a TCL-based BN feature extraction method. The method uniformly partitions each speech utterance in a training dataset into a predefined number of multi-frame segments. Each segment in an utterance corresponds to one class, and class labels are shared across utterances. DNNs are then trained to discriminate all speech frames among the classes, thereby exploiting the temporal structure of speech. In addition, we propose a segment-based unsupervised clustering algorithm to re-assign class labels to the segments. TD-SV experiments were conducted on the RedDots challenge database. The TCL-DNNs were trained using speech data of fixed pass-phrases that were excluded from the TD-SV evaluation set, so the learned features can be considered phrase-independent. We compare the performance of the proposed TCL BN feature with those of short-time cepstral features and of BN features extracted from DNNs discriminating speakers, pass-phrases, speaker+pass-phrase combinations, as well as monophones whose labels and boundaries are generated by three different automatic speech recognition (ASR) systems. Experimental results show that the proposed TCL-BN outperforms cepstral features and speaker+pass-phrase discriminant BN features, and its performance is on par with that of ASR-derived BN features.
Moreover, the clustering method improves the TD-SV performance of TCL-BN and ASR-derived BN features with respect to their standalone counterparts. We further study the TD-SV performance of fusing cepstral and BN features.
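The TCL labeling scheme described in the abstract (uniformly partitioning each utterance into a fixed number of contiguous multi-frame segments, with the segment index serving as a class label shared across utterances) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the helper name `tcl_labels` and the use of frame counts rather than raw audio are assumptions for the example.

```python
import numpy as np

def tcl_labels(num_frames: int, num_segments: int) -> np.ndarray:
    """Assign a TCL class label to each frame of one utterance.

    The utterance's frames are uniformly partitioned into
    `num_segments` contiguous segments; each frame's label is the
    index of the segment it falls in. Because every utterance is
    split into the same number of segments, labels are shared
    across utterances, as the method requires.
    (Hypothetical helper for illustration only.)
    """
    seg_len = num_frames / num_segments  # fractional segment length
    labels = np.minimum((np.arange(num_frames) / seg_len).astype(int),
                        num_segments - 1)  # clamp last frame into final segment
    return labels

# Example: a 10-frame utterance split into 5 two-frame segments.
print(tcl_labels(10, 5).tolist())  # → [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
```

Frames labeled this way can then be fed to a frame-level DNN classifier over the segment classes, with BN features read from a narrow hidden layer; the unsupervised re-clustering step mentioned above would replace these uniform labels with cluster assignments of the segments.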
