IEEE/ACM Transactions on Audio, Speech, and Language Processing

Unsupervised Iterative Deep Learning of Speech Features and Acoustic Tokens with Applications to Spoken Term Detection

Abstract

In this paper, we aim to automatically discover high-quality frame-level speech features and acoustic tokens directly from unlabeled speech data. A multigranular acoustic tokenizer (MAT) is proposed to automatically discover multiple sets of acoustic tokens from a given corpus. Each acoustic token set is specified by a set of hyperparameters describing the model configuration. Because these token sets capture different characteristics of the corpus and of the underlying language, they can reinforce one another. The multiple sets of token labels are then used as the targets of a multitarget deep neural network (MDNN) trained on frame-level acoustic features. Bottleneck features extracted from the MDNN are fed back as the input to both the MAT and the MDNN itself in the next iteration. In this way, the multigranular acoustic token sets and the frame-level speech features are iteratively optimized within a single deep learning framework, which we call the MAT deep neural network. The results were evaluated using the metrics and corpora defined in the Zero Resource Speech Challenge organized at Interspeech 2015, and a set of query-by-example spoken term detection experiments on the same corpora also showed improved performance. A visualization of the discovered tokens against English phonemes is also presented.
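The iterative loop described in the abstract (tokenize, train a multitarget network on all token sets, extract bottleneck features, feed them back) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, and several substitutions are assumptions of the sketch: the acoustic tokenizer is replaced by plain k-means clustering, with one token set per cluster count in the hypothetical `ks`; tokens are frame-level labels rather than multi-frame units; the MDNN is a tiny network with one shared tanh layer, a linear bottleneck and one softmax head per token set; and the per-dimension normalization between iterations is a choice made here for numerical stability. All function names, hyperparameter values and the random "frames" are illustrative only.

```python
import numpy as np


def kmeans_tokenize(X, k, iters=20, seed=0):
    """Stand-in tokenizer (assumption): label every frame with one of k k-means clusters."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # fancy indexing returns a copy
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # empty clusters keep their old center
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def train_mdnn(X, label_sets, ks, hidden=64, bottleneck=8, epochs=200, lr=0.1, seed=0):
    """Toy multitarget net: shared tanh layer -> linear bottleneck -> one softmax head per token set."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1, b1 = rng.normal(0, 0.1, (d, hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(0, 0.1, (hidden, bottleneck)), np.zeros(bottleneck)
    heads = [[rng.normal(0, 0.1, (bottleneck, k)), np.zeros(k)] for k in ks]
    targets = [np.eye(k)[labels] for labels, k in zip(label_sets, ks)]  # one-hot targets
    losses = []
    for _ in range(epochs):
        h1 = np.tanh(X @ W1 + b1)
        bn = h1 @ W2 + b2
        loss, d_bn, head_grads = 0.0, np.zeros_like(bn), []
        for (Wk, bk), Y in zip(heads, targets):
            p = softmax(bn @ Wk + bk)
            loss += -np.mean(np.sum(Y * np.log(p + 1e-12), axis=1))  # summed over heads
            dz = (p - Y) / n  # gradient of mean cross-entropy w.r.t. logits
            head_grads.append((bn.T @ dz, dz.sum(axis=0)))
            d_bn += dz @ Wk.T  # every head back-propagates into the shared bottleneck
        losses.append(loss)
        d_pre1 = (d_bn @ W2.T) * (1.0 - h1 ** 2)  # backprop through tanh (uses W2 before update)
        W2 -= lr * (h1.T @ d_bn)
        b2 -= lr * d_bn.sum(axis=0)
        W1 -= lr * (X.T @ d_pre1)
        b1 -= lr * d_pre1.sum(axis=0)
        for (Wk, bk), (gW, gb) in zip(heads, head_grads):
            Wk -= lr * gW  # in-place updates of the arrays stored in `heads`
            bk -= lr * gb

    def extract(Z):
        """Bottleneck features for new input frames."""
        return np.tanh(Z @ W1 + b1) @ W2 + b2

    return extract, losses


def mat_dnn(X, ks=(4, 8, 16), n_iters=3):
    """Iterate: tokenize current features, train the multitarget net, feed bottleneck back."""
    feats, history = X, []
    for it in range(n_iters):
        label_sets = [kmeans_tokenize(feats, k, seed=k) for k in ks]
        extract, losses = train_mdnn(feats, label_sets, ks, seed=it)
        history.append(losses)
        bn = extract(feats)
        feats = (bn - bn.mean(axis=0)) / (bn.std(axis=0) + 1e-8)  # normalization (assumption)
    return feats, history


rng = np.random.default_rng(0)
frames = rng.normal(size=(500, 13))  # hypothetical stand-in for 13-dim acoustic frames
features, history = mat_dnn(frames)
print(features.shape)  # (500, 8)
```

Because all heads share one bottleneck, the extracted features are pushed to encode information useful for every token granularity at once, which is roughly the intuition behind letting the token sets reinforce one another. In a real system the k-means stand-in would be replaced by a proper acoustic tokenizer and the random frames by real acoustic features.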