Unsupervised Iterative Deep Learning of Speech Features and Acoustic Tokens with Applications to Spoken Term Detection

Cheng-Tao Chung; Cheng-Yu Tsai; Chia-Hsiang Liu; Lin-Shan Lee

Abstract

In this paper, we aim to automatically discover high-quality frame-level speech features and acoustic tokens directly from unlabeled speech data. A multigranular acoustic tokenizer (MAT) was proposed for the automatic discovery of multiple sets of acoustic tokens from a given corpus. Each acoustic token set is specified by a set of hyperparameters describing the model configuration. These different sets of acoustic tokens capture different characteristics of the corpus and of the language behind it, and can therefore reinforce one another. The multiple sets of token labels are then used as the targets of a multitarget deep neural network (MDNN) trained on frame-level acoustic features. Bottleneck features extracted from the MDNN are then fed back as the input to both the MAT and the MDNN itself in the next iteration. In this way, the multigranular acoustic token sets and the frame-level speech features can be optimized iteratively within one deep learning framework, which we call the MAT deep neural network. The results were evaluated using the metrics and corpora defined in the Zero Resource Speech Challenge organized at Interspeech 2015, and improved performance was also obtained in a set of query-by-example spoken term detection experiments on the same corpora. A visualization of the discovered tokens against English phonemes is also shown.
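The iterative loop described in the abstract can be sketched in a few lines of NumPy. This is a minimal, hypothetical illustration under loud assumptions, not the authors' implementation: frame-level k-means with several cluster counts (the made-up `granularities` parameter) stands in for the MAT, whose actual token models are not reproduced here, and the MDNN is reduced to one shared tanh layer, a linear bottleneck, and one softmax head per token set, trained by full-batch gradient descent on the sum of the per-head cross-entropies. All function names, layer sizes, the learning rate, and the iteration counts are arbitrary assumptions.

```python
import numpy as np

def kmeans_tokens(feats, n_tokens, iters=10, seed=0):
    """Hypothetical stand-in tokenizer: frame-level k-means labels.
    NOT the paper's MAT; it only mimics 'one token set per granularity'."""
    rng = np.random.default_rng(seed)
    centers = feats[rng.choice(len(feats), n_tokens, replace=False)].copy()
    for _ in range(iters):
        dist = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for k in range(n_tokens):
            if np.any(labels == k):
                centers[k] = feats[labels == k].mean(axis=0)
    return labels

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_mdnn(x, label_sets, n_classes, hidden=32, bottleneck=8,
               epochs=200, lr=0.1, seed=0):
    """Toy multitarget net: shared tanh layer -> linear bottleneck -> one
    softmax head per token set; loss = sum of per-head cross-entropies."""
    rng = np.random.default_rng(seed)
    n, d = x.shape
    W1, b1 = rng.normal(0, 0.1, (d, hidden)), np.zeros(hidden)
    W2, b2 = rng.normal(0, 0.1, (hidden, bottleneck)), np.zeros(bottleneck)
    heads = [(rng.normal(0, 0.1, (bottleneck, k)), np.zeros(k)) for k in n_classes]
    for _ in range(epochs):
        h = np.tanh(x @ W1 + b1)
        z = h @ W2 + b2                      # bottleneck activations
        dz = np.zeros_like(z)
        updated = []
        for (Wk, bk), y in zip(heads, label_sets):
            g = softmax(z @ Wk + bk)
            g[np.arange(n), y] -= 1.0        # d(cross-entropy)/d(logits)
            g /= n                           # mean over frames
            dz += g @ Wk.T                   # every head adds to the bottleneck gradient
            updated.append((Wk - lr * z.T @ g, bk - lr * g.sum(axis=0)))
        heads = updated
        dh = (dz @ W2.T) * (1.0 - h ** 2)    # backprop through tanh
        W2 -= lr * h.T @ dz
        b2 -= lr * dz.sum(axis=0)
        W1 -= lr * x.T @ dh
        b1 -= lr * dh.sum(axis=0)
    # Return a bottleneck-feature extractor (the heads are discarded).
    return lambda x_new: np.tanh(x_new @ W1 + b1) @ W2 + b2

def mat_dnn(feats, granularities=(4, 8), n_iters=3, bottleneck=8):
    """Iterate: tokenize at several granularities -> train the multitarget
    net on those labels -> feed its bottleneck features back as the input
    to both the tokenizer and the net in the next iteration."""
    x, label_sets = feats, []
    for it in range(n_iters):
        label_sets = [kmeans_tokens(x, k, seed=it) for k in granularities]
        extract = train_mdnn(x, label_sets, granularities,
                             bottleneck=bottleneck, seed=it)
        x = extract(x)
    return x, label_sets
```

Calling `mat_dnn(np.random.default_rng(1).normal(size=(120, 13)), granularities=(3, 5), n_iters=2, bottleneck=6)` on random 13-dimensional "frames" yields a 120 x 6 bottleneck feature matrix and two label arrays. The design point the sketch tries to show is that all heads back-propagate into the same bottleneck, so the features fed to the next iteration are shaped by every token set at once. The returned label sets are the ones computed at the start of the last iteration.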

Bibliographic record

Source
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2017, no. 10, pp. 1914-1928 (15 pages)
Authors
Cheng-Tao Chung; Cheng-Yu Tsai; Chia-Hsiang Liu; Lin-Shan Lee
Affiliations

Graduate Institute of Electrical Engineering, National Taiwan University, Taipei, Taiwan;

Graduate Institute of Communication Engineering, National Taiwan University, Taipei, Taiwan;

Graduate Institute of Electrical Engineering, National Taiwan University, Taipei, Taiwan;

Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan;

Indexing information
Original format: PDF
Language: English
Keywords
Acoustics; Hidden Markov models; Speech; Feature extraction; Neural networks; Speech processing; Measurement;

Similar documents

1. Model-Based Unsupervised Spoken Term Detection with Spoken Queries [J]. Chan C.-A., Lee L.-S. IEEE Transactions on Audio, Speech, and Language Processing, 2013, no. 7.
2. Detection of particle contaminants in rolling element bearings with unsupervised acoustic emission feature learning [J]. Martin-del-Campo S., Schnabel S., Sandin F. Tribology International, 2019.
3. Combining iterative slow feature analysis and deep feature learning for change detection in high-resolution remote sensing images [J]. Xu Junfeng, Zhang Baoming, Guo Haitao. Journal of Applied Remote Sensing, 2019, no. 2.
4. An iterative deep learning framework for unsupervised discovery of speech features and linguistic units with applications on spoken term detection [C]. Cheng-Tao Chung, Cheng-Yu Tsai, Hsiang-Hung Lu. IEEE Workshop on Automatic Speech Recognition and Understanding, 2015.
5. Discriminative Articulatory Feature-based Pronunciation Models with Application to Spoken Term Detection [D]. Prabhavalkar, Rohit. 2013.
6. Acoustic and Language Based Deep Learning Approaches for Alzheimer's Dementia Detection From Spontaneous Speech [O]. Pranav Mahajan, Veeky Baths. 2021.
7. Unsupervised Iterative Deep Learning of Speech Features and Acoustic Tokens with Applications to Spoken Term Detection [O]. Chung, Cheng-Tao, Tsai, Cheng-Yu, Liu, Chia-Hsiang. 2017.
