IEEE Transactions on Audio, Speech, and Language Processing

Cross-Lingual Automatic Speech Recognition Using Tandem Features



Abstract

Automatic speech recognition depends on large amounts of transcribed speech recordings in order to estimate the parameters of the acoustic model. Recording such large speech corpora is time-consuming and expensive; as a result, sufficient quantities of data exist only for a handful of languages—there are many more languages for which little or no data exist. Given that there are acoustic similarities between speech in different languages, it may be fruitful to use data from a well-resourced source language to estimate the acoustic models for a recognizer in a poorly-resourced target language. Previous approaches to this task have often involved making assumptions about shared phonetic inventories between the languages. Unfortunately, pairs of languages do not generally share a common phonetic inventory. We propose an indirect way of transferring information from a source language acoustic model to a target language acoustic model without having to make any assumptions about the phonetic inventory overlap. To do this, we employ tandem features, in which class-posteriors from a separate classifier are decorrelated and appended to conventional acoustic features. Tandem features have the advantage that the language of the speech data used to train the classifier need not be the same as the target language to be recognized. This is because the class-posteriors are not used directly, so do not have to be over any particular set of classes. We demonstrate the use of tandem features in cross-lingual settings, including training on one or several source languages. We also examine factors which may predict a priori how much relative improvement will be brought about by using such tandem features, for a given source and target pair. In addition to conventional phoneme class-posteriors, we also investigate whether articulatory features (AFs)—a multi-stream, discrete, multi-valued labeling of speech—can be used instead. This is motivated by the assumption that AFs are less language-specific than a phoneme set.
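The tandem-feature pipeline the abstract describes — take per-frame class-posteriors from a separate classifier, decorrelate them, and append them to conventional acoustic features — can be illustrated with a minimal sketch. This is not the paper's implementation: the posteriors here are simulated random data standing in for the output of a trained source-language phone classifier, the decorrelation uses plain PCA, and the dimensions (13 MFCCs, 40 phone classes) are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical inputs: 100 frames of 13-dim acoustic features (e.g. MFCCs),
# plus per-frame posterior probabilities over 40 source-language phone
# classes from a separately trained classifier (simulated here).
n_frames, n_mfcc, n_classes = 100, 13, 40
mfcc = rng.standard_normal((n_frames, n_mfcc))
logits = rng.standard_normal((n_frames, n_classes))
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# 1. Log-compress the posteriors (makes their distribution more Gaussian-like).
log_post = np.log(posteriors + 1e-10)

# 2. Decorrelate with PCA: centre the data, eigendecompose its covariance,
#    and project onto the eigenvectors, ordered by decreasing variance.
centred = log_post - log_post.mean(axis=0)
cov = np.cov(centred, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
decorrelated = centred @ eigvecs[:, order]

# 3. Append the decorrelated posterior features to the acoustic features.
#    The resulting tandem features feed the target-language acoustic model.
tandem = np.concatenate([mfcc, decorrelated], axis=1)
print(tandem.shape)  # (100, 53): 13 MFCC dims + 40 decorrelated posterior dims
```

Because the posteriors enter only through this decorrelated, appended representation, the classifier's class set never has to match the target language's phone inventory — which is the key point of the cross-lingual argument above.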
