Workshop on Computational Approaches to Code Switching

Semi-supervised Acoustic and Language Model Training for English-isiZulu Code-Switched Speech Recognition



Abstract

We present an analysis of semi-supervised acoustic and language model training for English-isiZulu code-switched ASR using soap opera speech. Approximately 11 hours of untranscribed multilingual speech was transcribed automatically using four bilingual code-switching transcription systems operating in English-isiZulu, English-isiXhosa, English-Setswana and English-Sesotho. These transcriptions were incorporated into the acoustic and language model training sets. Results showed that the TDNN-F acoustic models benefit from the additional semi-supervised data and that even better performance could be achieved by including additional CNN layers. Using these CNN-TDNN-F acoustic models, a first iteration of semi-supervised training achieved an absolute mixed-language WER reduction of 3.4%, and a further 2.2% after a second iteration. Although the languages in the untranscribed data were unknown, the best results were obtained when all automatically transcribed data was used for training and not just the utterances classified as English-isiZulu. Despite reducing perplexity, the semi-supervised language model was not able to improve the ASR performance.
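The core procedure described above is semi-supervised (pseudo-label) training: decode untranscribed audio with the current system, add the automatic transcriptions to the training pool, and retrain, repeating for a second iteration. The sketch below illustrates that loop only in outline; the toy `train` and `decode` functions are hypothetical stand-ins, not the authors' Kaldi TDNN-F recipes.

```python
from collections import Counter


def train(data):
    # Toy "model": word-frequency counts over the transcripts.
    # A real system would train acoustic and language models here.
    model = Counter()
    for _audio, transcript in data:
        model.update(transcript.split())
    return model


def decode(model, audio):
    # Toy "decoder": the audio is represented as a string we pretend
    # to transcribe. A real decoder would run the trained models.
    return audio


def semi_supervised_train(labelled, untranscribed, iterations=2):
    """Pseudo-label loop: decode unlabelled data, retrain, repeat."""
    model = train(labelled)
    for _ in range(iterations):
        pseudo = [(utt, decode(model, utt)) for utt in untranscribed]
        # As in the paper, ALL automatically transcribed data is kept,
        # not just utterances classified as English-isiZulu.
        model = train(labelled + pseudo)
    return model
```

A usage example: starting from one labelled utterance and one untranscribed one, the pseudo-labelled utterance ends up contributing to the retrained model.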
机译:我们介绍了使用肥皂剧语音对英语-isiZulu代码转换的ASR进行半监督的声学和语言模型训练的分析。使用四个以英语-isiZulu,英语-isiXhosa,英语-Setswana和英语-Sesotho运行的双语代码转换转录系统,自动转录了大约11个小时的未转录多语言语音。这些转录被并入声学和语言模型训练集中。结果表明,TDNN-F声学模型受益于附加的半监督数据,并且通过包含附加的CNN层,甚至可以实现更好的性能。使用这些CNN-TDNN-F声学模型,半监督训练的第一次迭代实现了绝对混合语言WER降低3.4%,第二次迭代后进一步降低了2.2%。尽管未转录数据中的语言是未知的,但是当所有自动转录的数据都用于训练而不仅仅是分类为English-isiZulu的语音时,可以获得最佳结果。尽管减少了困惑,但半监督语言模型仍无法提高ASR性能。
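The reported gains are in word error rate (WER), the word-level Levenshtein distance between hypothesis and reference divided by the reference length; an absolute reduction of 3.4% means, for example, a drop from 45.0% to 41.6% WER. A minimal reference implementation of the metric (not the authors' scoring pipeline):

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for Levenshtein distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions only
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions only
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(substitution, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, one substitution in a four-word reference gives a WER of 0.25.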
