Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

Supervised and unsupervised active learning for automatic speech recognition of low-resource languages


Abstract

Automatic speech recognition (ASR) systems rely on large quantities of transcribed acoustic data. Collecting audio data is relatively cheap, whereas transcribing that data is relatively expensive. The ASR community is therefore interested in active learning, in which only a small subset of highly representative data, chosen from a large pool of untranscribed audio, needs to be transcribed in order to approach the performance of a system trained with much larger amounts of transcribed audio. In this paper, we compare two basic approaches to active learning: a supervised approach, in which we build a speech recognition system from a small amount of seed data and use it to select a limited amount of additional audio for transcription, and an unsupervised approach, in which no intermediate recognition system built from seed data is necessary. Our best unsupervised approach performs quite close to our supervised approach, and both outperform a random selection scheme.
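The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the paper's actual selection criteria, so everything here is an assumption made for illustration: the `Utterance` record, the `seed_confidence` field (a hypothetical per-utterance score from a seed recognizer), the rule that low confidence means "worth transcribing", and the greedy fill of a transcription budget measured in seconds. An unsupervised variant would swap in a scoring function that needs no seed recognizer.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Utterance:
    utt_id: str
    duration_s: float        # audio length in seconds
    seed_confidence: float   # hypothetical score in [0, 1] from a seed recognizer


def select_under_budget(
    pool: List[Utterance],
    score: Callable[[Utterance], float],
    budget_s: float,
) -> List[Utterance]:
    """Greedily take the highest-scoring utterances that still fit the budget."""
    chosen: List[Utterance] = []
    used = 0.0
    for utt in sorted(pool, key=score, reverse=True):
        if used + utt.duration_s <= budget_s:
            chosen.append(utt)
            used += utt.duration_s
    return chosen


def supervised_score(utt: Utterance) -> float:
    # Assumption: low seed-system confidence marks an utterance as more useful.
    return 1.0 - utt.seed_confidence


def random_selection(pool: List[Utterance], budget_s: float, seed: int = 0) -> List[Utterance]:
    """Baseline: the same budgeted fill, but with random per-utterance scores."""
    rng = random.Random(seed)
    keys = {u.utt_id: rng.random() for u in pool}
    return select_under_budget(pool, lambda u: keys[u.utt_id], budget_s)


# Toy pool of untranscribed audio (values are made up for the example).
pool = [
    Utterance("utt1", 4.0, 0.95),
    Utterance("utt2", 6.0, 0.40),
    Utterance("utt3", 5.0, 0.70),
    Utterance("utt4", 3.0, 0.20),
]

picked = select_under_budget(pool, supervised_score, budget_s=10.0)
baseline = random_selection(pool, budget_s=10.0)
print([u.utt_id for u in picked])
```

Keeping the scoring function separate from the budgeted selection lets the supervised, unsupervised, and random schemes be compared under an identical transcription budget, which is the kind of comparison the abstract reports.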
