Conference: International Conference on Speech and Computer

The Representation of Speech and Its Processing in the Human Brain and Deep Neural Networks

Abstract

For most of the world's languages, and for speech that deviates from the standard pronunciation, not enough (annotated) speech data is available to train an automatic speech recognition (ASR) system. Moreover, human intervention is needed to adapt an ASR system to a new language or type of speech. Human listeners, on the other hand, can quickly adapt to nonstandard speech and can learn the sound categories of a new language without being explicitly taught to do so. In this paper, I present comparisons between human speech processing and deep neural network (DNN)-based ASR and argue that cross-fertilisation between the two research fields can provide valuable information for developing ASR systems that adapt flexibly to any type of speech in any language. Specifically, I present results of several experiments carried out on both human listeners and DNN-based ASR systems, covering the representation of speech and lexically guided perceptual learning, i.e., the ability to adapt a sound category on the basis of new incoming information, resulting in improved processing of subsequent speech. The results show that DNNs appear to learn structures that humans use to process speech without being explicitly trained to do so, and that, similar to humans, DNN systems learn speaker-adapted phone category boundaries from a few labelled examples. These results are first steps towards building ASR systems inspired by human speech processing that, like human listeners, can adjust flexibly and quickly to all kinds of new speech.
