IEEE/ACM Transactions on Audio, Speech, and Language Processing

Crosslingual and Multilingual Speech Recognition Based on the Speech Manifold



Abstract

Speech signals are produced by the smooth and continuous movements of the human articulators. An articulatory representation of speech is considered a more compact, more universal, and language-independent feature space and can, therefore, improve crosslingual and multilingual speech recognition systems, especially when porting components from one language to another in low-resource scenarios. However, learning the acoustic-to-articulatory conversion has proven to be a very challenging task. In this paper, we use a manifold learning technique to derive a nonlinear feature transformation from the conventional filterbank feature space to an articulatory-like feature space. The coordinates of the resulting representation, some of which have demonstrable phonological meaning, are shown to be highly portable across languages. We propose a framework for data selection and graph construction that trains the coordinates on multilingual data, which makes it possible to train the coordinate space when abundant out-of-language data are available. Deep neural network (DNN) bottleneck features are shown to be more language independent when this representation, rather than filterbank features, is used as the input. The usefulness of this representation is further demonstrated in a number of DNN-based speech recognition experiments covering a variety of crosslingual and multilingual scenarios on the multilingual GlobalPhone dataset. In particular, speech recognition systems developed in low-resource settings benefit from the improved portability across languages.
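The abstract does not say which manifold learning algorithm is used. To illustrate the general idea of a graph-based nonlinear embedding of filterbank frames, here is a minimal sketch of Laplacian eigenmaps, a standard technique used here only as a stand-in. It builds a k-nearest-neighbour graph over feature frames and embeds them with the low eigenvectors of the graph Laplacian. The function names (`knn_heat_kernel_graph`, `laplacian_eigenmaps`), the heat-kernel weighting, the median bandwidth heuristic, and the random 40-dimensional "filterbank" frames are all hypothetical assumptions. They are not the paper's method, data-selection scheme, or data.

```python
import numpy as np


def knn_heat_kernel_graph(X, k=10, sigma=None):
    """Symmetric k-NN affinity matrix with heat-kernel edge weights.

    X: (n_frames, n_dims) feature frames (hypothetical filterbank-like input).
    sigma: kernel bandwidth; if None, an assumed median-distance heuristic is used.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances between all frames.
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Exclude each frame from its own neighbour list.
    d2_no_self = d2.copy()
    np.fill_diagonal(d2_no_self, np.inf)
    nbrs = np.argsort(d2_no_self, axis=1)[:, :k]
    nbr_d2 = np.take_along_axis(d2, nbrs, axis=1)
    if sigma is None:
        # Assumed heuristic: bandwidth from the median neighbour distance.
        sigma = np.sqrt(np.median(nbr_d2))
    W = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    W[rows, nbrs.ravel()] = np.exp(-nbr_d2.ravel() / (2.0 * sigma ** 2))
    # Undirected graph: keep an edge if either endpoint chose the other.
    return np.maximum(W, W.T)


def laplacian_eigenmaps(X, n_components=3, k=10, sigma=None):
    """Embed frames with Laplacian eigenmaps (solves L y = lambda D y)."""
    W = knn_heat_kernel_graph(X, k=k, sigma=sigma)
    deg = W.sum(axis=1)
    L = np.diag(deg) - W
    # Reduce the generalized problem to a symmetric one via D^{-1/2} L D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L_sym = d_inv_sqrt[:, None] * L * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    Y = d_inv_sqrt[:, None] * eigvecs  # map back: y = D^{-1/2} v
    # Drop the trivial constant solution (eigenvalue 0) and keep the next ones.
    return Y[:, 1 : n_components + 1], eigvals[1 : n_components + 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-in for 40-dimensional log-filterbank frames (not real speech).
    frames = rng.standard_normal((300, 40))
    coords, lam = laplacian_eigenmaps(frames, n_components=3, k=10)
    print(coords.shape, lam)
```

This only embeds the frames used to build the graph. Mapping unseen frames into the same coordinate space requires an out-of-sample extension (for example, the Nyström method or a regression model trained on the embedding), which is not shown here.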

