Annual Conference of the International Speech Communication Association

Overcoming Data Sparsity in Acoustic Modeling of Low-Resource Language by Borrowing Data and Model Parameters from High-Resource Languages

Abstract

In this paper, we propose two techniques to improve the acoustic model of a low-resource language: (i) pooling data from closely related languages using a phoneme mapping algorithm to build acoustic models such as the subspace Gaussian mixture model (SGMM), phone cluster adaptive training (Phone-CAT), the deep neural network (DNN) and the convolutional neural network (CNN), and then adapting the aforementioned models towards the low-resource language using its own data; and (ii) starting from models built on high-resource languages, borrowing the subspace model parameters of SGMM/Phone-CAT or the hidden layers of DNN/CNN, and then estimating the language-specific parameters using the low-resource language data. The experiments were performed on four Indian languages, namely Assamese, Bengali, Hindi and Tamil. Relative improvements of 10 to 30% over the corresponding monolingual models were obtained in each case.
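
As an illustration of technique (i), the sketch below shows how donor-language transcripts might be pooled with target-language data after a phoneme mapping. The mapping table, phone labels and helper names are hypothetical stand-ins; the paper's own mapping algorithm is data-driven and is not reproduced here.

```python
# Minimal sketch of cross-lingual data pooling via a phoneme mapping.
# PHONE_MAP is a hypothetical table from donor-language phones to the
# target-language phone set; the paper derives such a mapping with an
# algorithm, whereas here it is simply given.
PHONE_MAP = {
    "dx": "d",   # donor phone -> closest target phone
    "ax": "a",
    "zh": "s",
}

def map_transcript(phones):
    """Map a donor-language phone sequence into the target phone set,
    keeping phones that already exist in both inventories unchanged."""
    return [PHONE_MAP.get(p, p) for p in phones]

def pool_corpora(target_utts, donor_utts):
    """Pool target-language utterances with phone-mapped donor utterances.
    Each utterance is a (waveform_path, phone_sequence) pair."""
    pooled = list(target_utts)
    for wav, phones in donor_utts:
        pooled.append((wav, map_transcript(phones)))
    return pooled
```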
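
In its DNN form, technique (ii) amounts to transferring hidden layers from a high-resource network and re-estimating only the language-specific output layer on the low-resource data. The following is a minimal PyTorch sketch under assumed layer sizes and a hypothetical checkpoint name, not the paper's actual configuration:

```python
# Sketch of borrowing DNN hidden layers from a high-resource model and
# training a new language-specific softmax layer on low-resource data.
# Feature dimension, hidden size and senone counts are illustrative.
import torch
import torch.nn as nn

class AcousticDNN(nn.Module):
    def __init__(self, feat_dim, hidden_dim, num_senones):
        super().__init__()
        self.hidden = nn.Sequential(
            nn.Linear(feat_dim, hidden_dim), nn.Sigmoid(),
            nn.Linear(hidden_dim, hidden_dim), nn.Sigmoid(),
        )
        self.output = nn.Linear(hidden_dim, num_senones)  # language specific

    def forward(self, x):
        return self.output(self.hidden(x))

# High-resource model, assumed already trained on pooled data.
hi_res = AcousticDNN(feat_dim=440, hidden_dim=1024, num_senones=4000)
# hi_res.load_state_dict(torch.load("hi_res_dnn.pt"))  # hypothetical checkpoint

# Low-resource model: borrow the hidden layers, keep its own output layer.
lo_res = AcousticDNN(feat_dim=440, hidden_dim=1024, num_senones=1500)
lo_res.hidden.load_state_dict(hi_res.hidden.state_dict())
for p in lo_res.hidden.parameters():
    p.requires_grad = False  # freeze borrowed layers; only the new softmax trains
```

The same idea carries over to SGMM/Phone-CAT, where the shared subspace parameters play the role of the hidden layers and the state-specific vectors are the language-specific part re-estimated on low-resource data.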