Journal: Acoustical science and technology

An encoder-decoder based grapheme-to-phoneme converter for Bangla speech synthesis


Abstract

This paper proposes an encoder-decoder based sequence-to-sequence model for grapheme-to-phoneme (G2P) conversion in Bangla (exonym: Bengali). G2P models are key components of speech recognition and speech synthesis systems because they describe how words are pronounced. Traditional rule-based models do not perform well in unseen contexts. We propose adopting a neural machine translation (NMT) model to solve the G2P problem, and we build our model as a recurrent neural network (RNN) with gated recurrent units (GRUs). In contrast to joint-sequence based G2P models, our encoder-decoder based model does not require explicit grapheme-to-phoneme alignments, which are not straightforward to produce. We trained our model on a pronunciation dictionary of approximately 135,000 entries and obtained a word error rate (WER) of 12.49%, which is a significant improvement over the existing rule-based and machine-learning based Bangla G2P models.
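The abstract reports a word error rate but does not spell out how it is computed. The sketch below assumes the convention commonly used in G2P evaluation: a word counts as an error when its predicted phoneme sequence differs from the reference in any position. The function name `word_error_rate` and the toy phoneme sequences are hypothetical illustrations and are not taken from the paper's dictionary or code.

```python
from typing import Sequence


def word_error_rate(
    references: Sequence[Sequence[str]],
    predictions: Sequence[Sequence[str]],
) -> float:
    """Fraction of words whose predicted phoneme sequence is not an exact match.

    Assumption: one reference pronunciation per word, compared phoneme by phoneme.
    """
    if len(references) != len(predictions):
        raise ValueError("references and predictions must have the same length")
    if not references:
        raise ValueError("at least one word is required")

    # A word is wrong if any phoneme differs or the lengths differ.
    errors = sum(
        1 for ref, hyp in zip(references, predictions) if list(ref) != list(hyp)
    )
    return errors / len(references)


# Toy data (hypothetical symbols, for illustration only).
references = [["a", "m", "i"], ["t", "u", "m", "i"], ["b", "o", "i"], ["k", "a", "j"]]
predictions = [["a", "m", "i"], ["t", "u", "m", "i"], ["b", "o", "i"], ["k", "a", "z"]]

wer = word_error_rate(references, predictions)
print(f"WER: {wer:.2%}")
```

Under this definition a single wrong phoneme makes the whole word an error, so WER is stricter than a per-phoneme error rate computed on the same predictions.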
