Conference: International Conference on Neural Information Processing

Word-Level Permutation and Improved Lower Frame Rate for RNN-Based Acoustic Modeling


Abstract

Recently, RNN-based acoustic models have shown promising performance. However, their ability to generalize across multiple scenarios is limited, for two reasons. First, an RNN-based acoustic model encodes inter-word dependencies, which conflicts with the principle that an acoustic model should model only the pronunciation of words. Second, because it depicts the intra-word acoustic trajectory frame by frame, the model is too precise to tolerate small distortions. In this work, we propose two techniques to address these two problems. The first is word-level permutation: the order of the input features and their corresponding labels is shuffled, with a suitable probability, according to word boundaries. It aims to eliminate inter-word dependencies. The second is the improved lower frame rate (iLFR) model, which equidistantly splits the original sentence into N utterances so that no data is discarded, as happens in the LFR model. Results with an LSTM RNN show a 7% relative performance improvement when word-level permutation and iLFR are combined.
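
The two techniques can be illustrated with a minimal Python sketch. It is one plausible reading of the abstract, not the authors' implementation. The function names `word_level_permute` and `ilfr_split` are hypothetical. The sketch assumes that word boundaries are already available as frame spans (for example from a forced alignment), that the shuffle probability is applied once per utterance, and that "equidistantly splits" means interleaved sampling, where sub-utterance k keeps frames k, k+N, k+2N, and so on.

```python
import random


def word_level_permute(frames, labels, word_spans, prob=0.5, rng=None):
    """Shuffle whole-word segments of one utterance with probability `prob`.

    frames, labels: equal-length sequences, one label per frame.
    word_spans: list of (start, end) frame indices, end exclusive.
    Assumption: the probability is applied per utterance, not per word.
    """
    if len(frames) != len(labels):
        raise ValueError("frames and labels must have the same length")
    if rng is None:
        rng = random.Random()

    spans = list(word_spans)  # copy so the caller's list is not modified
    if rng.random() < prob:
        rng.shuffle(spans)  # reorder words; frames inside a word keep their order

    new_frames, new_labels = [], []
    for start, end in spans:
        new_frames.extend(frames[start:end])
        new_labels.extend(labels[start:end])
    return new_frames, new_labels


def ilfr_split(frames, labels, n):
    """Split one utterance into n interleaved sub-utterances.

    Sub-utterance k keeps frames k, k+n, k+2n, ... so every frame is used
    exactly once. Assumption: this is what "equidistantly splits" means.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [(frames[k::n], labels[k::n]) for k in range(n)]


if __name__ == "__main__":
    frames = list(range(10))          # stand-ins for feature vectors
    labels = list("aaabbbbccc")       # one label per frame
    spans = [(0, 3), (3, 7), (7, 10)]  # three words

    shuffled = word_level_permute(frames, labels, spans, prob=1.0)
    print(shuffled)
    print(ilfr_split(frames, labels, 3))
```

By contrast, plain subsampling that keeps only `frames[0::n]` would drop the remaining frames, which appears to be the data loss that iLFR is meant to avoid.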
