Word-Level Permutation and Improved Lower Frame Rate for RNN-Based Acoustic Modeling

Abstract

Recently, RNN-based acoustic models have shown promising performance. However, their ability to generalize across multiple scenarios is limited for two reasons. First, an RNN-based model encodes inter-word dependencies, which conflicts with the principle that an acoustic model should model only the pronunciation of words. Second, by depicting the intra-word acoustic trajectory frame by frame, the model is too precise to tolerate small distortions. In this work, we propose two variants to address these two problems. The first is word-level permutation: the order of the input features and their corresponding labels is shuffled, with a suitable probability, according to word boundaries. It aims to eliminate inter-word dependencies. The second is the improved lower frame rate (iLFR) model, which equidistantly splits the original sentence into N utterances to avoid the data discarding of the LFR model. Results based on an LSTM RNN show a 7% relative performance improvement from combining word-level permutation and iLFR.
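
The two ideas in the abstract can be illustrated with a minimal Python sketch. This is one possible reading of the abstract, not the authors' implementation: the function names `permute_words` and `ilfr_split`, the representation of word boundaries as `(start, end)` frame-index pairs, and the interpretation of "equidistantly splits" as taking every N-th frame at each of the N possible offsets are all assumptions made for illustration.

```python
import random


def permute_words(frames, labels, boundaries, prob, rng=None):
    """Shuffle word segments of one utterance with probability `prob`.

    `boundaries` is an assumed format: a list of (start, end) frame-index
    pairs, end exclusive, one pair per word. Frames and labels are moved
    together so each frame keeps its own label.
    """
    rng = rng or random.Random()
    if rng.random() >= prob:
        # Leave this utterance in its original order.
        return list(frames), list(labels)
    order = list(boundaries)
    rng.shuffle(order)  # permute whole words, not individual frames
    new_frames, new_labels = [], []
    for start, end in order:
        new_frames.extend(frames[start:end])
        new_labels.extend(labels[start:end])
    return new_frames, new_labels


def ilfr_split(frames, labels, n):
    """Split one utterance into n lower-frame-rate utterances.

    Assumed reading: the k-th output keeps frames k, k+n, k+2n, ...,
    so every frame of the original lands in exactly one output and
    nothing is discarded.
    """
    return [(frames[k::n], labels[k::n]) for k in range(n)]
```

Under these assumptions, `ilfr_split(list(range(7)), labels, 3)` produces three utterances with frames `[0, 3, 6]`, `[1, 4]`, and `[2, 5]`, whereas plain subsampling at a single offset would keep only one of them.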

Bibliographic record

Source
International Conference on Neural Information Processing | 2017 | 912p | 11 pages
Authors
Yuanyuan Zhao; Shiyu Zhou; Shuang Xu; Bo Xu
Original format: PDF
Chinese Library Classification: TP183-53
Keywords
RNN-based acoustic model; Acoustic trajectory; Lower frame rate; Word-level permutation
