Improving Phonetic Recognition with Sequence-length Standardized MFCC Features and Deep Bi-Directional LSTM



Abstract

Phonetic recognition is one of the most challenging problems in speech analysis. Its applications include dialect identification [1], mispronunciation detection [2], and spoken document retrieval [3]. Approaches to these problems include improving feature selection on the input speech [4], applying deep learning techniques [5] [6] [7], or combining both [8]. Because phonetic data is sequential, architectures based on recurrent neural networks (RNNs) are an appropriate choice [9], and they become more powerful when combined with improved feature selection on the input data. In our approach, we combine Mel Frequency Cepstral Coefficients (MFCC) with sequence-length standardization to represent the acoustic features of speech, and we use several RNN models for phonetic classification. Our experiments are run on the Texas Instruments Massachusetts Institute of Technology (TIMIT) [10] phone recognition dataset. Notably, our data processing and feature selection method gives consistently better results than other studies that use the same neural network model. We currently achieve our lowest test error rate (13.05%) with a bidirectional LSTM, which is the best result on the TIMIT dataset, a reduction of about 3.5% over the previous best result [5] [6].
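The abstract does not say how sequence lengths are standardized, so the following is only a minimal sketch of one common reading: each utterance's list of MFCC frames is truncated or zero-padded to a fixed number of frames so that a batch can be fed to an RNN. The function name `standardize_length`, the `target_len` and `pad_value` parameters, and the choice of zero padding at the end are all assumptions for illustration, not the authors' method.

```python
from typing import List, Sequence

Frame = List[float]


def standardize_length(
    frames: Sequence[Sequence[float]],
    target_len: int,
    pad_value: float = 0.0,
) -> List[Frame]:
    """Return exactly target_len frames.

    Hypothetical helper: sequences longer than target_len are truncated,
    shorter ones are padded at the end with constant frames.
    """
    if target_len <= 0:
        raise ValueError("target_len must be positive")
    if not frames:
        raise ValueError("need at least one frame to infer the feature size")

    n_coeffs = len(frames[0])
    if any(len(frame) != n_coeffs for frame in frames):
        raise ValueError("all frames must have the same number of coefficients")

    # Keep at most target_len frames, copying so the input is not mutated.
    clipped = [list(frame) for frame in frames[:target_len]]
    # Append constant frames until the sequence reaches target_len.
    padding = [[pad_value] * n_coeffs for _ in range(target_len - len(clipped))]
    return clipped + padding


# Toy data standing in for MFCC frames (2 coefficients per frame).
short_utterance = [[1.0, 2.0], [3.0, 4.0]]
long_utterance = [[float(i), float(-i)] for i in range(5)]

batch = [standardize_length(seq, 3) for seq in (short_utterance, long_utterance)]
```

In a real pipeline the frames would come from an MFCC extractor and the fixed-length batch would be passed to a bidirectional LSTM; both steps are left out here because the abstract gives no details about them.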


Bibliographic record

Source: NAFOSTED Conference on Information and Computer Science, 2018, pp. 322-325 (4 pages)
Authors: Toan Pham Van; Hau Nguyen Thanh; Ta Minh Thanh
Original format: PDF
Keywords: Training; Speech recognition; Mel frequency cepstral coefficient; Phonetics; Feature extraction; Recurrent neural networks; Computer science
Indexed: 2022-08-26 13:50:19

Similar literature

1. DB-LSTM: Densely-connected Bi-directional LSTM for human action recognition [J]. He Jun-Yan, Wu Xiao, Cheng Zhi-Qi. Neurocomputing, 2021, issue Jula15
2. A low latency modular-level deeply integrated MFCC feature extraction architecture for speech recognition [J]. Paul Bibin Sam S., Glittas Antony Xavier, Gopalakrishnan Lakshminarayanan. Integration, 2021, issue Jana
3. Deep feature extraction technique based on Conv1D and LSTM network for food image recognition [J]. Sirawan Phiphitphatphaisit, Olarik Surinta. Engineering and Applied Science Research, 2021, issue 5
4. Improving Phonetic Recognition with Sequence-length Standardized MFCC Features and Deep Bi-Directional LSTM [C]. Toan Pham Van, Hau Nguyen Thanh, Ta Minh Thanh. NAFOSTED Conference on Information and Computer Science, 2018
5. Speech recognition based on phonetic features and acoustic landmarks [D]. Juneja, Amit. 2004
6. Deep learning for named entity recognition on Chinese electronic medical records: Combining deep transfer learning with multitask bi-directional LSTM RNN [O]. Xishuang Dong, Shanta Chowdhury, Lijun Qian. 2015
7. Improved language recognition using better phonetic decoders and fusion with MFCC and SDC features [O]. Toledano, Doroteo T., González Domínguez, Javier, Abejón González, Alejandro. 2007
