Hybrid HMM-BLSTM-Based Acoustic Modeling for Automatic Speech Recognition on Quran Recitation


Abstract

Many software applications now help people access the Quran on their own devices, and some of them can also recognize the user's Quran recitation. How well such applications recognize recitation is therefore worth examining. Compared with English and other spoken languages, automatic speech recognition (ASR) for Quran recitation is a relatively recent research area. The Hidden Markov Model (HMM)-Gaussian Mixture Model (GMM) remains a popular choice for acoustic modeling in some of this research. However, HMM-GMM generalizes poorly to high-variance data and has difficulty with data that are not linearly separable. To address these problems, this paper proposes training the acoustic model for Quran speech recognition with a deep learning approach. The experiments use a Bidirectional Long Short-Term Memory (BLSTM) network, one of several deep learning topologies, combined with an HMM as a hybrid system. This method has worked well in prior research on other languages, such as English. Overall, the results show that it also works well for Quran speech recognition compared with our HMM-GMM baseline. The baseline models achieved an average word error rate (WER) of 18.39%, whereas the experimental model (an acoustic model with hybrid HMM-BLSTM) achieved a far better average WER of 4.63% in the same testing scenario. The effect of Quran recitation style was also analyzed by building models that depend on the recitation style (Maqam).
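
The abstract reports results as average WER. As a minimal sketch, assuming the standard definition of WER (word-level edit distance divided by the number of reference words), the metric and the relative improvement implied by the two reported averages can be computed as below. The function names and the toy sentences are illustrative only; they are not taken from the paper, whose scoring tool is not stated here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` lowers the `baseline` error rate."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical transcripts: one substitution and one deletion.
    print(word_error_rate("a b c d", "a x c"))
    # Average WER values reported in the abstract (percent).
    print(round(relative_reduction(18.39, 4.63), 3))
```

With the reported averages, the hybrid HMM-BLSTM model corresponds to a relative WER reduction of roughly 75% over the HMM-GMM baseline.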


Bibliographic record

Source
International Conference on Asian Language Processing | 2018 | pp. 203-208 | 6 pages
Authors
Faza Thirafi; Dessi Puji Lestari
Original format: PDF
Keywords
Hidden Markov models; Acoustics; Speech recognition; Deep learning; Data models; Computer architecture; Logic gates


