Exploring end-to-end framework towards Khasi speech recognition system

Syiem Bronson; Singh L. Joyprakash

Abstract

Building a conventional automatic speech recognition (ASR) system based on a hidden Markov model (HMM)/deep neural network (DNN) is complex because it requires various modules, such as acoustic models, lexicons, linguistic resources, and language models, and this is particularly demanding for low-resource languages. In contrast, the End-to-End architecture greatly simplifies the model-building process by representing these complex modules with a simple deep network and by replacing linguistic resources with data-driven learning techniques. In this paper, we present our prior work exploring the End-to-End (E2E) framework for a Khasi speech recognition system, together with a novel extension towards the development of speech corpora for the standard Khasi dialect. We implemented the proposed E2E model using the Nabu ASR toolkit. Additionally, three other models (monophone, triphone, and hybrid DNN) were built. Comparing the results, a significant improvement was achieved with the proposed method, particularly with connectionist temporal classification (CTC), which gave a character error rate (CER) of 5.04%.
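The abstract reports its result as a character error rate. As a rough illustration of how that metric is commonly computed, the sketch below uses the standard definition: the Levenshtein (edit) distance between reference and hypothesis characters, divided by the reference length. The function names and the example strings are hypothetical and are not taken from the paper; the authors' exact scoring procedure is not described here.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # delete a reference character
                curr[j - 1] + 1,     # insert a hypothesis character
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical strings for illustration only (not Khasi data).
print(cer("kitten", "sitting"))  # 3 edits over 6 reference characters
```

Under this definition, a CER of 5.04% corresponds to roughly five character edits per hundred reference characters.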

Bibliographic details

Source: International journal of speech technology | 2021, Issue 2 | pp. 419-424 | 6 pages
Authors: Syiem Bronson; Singh L. Joyprakash
Author affiliations:

NEHU Elect & Commun Engn Shillong 793022 Meghalaya India;

NEHU Elect & Commun Engn Shillong 793022 Meghalaya India;

Original format: PDF
Language: English
Keywords: Automatic speech recognition; Deep neural network; End-to-End; Hidden Markov model
Date added to database: 2022-08-19 02:16:47
