International Journal of Speech Technology

Exploring end-to-end framework towards Khasi speech recognition system


Abstract

Building a conventional automatic speech recognition (ASR) system based on a hidden Markov model (HMM)/deep neural network (DNN) architecture is complex, because it requires several separate modules and resources, such as an acoustic model, a lexicon, linguistic resources, and a language model. This is especially challenging for low-resource languages. In contrast, an End-to-End architecture greatly simplifies the model-building process by representing these complex modules with a single, simple deep network and by replacing the use of linguistic resources with data-driven learning techniques. In this paper, we present our prior work on exploring an End-to-End (E2E) framework for a Khasi speech recognition system, together with a novel extension towards the development of speech corpora for the standard Khasi dialect. We implemented the proposed E2E model using the Nabu ASR toolkit. Additionally, three other models (monophone, triphone, and hybrid DNN) were built. Comparing the results, the proposed method achieved a significant improvement, particularly with connectionist temporal classification (CTC), which yielded a character error rate (CER) of 5.04%.
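As a rough illustration of the metric behind the 5.04% figure: CER is commonly defined as the character-level Levenshtein distance between the reference transcript and the recognizer output, CER = (S + D + I) / N, where S, D, and I are substitutions, deletions, and insertions and N is the number of reference characters. The sketch below assumes that standard definition. The abstract does not state the paper's exact scoring setup (for example, whether spaces count as characters or how utterances are aggregated), and the function names here are hypothetical, not part of the Nabu toolkit.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))  # distances from an empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = curr
    return prev[len(hyp)]


def character_error_rate(ref: str, hyp: str) -> float:
    """CER for a single utterance: edit distance / reference length."""
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def corpus_cer(refs: list[str], hyps: list[str]) -> float:
    """Corpus-level CER (assumed aggregation): total errors / total ref chars."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    if total == 0:
        raise ValueError("references must contain at least one character")
    return errors / total


if __name__ == "__main__":
    # "kitten" -> "sitting" needs 3 edits over 6 reference characters.
    print(character_error_rate("kitten", "sitting"))  # 0.5
```

Pooling errors over the whole test set, as `corpus_cer` does, weights long utterances more heavily than averaging per-utterance CERs would; which convention the reported figure uses is not specified here.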
