Source: SpringerPlus

Heterophonic speech recognition using composite phones

Abstract

Heterophones are words that share a spelling but have different pronunciations. They pose challenges when training automatic speech recognition (ASR) systems because the orthographic form of a word leaves its pronunciation ambiguous. This paper addresses the problem of heterophonic languages by developing the concept of a Composite Phoneme (CP) as a basic pronunciation unit for speech recognition. A CP is a set of alternative sequences of phonemes. CPs are developed specifically in the context of Arabic by defining phonetic units that are consonant-centric and absorb the phonemically contrastive short vowels and gemination that are not represented in Arabic Modern Orthography (MO). CPs alleviate the need to diacritize MO into Classical Orthography (CO), to represent short vowels and stress, before generating pronunciations in terms of Simple Phonemes (SP). We develop algorithms that generate CP pronunciations from MO and SP pronunciations from CO, so that each word maps to a single pronunciation. We investigate the performance of CP, SP, UG (Undiacritized Grapheme), and DG (Diacritized Grapheme) ASR systems. The experimental results suggest that UG and DG are inferior to SP and CP. For the A-SpeechDB corpus with an MO vocabulary of 8,000 words, the word error rates (WER) with a bigram model and context-dependent phones are 11.78%, 12.64%, and 13.59% for CP, SP_M (SP from manually diacritized CO), and SP_A (SP from automatically diacritized MO), respectively. For a vocabulary of 24,000 MO words, the corresponding WERs are 13.69%, 15.08%, and 16.86%. With a uniform statistical model, SP has a lower WER than CP. With context-independent (CI) phones, CP has a lower WER than SP.
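
The abstract's definition of a CP as "a set of alternative sequences of phonemes" can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the romanized symbols, the three short vowels, the helper names make_cp and expand, and the rule that each consonant-centric unit may be plain or geminated and may or may not carry a short vowel. It is not the paper's actual CP inventory or its pronunciation-generation algorithm.

```python
from itertools import product

def make_cp(consonant, short_vowels=("a", "i", "u")):
    """Build one hypothetical consonant-centric composite phone (CP).

    The CP is a set of alternative simple-phoneme sequences: the consonant,
    plain or geminated (doubled), optionally followed by one short vowel.
    """
    alternatives = set()
    for base in ((consonant,), (consonant, consonant)):  # plain, geminated
        alternatives.add(base)                           # no short vowel
        for vowel in short_vowels:
            alternatives.add(base + (vowel,))
    return frozenset(alternatives)

def expand(cp_sequence):
    """Yield every simple-phoneme pronunciation covered by a CP sequence."""
    for combo in product(*(sorted(cp) for cp in cp_sequence)):
        yield tuple(phone for seq in combo for phone in seq)

# One CP per consonant letter of an undiacritized word (romanized k-t-b here).
word = [make_cp(c) for c in "ktb"]
pronunciations = set(expand(word))

print(len(word), "composite phones cover", len(pronunciations), "phoneme sequences")
print(("k", "a", "t", "a", "b", "a") in pronunciations)  # "kataba"
print(("k", "u", "t", "u", "b") in pronunciations)       # "kutub"
```

Under these assumptions, a single three-unit CP pronunciation stands in for many diacritized readings of the same written form, including both "kataba" and "kutub". This is the sense in which a CP-based lexicon could map each undiacritized word to one pronunciation without first restoring its diacritics.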