Automatic Long Audio Alignment and Confidence Scoring for Conversational Arabic Speech

Abstract

In this paper, a framework for long audio alignment of conversational Arabic speech is proposed. Accurate alignments help in many speech processing tasks, such as audio indexing, acoustic model (AM) training for speech recognizers, and audio summarization and retrieval. We have collected more than 1,400 hours of conversational Arabic together with the corresponding human-generated, non-aligned transcriptions. Automatic audio segmentation is performed using a split-and-merge approach. A biased language model (LM) is trained on the corresponding text after a pre-processing stage. Because non-standard Arabic dominates conversational speech, a graphemic pronunciation model (PM) is used. The proposed alignment approach runs in two passes. In the first pass, a generic standard Arabic AM is used along with the biased LM and the graphemic PM for fast speech recognition. In the second pass, a more restricted LM is generated for each audio segment, and unsupervised acoustic model adaptation is applied. The recognizer output is aligned with the processed transcriptions using the Levenshtein algorithm. The proposed approach yields an initial alignment accuracy of 97.8-99.0%, depending on the amount of disfluencies. A confidence scoring metric is proposed to accept or reject the aligner output. Using confidence scores, it was possible to reject the majority of misaligned segments, resulting in an alignment accuracy of 99.0-99.8%, depending on the speech domain and the amount of disfluencies.
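The Levenshtein alignment step and the accept/reject decision can be illustrated with a minimal Python sketch. It aligns a transcript word sequence against a recognizer word sequence by dynamic programming and then scores the result. The abstract does not define the paper's confidence metric, so `match_ratio`, `accept`, and the 0.8 threshold below are hypothetical stand-ins chosen for illustration, not the authors' method.

```python
from typing import List, Optional, Tuple

# One aligned position: (transcript word, recognizer word); None marks a gap.
Pair = Tuple[Optional[str], Optional[str]]


def align_words(ref: List[str], hyp: List[str]) -> List[Pair]:
    """Word-level Levenshtein alignment of a transcript (ref) and ASR output (hyp)."""
    n, m = len(ref), len(hyp)
    # cost[i][j] is the edit distance between ref[:i] and hyp[:j].
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            substitute = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(substitute, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            pairs.append((ref[i - 1], hyp[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))  # transcript word missing from ASR output
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))  # extra word in ASR output
            j -= 1
    pairs.reverse()
    return pairs


def match_ratio(pairs: List[Pair]) -> float:
    """Hypothetical confidence score: fraction of aligned positions that match exactly."""
    if not pairs:
        return 0.0
    matches = sum(1 for r, h in pairs if r is not None and r == h)
    return matches / len(pairs)


def accept(pairs: List[Pair], threshold: float = 0.8) -> bool:
    """Accept a segment when its score reaches an assumed, illustrative threshold."""
    return match_ratio(pairs) >= threshold
```

For example, `align_words("a b c d".split(), "a x c".split())` pairs `b` with `x` as a substitution and leaves `d` unmatched. Its match ratio is 0.5, so the segment would be rejected at the assumed threshold.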

Bibliographic details

Source
9th International Conference on Language Resources and Evaluation | 2014 | pp. 2268-2272 | 5 pages
Authors
Mohamed Elmahdy; Mark Hasegawa-Johnson; Eiman Mustafawi
Original format
PDF
Keywords
conversational Arabic; audio alignment; speech processing
