Effects of acoustic mismatches on speech recognition accuracies due to playback-recorded speech corpus

Abstract

Modern speech recognition techniques rely on large amounts of speech data, whose acoustic characteristics match the operating environment, to train their acoustic models. Gathering training data from loudspeakers playing recorded speech utterances is far more practical than gathering it from human speakers. This paper presents results from speech recognition experiments that provide practical insight into the effects of utterances re-recorded from loudspeakers. A clean-speech corpus of sixty human speakers was built using two different microphones, and its playbacks were re-recorded. Results show that, with minimal lexical constraints, accuracy degraded for the playback-trained system even when there were no mismatches between training and test data. However, mismatches did not affect cases with tighter high-level constraints, such as number recognition and limited-vocabulary word recognition. A procedure to reduce the mismatches caused by constructing a corpus from playbacks was introduced. The procedure was shown to bring the accuracy of a playback-trained system 48% closer to that of a system trained with speech recorded in the matched environment.
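The abstract reports the improvement as a relative figure and does not give raw accuracies. One plausible reading, assumed here and not confirmed by the abstract, is that "48% closer" means the procedure closed 48% of the accuracy gap between the playback-trained baseline and the matched-environment system. A minimal Python sketch of that arithmetic follows; the function name `gap_closed` and all accuracy values are hypothetical and chosen only for illustration.

```python
def gap_closed(acc_baseline: float, acc_improved: float, acc_matched: float) -> float:
    """Return the fraction of the baseline-to-matched accuracy gap closed by the improved system.

    Assumed interpretation (not stated in the source): "X% closer" means
    (acc_improved - acc_baseline) / (acc_matched - acc_baseline) == X / 100.
    """
    gap = acc_matched - acc_baseline
    if gap == 0:
        # No gap to close: the baseline already equals the matched-environment system.
        raise ValueError("baseline and matched accuracies are equal")
    return (acc_improved - acc_baseline) / gap


# Hypothetical accuracies (%), chosen only to illustrate the arithmetic;
# the abstract does not report these values.
baseline = 70.0   # playback-trained system without the procedure
improved = 82.0   # playback-trained system with the procedure
matched = 95.0    # system trained in the matched environment

print(f"{gap_closed(baseline, improved, matched):.0%} of the gap closed")
```

Under this reading, a value of 0 means no improvement over the playback-trained baseline and 1 means full parity with the matched-environment system.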

Bibliographic record

Source: 2012 9th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology (ECTI-CON) | 2012 | pp. 1-4 | 4 pages
Conference location: Phetchaburi (TH)
Authors: Suchato Atiwong; Chanjaradwichai Supadaech; Kertkeidkachorn Natthawut; Vorapatratorn Surapol; Hirankan Pawanrat; Suri Teera; Likitsupin Krerksak; Chuetanapinyo Supakit; Punyabukkana Proadpran
Affiliation: Spoken Language Systems Research Group, Department of Computer Engineering, Faculty of Engineering, Chulalongkorn University, Bangkok, Thailand
Original format: PDF
Language: English
Chinese Library Classification: Communications; computing technology, computer technology

Similar literature

1. Speech corpus recycling for acoustic cross-domain environments for automatic speech recognition [J]. Daniel Willett, Osamu Ichikawa, Steven J. Rennie. Acoustical Science and Technology, 2016, no. 2.
2. Differences between acoustic characteristics of spontaneous and read speech and their effects on speech recognition performance [J]. Masanobu Nakamura, Koji Iwano, Sadaoki Furui. Computer Speech and Language, 2008, no. 2.
3. Corpus of deaf speech for acoustic and speech production research [J]. Mendel Lisa Lucks, Lee Sungmin, Pousson Monique. The Journal of the Acoustical Society of America, 2017, no. 1.
4. Effects of acoustic mismatches on speech recognition accuracies due to playback-recorded speech corpus [C]. Suchato Atiwong, Chanjaradwichai Supadaech, Kertkeidkachorn Natthawut. International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, 2012.
5. Strategies for improving audible quality and speech recognition accuracy of reverberant speech [D]. Gillespie, Bradford Wilson. 2002.
6. Words from spontaneous conversational speech can be recognized with human-like accuracy by an error-driven learning algorithm that discriminates between meanings straight from smart acoustic features bypassing the phoneme as recognition unit [O]. Denis Arnold, Fabian Tomaschek, Konstantin Sering.
7. The effects of speakers' gender, age, and region on overall performance of Arabic automatic speech recognition systems using the phonetically rich and balanced Modern Standard Arabic speech corpus [O]. Sawalha M, Abu Shariah M. 2013.
8. DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CD-ROM. NIST Speech Disc 1-1.1 [R]. Garofolo, J. S., Lamel, L. F., Fisher, W. M. 1993.
