Speech Enhancement Parameter Adjustment to Maximize Accuracy of Automatic Speech Recognition

Kawase Tomoko; Okamoto Manabu; Fukutomi Takaaki; Takahashi Yamato


Abstract

Consumer electronics equipped with a microphone array, such as car navigation devices and headsets, commonly implement speech enhancement techniques based on the gradient method to cope with additive noise. However, although these techniques were originally developed for voice communication and can maximize the signal-to-distortion ratio (SDR), they cannot always maximize automatic speech recognition (ASR) accuracy. For this reason, the front-end speech enhancement parameters have been adjusted by human experts for each environment and acoustic model. In this study, we developed a novel system for maximizing the accuracy of a given ASR engine by automatically adjusting the front-end speech enhancement. The proposed method allows consumers to use ASR through consumer electronics with less stress when ambient noise varies. A genetic algorithm (GA) is used to generate parameter values of the front-end speech enhancement for particular environments. The generated values can be dynamically assigned to input speech signals by clustering the environments in advance based on noise features. In evaluations, parameter values determined by our method outperformed those adjusted by a human expert.
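The two ideas in the abstract, a GA search over enhancement parameters and a per-environment lookup of the tuned values, can be illustrated with a minimal Python sketch. This is not the authors' implementation. Everything named here is an assumption made for illustration: `toy_asr_accuracy` is a hypothetical stand-in for scoring a real ASR engine on enhanced speech, the noise feature is reduced to a single scalar, and the two cluster centroids, the population size, and the mutation width are arbitrary.

```python
import random

def toy_asr_accuracy(params, noise_level):
    """Hypothetical stand-in for ASR accuracy (NOT a real ASR score).

    Peaks at 1.0 when each parameter hits a noise-dependent optimum."""
    target = [0.2 + 0.6 * noise_level, 0.8 - 0.5 * noise_level]
    return 1.0 - sum((p - t) ** 2 for p, t in zip(params, target))

def run_ga(fitness, n_params=2, pop_size=20, generations=40, seed=0):
    """Simple elitist GA: keep the better half, refill by crossover and mutation."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_params)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_params) if n_params > 1 else 0
            child = a[:cut] + b[cut:]  # one-point crossover (new list)
            i = rng.randrange(n_params)
            # Gaussian mutation of one gene, clipped to [0, 1]
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.05)))
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

# Offline step: tune one parameter set per environment cluster.
# Centroids are assumed scalar noise features, chosen arbitrarily.
cluster_centroids = {"quiet": 0.1, "noisy": 0.9}
tuned = {
    name: run_ga(lambda p, c=c: toy_asr_accuracy(p, c))
    for name, c in cluster_centroids.items()
}

def select_params(noise_feature):
    """Online step: pick the parameters tuned for the nearest cluster."""
    nearest = min(cluster_centroids,
                  key=lambda n: abs(cluster_centroids[n] - noise_feature))
    return tuned[nearest]

if __name__ == "__main__":
    for feature in (0.05, 0.95):
        print(feature, select_params(feature))
```

In a real system the fitness call would run the enhancement front end and the ASR engine on recorded data, which is far more expensive than this toy function, so population size and generation count would need to be chosen with that cost in mind.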


Bibliographic details

Source
IEEE Transactions on Consumer Electronics, 2020, No. 2, pp. 125-133 (9 pages)
Authors
Kawase Tomoko; Okamoto Manabu; Fukutomi Takaaki; Takahashi Yamato
Author affiliations

NTT Corp Media Intelligence Labs Yokosuka Kanagawa 2390847 Japan;

NTT Corp Media Intelligence Labs Yokosuka Kanagawa 2390847 Japan|Sojo Univ Dept Comp & Informat Sci Kumamoto 8600082 Japan;

NTT Corp Media Intelligence Labs Yokosuka Kanagawa 2390847 Japan;

NTT Corp Media Intelligence Labs Yokosuka Kanagawa 2390847 Japan;

Indexing information
Original format: PDF
Language: English
Keywords
Speech enhancement; Microphone arrays; Genetic algorithms; Adaptation models; Acoustics; Performance evaluation; Array signal processing; automatic speech recognition; voice interface


