Electronics and Communications in Japan. Part 2, Electronics

Unsupervised Speaker Adaptation for Robust Speech Recognition in Real Environments

Abstract

In order to achieve high-precision speech recognition in real environments, phone model adaptation procedures that can rapidly account for a wide range of different speakers and acoustic noise conditions are required. In this paper we propose an unsupervised speaker adaptation method that extends an unsupervised speaker and environment adaptation method based on sufficient statistics from HMMs by performing spectral subtraction and then adding a known noise to the input. Existing methods assume that a model is trained to match each of the different types of background noise that will be the object of recognition, and they do not consider variations in the signal-to-noise ratio or changes in the background noise for given inputs. In contrast, our method constrains the noise of the input data using an estimate of the noise spectrum and then adds a known, stable noise to the bleached noise that remains in the input, thereby smoothing out differences between background noises and enabling us to perform recognition with a single set of acoustic models. In addition, with regard to speaker adaptation, we select the set of closest speakers from our database on the basis of a single arbitrary utterance from the test speaker and retrain the acoustic models using the sufficient statistics of those speakers. By combining these two methods we are able to adapt to a new speaker rapidly and accurately. In recognition experiments at a signal-to-noise ratio of 20 dB and under a variety of noise conditions, the proposed method achieved a recognition rate 2 percent higher than that of a speaker-independent model matched to the test noise environment for each noise environment, reaching an average recognition performance of 85.1 percent overall. In addition, we compared our method with a standard supervised adaptation technique, maximum likelihood linear regression (MLLR).
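To make the noise-handling idea concrete, the following is a minimal Python sketch of the preprocessing described above: an average noise magnitude spectrum is estimated (here, naively, from the first few frames of the utterance), subtracted from each frame, and a known stationary noise is then added at a fixed SNR. The frame length, hop size, over-subtraction factor, noise floor, 20 dB target SNR, and function names are illustrative assumptions, not values taken from the paper.

```python
# Sketch of the noise preprocessing: spectral subtraction, then adding a known noise.
import numpy as np

def spectral_subtraction(signal, noise_frames=10, frame_len=256, hop=128,
                         oversub=1.0, floor=0.01):
    """Subtract an estimated noise magnitude spectrum from each frame (overlap-add)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectra = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spectra), np.angle(spectra)

    # Assume the first few frames contain only background noise.
    noise_mag = mag[:noise_frames].mean(axis=0)

    # Subtract the noise estimate and floor the result to avoid negative magnitudes.
    clean_mag = np.maximum(mag - oversub * noise_mag, floor * noise_mag)

    # Resynthesize the "bleached" signal by overlap-add.
    clean_frames = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    out = np.zeros(len(signal))
    for i in range(n_frames):
        out[i * hop:i * hop + frame_len] += clean_frames[i] * window
    return out

def add_known_noise(signal, rng, snr_db=20.0):
    """Add a known stationary (white) noise at a fixed SNR to mask residual noise."""
    sig_power = np.mean(signal ** 2)
    noise = rng.standard_normal(len(signal))
    noise *= np.sqrt(sig_power / (10 ** (snr_db / 10.0)) / np.mean(noise ** 2))
    return signal + noise

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(16000)                 # stand-in for one noisy utterance
    bleached = spectral_subtraction(x)             # remove the estimated background noise
    conditioned = add_known_noise(bleached, rng)   # impose a known, stable noise
```

Because every input is mapped onto the same known noise after subtraction, recognition can then proceed with a single set of acoustic models rather than one model per noise type.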
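For the speaker-adaptation step, here is a similarly hedged sketch of selecting the closest training speakers from a single test utterance and re-estimating Gaussian parameters from their pooled sufficient statistics. Scoring each training speaker with a single diagonal Gaussian, and storing the statistics as per-speaker occupancy counts with first- and second-order sums, are simplifications for illustration; the paper's method uses sufficient statistics of HMMs.

```python
# Sketch of speaker selection and re-estimation from pooled sufficient statistics.
import numpy as np

def select_closest_speakers(test_feats, speaker_models, n_select=20):
    """Rank training speakers by the average log-likelihood of the test utterance
    under a diagonal Gaussian per speaker; return the IDs of the closest ones."""
    scores = {}
    for spk, (mean, var) in speaker_models.items():
        ll = -0.5 * np.sum(np.log(2 * np.pi * var)
                           + (test_feats - mean) ** 2 / var, axis=1)
        scores[spk] = ll.mean()
    return sorted(scores, key=scores.get, reverse=True)[:n_select]

def reestimate_from_sufficient_stats(stats, selected):
    """Pool the selected speakers' statistics (count, sum, sum of squares)
    and re-estimate the mean and variance from the pooled accumulators."""
    count = sum(stats[s]["count"] for s in selected)
    first = sum(stats[s]["sum"] for s in selected)
    second = sum(stats[s]["sum_sq"] for s in selected)
    mean = first / count
    var = second / count - mean ** 2
    return mean, var

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 13
    models, stats = {}, {}
    for i in range(50):                              # synthetic training speakers
        feats = rng.standard_normal((200, dim)) + rng.standard_normal(dim)
        models[f"spk{i}"] = (feats.mean(axis=0), feats.var(axis=0))
        stats[f"spk{i}"] = {"count": len(feats), "sum": feats.sum(axis=0),
                            "sum_sq": (feats ** 2).sum(axis=0)}
    test_utt = rng.standard_normal((120, dim))       # one arbitrary test utterance
    closest = select_closest_speakers(test_utt, models, n_select=10)
    adapted_mean, adapted_var = reestimate_from_sufficient_stats(stats, closest)
```

Because the statistics are precomputed during training, adaptation reduces to ranking speakers and pooling accumulators, with no additional pass over the training audio; this is what allows the adaptation to be rapid.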
