Electronics and Communications in Japan. Part 2, Electronics

Unsupervised Speaker Adaptation for Robust Speech Recognition in Real Environments

Abstract

In order to achieve high-precision speech recognition in real environments, phone model adaptation procedures that can rapidly account for a wide range of different speakers and acoustic noise conditions are required. In this paper we propose an unsupervised speaker adaptation method that extends an unsupervised speaker and environment adaptation method based on sufficient statistics from HMMs by performing spectral subtraction and then adding a known noise to the input. Existing methods assume that a model is trained to match each of the different types of background noise that will be the object of recognition, and they do not consider variations in the signal-to-noise ratio or changes in the background noise for given inputs. In contrast, our method constrains the noise of the input data using an estimate of the noise spectrum and then adds a known, stable noise to the bleached noise that remains in the input, thereby smoothing out differences between background noises and enabling us to perform recognition with a single set of acoustic models. In addition, with regard to speaker adaptation, we select the set of closest speakers from our database on the basis of a single arbitrary utterance from the test speaker and retrain the acoustic models using the sufficient statistics of those speakers. By combining these two methods we are able to adapt to a new speaker rapidly and accurately. In recognition experiments at a signal-to-noise ratio of 20 dB and under a variety of noise conditions, the proposed method achieved a recognition rate 2 percent higher than that of a speaker-independent model matched to the test noise environment for each noise environment, reaching an average recognition performance of 85.1 percent overall. In addition, we compared our method with a standard supervised adaptation technique, maximum likelihood linear regression (MLLR).
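To make the noise-handling idea concrete, the following is a minimal Python sketch of the preprocessing described above: an average noise magnitude spectrum is estimated (here, naively, from the first few frames of the utterance), subtracted from each frame, and a known stationary noise is then added at a fixed SNR. The frame length, hop size, over-subtraction factor, noise floor, 20 dB target SNR, and function names are illustrative assumptions, not values taken from the paper.

```python
# Sketch of the noise preprocessing: spectral subtraction, then adding a known noise.
import numpy as np

def spectral_subtraction(signal, noise_frames=10, frame_len=256, hop=128,
                         oversub=1.0, floor=0.01):
    """Subtract an estimated noise magnitude spectrum from each frame (overlap-add)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectra = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spectra), np.angle(spectra)

    # Assume the first few frames contain only background noise.
    noise_mag = mag[:noise_frames].mean(axis=0)

    # Subtract the noise estimate and floor the result to avoid negative magnitudes.
    clean_mag = np.maximum(mag - oversub * noise_mag, floor * noise_mag)

    # Resynthesize the "bleached" signal by overlap-add.
    clean_frames = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    out = np.zeros(len(signal))
    for i in range(n_frames):
        out[i * hop:i * hop + frame_len] += clean_frames[i] * window
    return out

def add_known_noise(signal, rng, snr_db=20.0):
    """Add a known stationary (white) noise at a fixed SNR to mask residual noise."""
    sig_power = np.mean(signal ** 2)
    noise = rng.standard_normal(len(signal))
    noise *= np.sqrt(sig_power / (10 ** (snr_db / 10.0)) / np.mean(noise ** 2))
    return signal + noise

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(16000)                 # stand-in for one noisy utterance
    bleached = spectral_subtraction(x)             # remove the estimated background noise
    conditioned = add_known_noise(bleached, rng)   # impose a known, stable noise
```

Because every input is mapped onto the same known noise after subtraction, recognition can then proceed with a single set of acoustic models rather than one model per noise type.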
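For the speaker-adaptation step, here is a similarly hedged sketch of selecting the closest training speakers from a single test utterance and re-estimating Gaussian parameters from their pooled sufficient statistics. Scoring each training speaker with a single diagonal Gaussian, and storing the statistics as per-speaker occupancy counts with first- and second-order sums, are simplifications for illustration; the paper's method uses sufficient statistics of HMMs.

```python
# Sketch of speaker selection and re-estimation from pooled sufficient statistics.
import numpy as np

def select_closest_speakers(test_feats, speaker_models, n_select=20):
    """Rank training speakers by the average log-likelihood of the test utterance
    under a diagonal Gaussian per speaker; return the IDs of the closest ones."""
    scores = {}
    for spk, (mean, var) in speaker_models.items():
        ll = -0.5 * np.sum(np.log(2 * np.pi * var)
                           + (test_feats - mean) ** 2 / var, axis=1)
        scores[spk] = ll.mean()
    return sorted(scores, key=scores.get, reverse=True)[:n_select]

def reestimate_from_sufficient_stats(stats, selected):
    """Pool the selected speakers' statistics (count, sum, sum of squares)
    and re-estimate the mean and variance from the pooled accumulators."""
    count = sum(stats[s]["count"] for s in selected)
    first = sum(stats[s]["sum"] for s in selected)
    second = sum(stats[s]["sum_sq"] for s in selected)
    mean = first / count
    var = second / count - mean ** 2
    return mean, var

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 13
    models, stats = {}, {}
    for i in range(50):                              # synthetic training speakers
        feats = rng.standard_normal((200, dim)) + rng.standard_normal(dim)
        models[f"spk{i}"] = (feats.mean(axis=0), feats.var(axis=0))
        stats[f"spk{i}"] = {"count": len(feats), "sum": feats.sum(axis=0),
                            "sum_sq": (feats ** 2).sum(axis=0)}
    test_utt = rng.standard_normal((120, dim))       # one arbitrary test utterance
    closest = select_closest_speakers(test_utt, models, n_select=10)
    adapted_mean, adapted_var = reestimate_from_sufficient_stats(stats, closest)
```

Because the statistics are precomputed during training, adaptation reduces to ranking speakers and pooling accumulators, with no additional pass over the training audio; this is what allows the adaptation to be rapid.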
