Source: IEEE Transactions on Speech and Audio Processing
Speech recognition with auxiliary information

Abstract

State-of-the-art automatic speech recognition (ASR) systems are usually based on hidden Markov models (HMMs) that emit cepstral-based features, which are assumed to be piecewise stationary. Besides not being robust to noise, these features are known to be very sensitive to "auxiliary" information such as pitch, energy, and rate of speech (ROS). Attempts so far to include such auxiliary information in state-of-the-art ASR systems have often been based on simply appending these auxiliary features to the standard acoustic feature vectors. In the present paper, we investigate different approaches to incorporating this auxiliary information using dynamic Bayesian networks (DBNs) or hybrid HMM/ANNs (HMMs with artificial neural networks). These approaches are motivated by the fact that the auxiliary information is not necessarily (directly) emitted by the HMM states but, rather, carries higher-level information (e.g., speaker characteristics) that is correlated with the standard features. As implicitly done for gender modeling elsewhere, this auxiliary information then appears as a conditional variable in the emission distributions and can be hidden (except in the case of some HMM/ANNs) when its estimates become too noisy. Based on recognition experiments carried out on the OGI Numbers database (free-format numbers spoken over the telephone), we show that auxiliary information that conditions the distribution of the standard features can, under certain conditions, provide more robust recognition than auxiliary information that is appended to the standard features; this is most evident in the case of energy as an auxiliary variable in noisy speech.
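
The difference between appending an auxiliary feature and using it as a conditional variable can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes scalar features, univariate Gaussian emissions, and an auxiliary variable discretized into two classes, and all names and parameter values (STATE_X, STATE_A, PRIOR_A, COND_X) are hypothetical and chosen only for illustration.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


# Hypothetical parameters for a single HMM state q (illustrative values only).
STATE_X = (1.0, 2.0)  # (mean, var) of p(x | q) for the standard feature
STATE_A = (0.0, 1.0)  # (mean, var) of p(a | q) for the auxiliary feature

# Assumption: the auxiliary variable is discretized into two classes.
PRIOR_A = {"low": 0.4, "high": 0.6}               # P(a | q)
COND_X = {"low": (0.0, 1.0), "high": (2.0, 1.5)}  # (mean, var) of p(x | q, a)


def appended_likelihood(x, a_value):
    """Auxiliary feature appended to x and emitted by the state.

    With a diagonal covariance this factorizes as p(x | q) * p(a | q).
    """
    return gaussian_pdf(x, *STATE_X) * gaussian_pdf(a_value, *STATE_A)


def conditioned_likelihood(x, a_class):
    """Auxiliary variable observed and used only as a condition: p(x | q, a)."""
    return gaussian_pdf(x, *COND_X[a_class])


def hidden_aux_likelihood(x):
    """Auxiliary variable hidden and marginalized: sum_a P(a | q) * p(x | q, a)."""
    return sum(PRIOR_A[a] * gaussian_pdf(x, *COND_X[a]) for a in PRIOR_A)


if __name__ == "__main__":
    x = 1.0
    print("appended:", appended_likelihood(x, a_value=0.5))
    print("conditioned on 'high':", conditioned_likelihood(x, "high"))
    print("auxiliary hidden:", hidden_aux_likelihood(x))
```

In this sketch, hiding the auxiliary variable turns the emission into a mixture over its classes, so a noisy estimate of the auxiliary value never enters the likelihood; in the appended case the estimate always contributes a factor of its own.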
