Speech recognition with auxiliary information

Stephenson T.A.; Doss M.M.; Bourlard H.

Abstract

State-of-the-art automatic speech recognition (ASR) systems are usually based on hidden Markov models (HMMs) that emit cepstral-based features which are assumed to be piecewise stationary. While not really robust to noise, these features are also known to be very sensitive to "auxiliary" information, such as pitch, energy, rate-of-speech (ROS), etc. Attempts so far to include such auxiliary information in state-of-the-art ASR systems have often been based on simply appending these auxiliary features to the standard acoustic feature vectors. In the present paper, we investigate different approaches to incorporating this auxiliary information using dynamic Bayesian networks (DBNs) or hybrid HMM/ANNs (HMMs with artificial neural networks). These approaches are motivated by the fact that the auxiliary information is not necessarily (directly) emitted by the HMM states but, rather, carries higher-level information (e.g., speaker characteristics) that is correlated with the standard features. As implicitly done for gender modeling elsewhere, this auxiliary information then appears as a conditional variable in the emission distributions and can be hidden (except in the case of some HMM/ANNs) as its estimates become too noisy. Based on recognition experiments carried out on the OGI Numbers database (free format numbers spoken over the telephone), we show that auxiliary information that conditions the distribution of the standard features can, in certain conditions, provide more robust recognition than using auxiliary information that is appended to the standard features; this is most evident in the case of energy as an auxiliary variable in noisy speech.
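
The contrast drawn in the abstract, between appending an auxiliary feature to the acoustic vector and letting it condition the emission distribution, can be illustrated with a minimal Python sketch. It is not the authors' model: it assumes a single HMM state, a scalar acoustic feature, an auxiliary variable quantized into two bins, and made-up parameter values (`PRIOR_A`, `COND_MEAN`, `COND_VAR`, and the defaults of `emission_appended` are all hypothetical).

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


# Hypothetical toy parameters for ONE HMM state q (not taken from the paper).
# Assumption: the auxiliary variable a is quantized into two bins,
# 0 = low energy, 1 = high energy.
PRIOR_A = [0.4, 0.6]     # P(a)
COND_MEAN = [-1.0, 2.0]  # mean of x given (q, a)
COND_VAR = [1.0, 0.5]    # variance of x given (q, a)


def emission_conditioned(x, a):
    """p(x | q, a): the auxiliary value is observed and conditions the emission."""
    return gaussian_pdf(x, COND_MEAN[a], COND_VAR[a])


def emission_hidden(x):
    """p(x | q) = sum_a P(a) p(x | q, a): the auxiliary variable is summed out."""
    return sum(p_a * emission_conditioned(x, a) for a, p_a in enumerate(PRIOR_A))


def emission_appended(x, aux, mean=(0.8, 0.5), var=(2.0, 1.0)):
    """Baseline p(x, aux | q): the raw auxiliary value is appended to the
    feature vector and modeled with a diagonal-covariance Gaussian."""
    return gaussian_pdf(x, mean[0], var[0]) * gaussian_pdf(aux, mean[1], var[1])
```

Under these assumptions, a decoder would score a frame with `emission_conditioned(x, a)` when the auxiliary estimate is judged reliable, and fall back to `emission_hidden(x)` when the estimate is too noisy to trust.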

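Bibliographic record

Source
IEEE Transactions on Speech and Audio Processing | 2004, no. 3 | pp. 189-203 | 15 pages
Authors
Stephenson T.A.; Doss M.M.; Bourlard H.
Original format: PDF
Language: English
Chinese Library Classification: Electroacoustic technology and speech signal processing
Keywords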
Gaussian processes; belief networks; cepstral analysis; hidden Markov models; neural nets; speech processing; speech recognition; Gaussian mixture models; OGI numbers database; artificial neural networks; automatic speech recognition system; auxiliary information; c;
Date added to database: 2022-08-18 00:13:04

Similar documents

1. Auxiliary Features from Laser-Doppler Vibrometer Sensor for Deep Neural Network Based Robust Speech Recognition [J]. Lei Sun, Jun Du, Zhipeng Xie. Journal of VLSI signal processing systems for signal, image, and video technology. 2018, no. 7
2. Switching Auxiliary Chains for Speech Recognition [J]. Lin H., Ou Z. IEEE signal processing letters. 2007, no. 8
3. English Phrase Speech Recognition Based on Continuous Speech Recognition Algorithm and Word Tree Constraints [J]. Haifan Du, Haiwen Duan. Complexity. 2021, no. a
4. Noise and Speech Estimation as Auxiliary Tasks for Robust Speech Recognition [C]. Gueorgui Pironkov, Stephane Dupont, Sean U.N. Wood. International conference on statistical language and speech processing. 2017
5. Speech recognition: The interpretation of training and using speech recognition software from the perspectives of postsecondary students with learning challenges. [D]. Soenksen, Delann. 2006
6. Recognition of time-compressed speech does not predict recognition of natural fast-rate speech by older listeners [O]. Sandra Gordon-Salant, Danielle J. Zion, Carol Espy-Wilson
7. Multitask Learning with Low-Level Auxiliary Tasks for Encoder-Decoder Based Speech Recognition [O]. Toshniwal, Shubham, Tang, Hao, Lu, Liang. 2017
8. Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment. [R]. Hansen, J. H. 2015
