IEEE Transactions on Audio, Speech, and Language Processing

Maximum Entropy-Based Reinforcement Learning Using a Confidence Measure in Speech Recognition for Telephone Speech



Abstract

In this paper, a novel confidence-based reinforcement learning (RL) scheme is proposed to correct observation log-likelihoods and to address the problem of unsupervised compensation with limited estimation data. A two-step Viterbi decoding procedure is presented that estimates a correction factor for the observation log-likelihoods, making the recognized and neighboring HMMs more or less likely according to a confidence score. If regions in the output delivered by the recognizer exhibit low confidence scores, the second Viterbi decoding will tend to focus the search on neighboring models. In contrast, if recognized regions exhibit high confidence scores, the second Viterbi decoding will tend to retain the recognition output obtained in the first step. The proposed RL mechanism is modeled as the linear combination of two metrics or information sources: the acoustic model log-likelihood and the logarithm of a confidence metric. A criterion based on incremental conditional entropy maximization is also presented to optimize this linear combination of metrics or information sources online. The method requires only one utterance, as short as 0.7 s, and can lead to significant reductions in word error rate (WER), between 3% and 18%, depending on the task, the training-testing conditions, and the method used to optimize the proposed RL scheme. In contrast to ordinary feature compensation and model parameter adaptation methods, the confidence-based RL method operates in the frame log-likelihood domain. Consequently, as shown in the results presented here, it is complementary to feature compensation and to model adaptation techniques.
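The core correction described in the abstract, a linear combination of the acoustic model log-likelihood and the logarithm of a confidence metric, can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation; the weight parameter and the example values are assumptions, and in the paper the combination weight would be optimized online via the entropy-maximization criterion.

```python
import math

def corrected_log_likelihood(acoustic_ll: float,
                             confidence: float,
                             weight: float) -> float:
    """Linearly combine the acoustic log-likelihood with the log of a
    confidence score in (0, 1]. `weight` is a hypothetical combination
    coefficient standing in for the online-optimized parameter."""
    return acoustic_ll + weight * math.log(confidence)

# Low-confidence frame: log(confidence) is strongly negative, so the
# recognized model's score is penalized and a second Viterbi pass is
# more likely to explore neighboring models.
low = corrected_log_likelihood(acoustic_ll=-50.0, confidence=0.2, weight=5.0)

# High-confidence frame: the correction is mild, so the second pass
# tends to retain the first-pass recognition output.
high = corrected_log_likelihood(acoustic_ll=-50.0, confidence=0.9, weight=5.0)
```

Because log-confidence is zero at full confidence and grows more negative as confidence drops, the correction leaves high-confidence regions essentially untouched while steering the search away from low-confidence hypotheses, which matches the two-step decoding behavior the abstract describes.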

