Journal: Computer Speech and Language

Super-human multi-talker speech recognition: A graphical modeling approach

Abstract

We present a system that can separate and recognize the simultaneous speech of two people recorded in a single channel. Applied to the monaural speech separation and recognition challenge, the system outperformed all other participants, including human listeners, with an overall recognition error rate of 21.6% compared to the human error rate of 22.3%. The system consists of a speaker recognizer, a model-based speech separation module, and a speech recognizer. For the separation models, we explored a range of speech models that incorporate different levels of constraints on temporal dynamics to help infer the source speech signals. The system achieves its best performance when the model of temporal dynamics closely captures the grammatical constraints of the task. For inference, we compare a 2-D Viterbi algorithm and two loopy belief propagation algorithms. We show how belief propagation reduces the complexity of temporal inference from exponential to linear in the number of sources and the size of the language model. The best belief propagation method yields nearly the same recognition error rate as exact inference.
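To make the exact-inference baseline concrete, here is a minimal sketch of a joint (2-D) Viterbi search over two independent Markov chains that share one observation. It is not the authors' implementation. The function names, the table layout `loglik[t][i][j]`, and the two-stage maximization are assumptions chosen for illustration. The helper `joint_state_count` only shows why exact joint search grows exponentially with the number of sources; the loopy belief propagation variants mentioned in the abstract are not implemented here.

```python
def joint_state_count(states_per_source, num_sources):
    """Size of the joint state space that exact inference must search."""
    return states_per_source ** num_sources


def viterbi_2d(loglik, log_a1, log_a2, log_pi1, log_pi2):
    """Exact joint Viterbi for two chains (illustrative sketch).

    loglik[t][i][j] : log-likelihood of frame t given chain-1 state i, chain-2 state j
    log_a1, log_a2  : log transition scores, indexed [previous][next]
    log_pi1, log_pi2: log initial-state scores
    Returns (best_score, path), where path is a list of (i, j) pairs.
    """
    num_frames = len(loglik)
    k1, k2 = len(log_pi1), len(log_pi2)
    # delta[i][j]: best score of any joint path ending in state (i, j).
    delta = [[log_pi1[i] + log_pi2[j] + loglik[0][i][j] for j in range(k2)]
             for i in range(k1)]
    backptrs = []
    for t in range(1, num_frames):
        # Stage 1: for each (new i, old j0), maximize over the old chain-1 state.
        stage = [[0.0] * k2 for _ in range(k1)]
        arg_i0 = [[0] * k2 for _ in range(k1)]
        for i in range(k1):
            for j0 in range(k2):
                scores = [delta[i0][j0] + log_a1[i0][i] for i0 in range(k1)]
                best = max(range(k1), key=scores.__getitem__)
                stage[i][j0] = scores[best]
                arg_i0[i][j0] = best
        # Stage 2: for each (new i, new j), maximize over the old chain-2 state.
        new_delta = [[0.0] * k2 for _ in range(k1)]
        bp = [[None] * k2 for _ in range(k1)]
        for i in range(k1):
            for j in range(k2):
                scores = [stage[i][j0] + log_a2[j0][j] for j0 in range(k2)]
                best = max(range(k2), key=scores.__getitem__)
                new_delta[i][j] = scores[best] + loglik[t][i][j]
                bp[i][j] = (arg_i0[i][best], best)
        delta = new_delta
        backptrs.append(bp)
    # Pick the best final joint state, then follow back-pointers to the start.
    i, j = max(((a, b) for a in range(k1) for b in range(k2)),
               key=lambda s: delta[s[0]][s[1]])
    best_score = delta[i][j]
    path = [(i, j)]
    for bp in reversed(backptrs):
        i, j = bp[i][j]
        path.append((i, j))
    path.reverse()
    return best_score, path
```

Because the two chains transition independently, the maximization over the previous joint state can be split into two stages, so each frame costs on the order of K^3 operations for K states per chain rather than K^4. The joint table still has K^2 entries for two sources and K^N for N sources, which is the exponential growth that the belief propagation methods in the abstract are meant to avoid.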
