
Recognizing GSM digital speech


Abstract

The Global System for Mobile Communications (GSM) environment poses three main problems for automatic speech recognition (ASR) systems: noisy scenarios, source coding distortion, and transmission errors. The first has already received much attention; source coding distortion and transmission errors, however, still need to be addressed explicitly. In this paper, we propose an alternative front-end for speech recognition over GSM networks, designed specifically to be robust against source coding distortion and transmission errors. We suggest extracting the recognition feature vectors directly from the encoded speech (i.e., the bitstream) instead of decoding it and then extracting the feature vectors from the decoded signal. This approach offers two significant advantages. First, the recognition system is affected only by the quantization distortion of the spectral envelope, so it avoids the other sources of distortion introduced by the encoding-decoding process. Second, when transmission errors occur, our front-end is more effective because it is unaffected by errors in the bits allocated to the excitation signal. We have considered the standard half-rate and full-rate codecs and compared the proposed front-end with the conventional approach in two ASR tasks: speaker-independent isolated digit recognition and speaker-independent continuous speech recognition. In general, our approach outperforms the conventional procedure for a variety of simulated channel conditions, and the performance gap between the two widens as network conditions worsen.
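To illustrate the general idea of deriving recognition features from the coded spectral envelope rather than from decoded audio, here is a minimal sketch. It is not the front-end evaluated in the paper, since the abstract does not say which features are used. It assumes that a hypothetical bitstream parser has already recovered one frame's spectral envelope as reflection coefficients, and it converts them to LPC cepstral coefficients with the textbook step-up and LPC-to-cepstrum recursions. The function names, the sign convention A(z) = 1 + sum a_i z^-i, and the choice of cepstra as features are all assumptions made for illustration.

```python
def reflection_to_lpc(reflection):
    """Step-up recursion: reflection coefficients -> predictor coefficients.

    Returns [a_1, ..., a_p] for A(z) = 1 + sum_i a_i z^-i.
    The sign convention is an assumption; codecs differ on it.
    """
    lpc = []
    for k in reflection:
        m = len(lpc)
        # a_i(new) = a_i(old) + k * a_{m+1-i}(old), then append a_{m+1} = k
        updated = [lpc[i] + k * lpc[m - 1 - i] for i in range(m)]
        updated.append(k)
        lpc = updated
    return lpc


def lpc_to_cepstrum(lpc, n_ceps):
    """Cepstrum c_1..c_n_ceps of the all-pole envelope 1 / A(z)."""
    p = len(lpc)
    ceps = []
    for n in range(1, n_ceps + 1):
        # Only terms with a valid a_{n-k} (1 <= n-k <= p) contribute.
        acc = sum(k * ceps[k - 1] * lpc[n - k - 1]
                  for k in range(max(1, n - p), n))
        c_n = -acc / n
        if n <= p:
            c_n -= lpc[n - 1]
        ceps.append(c_n)
    return ceps


def envelope_features(reflection, n_ceps=12):
    """Feature vector for one frame, computed without decoding any audio."""
    return lpc_to_cepstrum(reflection_to_lpc(reflection), n_ceps)


if __name__ == "__main__":
    # Hypothetical reflection coefficients for a single frame.
    frame = [0.5, 0.2, -0.1, 0.05]
    print(envelope_features(frame, n_ceps=8))
```

Because this path only touches the spectral-envelope parameters, bit errors in the excitation portion of a frame never enter the feature computation, which is the property the abstract highlights.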
