Journal: Computer Speech and Language

Cepstral distance based channel selection for distant speech recognition

Abstract

Shifting from a single- to a multi-microphone setting, distant speech recognition can benefit in many ways from the multiple instances of the same utterance. An effective approach, especially when the microphones are not arranged in an array, is channel selection (CS), which assumes that for each utterance there is at least one channel whose decoding yields better recognition results than the decoding of the remaining channels. To identify this most favourable channel, one possible approach is to estimate the degree of distortion that characterizes each microphone signal. In a reverberant environment, this distortion can vary significantly across microphones, for instance due to the orientation of the speaker's head. In this work, we investigate the application of cepstral distance as a distortion measure, which turns out to be closely related to properties of the room acoustics such as reverberation time and direct-to-reverberant ratio. From this measure, a blind CS method is derived, which relies on a reference computed by averaging the log magnitude spectra of all the microphone signals. Another aim of our study is to propose a novel methodology to analyze CS under a wide set of experimental conditions and setup variations, which depend on the sound source position, its orientation, and the microphone network configuration. Based on the use of prior information, we introduce an informed technique to predict CS performance. Experimental results show both the effectiveness of the proposed blind CS method and the value of the aforementioned analysis methodology. The experiments were conducted using different sets of real and simulated data, the latter derived from both synthetic and measured impulse responses. It is demonstrated that the proposed blind CS method correlates well with the oracle selection of the best recognized channel. Moreover, our method outperforms a state-of-the-art method, especially on real data.
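The blind selection step described above can be illustrated with a short sketch. This is a minimal, hypothetical example in plain Python, not the authors' implementation: the Hann window, frame length, hop size, number of cepstral coefficients, the Euclidean distance over c[1..n_ceps], and all function names are assumptions. Only the core idea comes from the abstract: a reference obtained by averaging the log magnitude spectra of all microphone signals, and a cepstral distance from each channel to that reference. The abstract does not say which end of the distance ranking is selected, so `select_channel` takes that as an explicit, hypothetical `pick_largest` argument.

```python
import cmath
import math


def hann_frames(signal, frame_len, hop):
    """Split a 1-D signal into overlapping Hann-windowed frames."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    return [
        [signal[start + n] * window[n] for n in range(frame_len)]
        for start in range(0, len(signal) - frame_len + 1, hop)
    ]


def log_magnitude(frame, eps=1e-10):
    """One-sided log-magnitude spectrum (bins 0..N/2) computed with a naive DFT."""
    n_fft = len(frame)
    return [
        math.log(abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                         for n in range(n_fft))) + eps)
        for k in range(n_fft // 2 + 1)
    ]


def real_cepstrum(log_spec, n_fft, n_ceps):
    """c[0..n_ceps]: inverse DFT of the (symmetric) log spectrum; n_fft must be even."""
    half = n_fft // 2
    coeffs = []
    for q in range(n_ceps + 1):
        acc = log_spec[0] + log_spec[half] * (-1) ** q
        acc += 2 * sum(log_spec[k] * math.cos(2 * math.pi * k * q / n_fft)
                       for k in range(1, half))
        coeffs.append(acc / n_fft)
    return coeffs


def cepstral_distances(channels, frame_len=64, hop=32, n_ceps=12):
    """Mean per-frame cepstral distance of each channel to the average-log-spectrum reference."""
    logspecs = [[log_magnitude(f) for f in hann_frames(x, frame_len, hop)] for x in channels]
    n_frames = min(len(s) for s in logspecs)
    n_bins = frame_len // 2 + 1
    # Reference: frame-by-frame average of the log magnitude spectra of all channels.
    ref_ceps = [
        real_cepstrum([sum(s[t][k] for s in logspecs) / len(logspecs) for k in range(n_bins)],
                      frame_len, n_ceps)
        for t in range(n_frames)
    ]
    distances = []
    for s in logspecs:
        total = 0.0
        for t in range(n_frames):
            c = real_cepstrum(s[t], frame_len, n_ceps)
            # Skip c[0]: it is the mean log magnitude, i.e. an overall gain term.
            total += math.sqrt(sum((c[q] - ref_ceps[t][q]) ** 2 for q in range(1, n_ceps + 1)))
        distances.append(total / n_frames)
    return distances


def select_channel(distances, pick_largest):
    """Index of the channel at the chosen end of the distance ranking."""
    pick = max if pick_largest else min
    return pick(range(len(distances)), key=lambda i: distances[i])
```

Because the real cepstrum is linear in the log spectrum, averaging the log spectra first gives the same reference cepstrum as averaging the per-channel cepstra. Dropping c[0] also makes the distance essentially insensitive to a constant gain on any single channel, since such a gain shifts every log-magnitude bin by the same amount. A call such as `select_channel(cepstral_distances([x0, x1, x2]), pick_largest=True)` returns the index of one of the three input signals.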