Journal: IEEE Transactions on Audio, Speech, and Language Processing

Encoding Navigable Speech Sources: A Psychoacoustic-Based Analysis-by-Synthesis Approach



Abstract

This paper presents a psychoacoustic-based analysis-by-synthesis approach for compressing navigable speech sources. The approach targets multi-party teleconferencing applications, where selective reproduction of individual speech sources is desired. By exploiting the sparsity of speech in the perceptual time-frequency domain, multiple speech signals are encoded into one mono mixture signal, which can be further compressed using a standard speech codec. Side information indicating the active speech source at each time-frequency instant enables flexible decoding and reproduction. Objective results highlight the importance of considering perception when exploiting the sparse nature of speech in the time-frequency domain. Results show that this sparsity, as measured by the preserved energy level of perceptually important time-frequency components extracted from mixtures of speech signals, is similar in both anechoic and reverberant environments. The proposed approach is applied to a series of simulated and real reverberant speech recordings, where the resulting speech mixtures are compressed using a standard speech codec operating at 32 kbps. The perceptual quality, as judged by both objective and subjective evaluations, outperforms both a simple sparsity approach that does not consider perception and an approach that encodes each source separately. While the perceptual quality of individual speech sources is maintained, subjective tests also confirm that the approach maintains the perceptual quality of the spatialized speech scene.
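The general idea of a mono mixture plus per-cell side information can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not give the psychoacoustic selection rule or the analysis-by-synthesis loop, so the sketch assumes the plain "largest magnitude wins" rule, which is closer to the simple sparsity baseline the abstract mentions. The function names (`encode_dominant`, `decode_source`, `preserved_energy`) and the nested-list spectrogram layout are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Assumed layout: spectrogram[frame][bin] holds one complex time-frequency value.
Spectrogram = List[List[complex]]


def encode_dominant(sources: List[Spectrogram]) -> Tuple[Spectrogram, List[List[int]]]:
    """Keep, per time-frequency cell, only the source with the largest magnitude.

    Returns the mono mixture and the side information (index of the kept source).
    Assumption: a plain magnitude comparison stands in for the perceptual rule.
    """
    n_frames, n_bins = len(sources[0]), len(sources[0][0])
    mixture: Spectrogram = []
    side_info: List[List[int]] = []
    for t in range(n_frames):
        mix_row, idx_row = [], []
        for f in range(n_bins):
            k = max(range(len(sources)), key=lambda i: abs(sources[i][t][f]))
            idx_row.append(k)
            mix_row.append(sources[k][t][f])
        mixture.append(mix_row)
        side_info.append(idx_row)
    return mixture, side_info


def decode_source(mixture: Spectrogram, side_info: List[List[int]], k: int) -> Spectrogram:
    """Recover source k by zeroing every cell that the side information assigns elsewhere."""
    return [
        [value if owner == k else 0j for value, owner in zip(mix_row, idx_row)]
        for mix_row, idx_row in zip(mixture, side_info)
    ]


def preserved_energy(source: Spectrogram, decoded: Spectrogram) -> float:
    """Fraction of the original source energy that survives in the decoded source."""
    total = sum(abs(c) ** 2 for row in source for c in row)
    kept = sum(abs(c) ** 2 for row in decoded for c in row)
    return kept / total if total else 1.0
```

In this sketch, selective reproduction amounts to calling `decode_source` with the index of the desired talker, and `preserved_energy` gives a rough, non-perceptual analogue of the preserved-energy measure the abstract uses to quantify sparsity.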
