Encoding Navigable Speech Sources: A Psychoacoustic-Based Analysis-by-Synthesis Approach

Zheng X.; Ritz C.; Xi J.

Abstract

This paper presents a psychoacoustic-based analysis-by-synthesis approach for compressing navigable speech sources. The approach targets multi-party teleconferencing applications, where selective reproduction of individual speech sources is desired. By exploiting the sparsity of speech in the perceptual time-frequency domain, multiple speech signals are encoded into one mono mixture signal, which can be further compressed using a standard speech codec. Side information indicating the active speech source for each time-frequency instant enables flexible decoding and reproduction. Objective results highlight the importance of considering perception when exploiting the sparse nature of speech in the time-frequency domain. Results show that this sparsity, as measured by the preserved energy level of perceptually important time-frequency components extracted from mixtures of speech signals, is similar in both anechoic and reverberant environments. The proposed approach is applied to a series of simulated and real reverberant speech recordings, where the resulting speech mixtures are compressed using a standard speech codec operating at 32 kbps. The perceptual quality, as judged by both objective and subjective evaluations, outperforms a simple sparsity approach that does not consider perception, as well as the approach that encodes each source separately. While the perceptual quality of individual speech sources is maintained, subjective tests also confirm that the approach maintains the perceptual quality of the spatialized speech scene.
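The mixture-plus-side-information idea in the abstract can be illustrated with a minimal sketch. This is not the authors' psychoacoustic analysis-by-synthesis method. It is closer to the simple sparsity baseline the abstract mentions, and it assumes the time-frequency coefficients of each source are already available as nested lists of complex numbers. The names `encode_mixture`, `decode_source`, and `preserved_energy`, and the rule of keeping the largest-magnitude source per bin, are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: spectrogram[frame][frequency_bin] -> complex coefficient
Spectrogram = List[List[complex]]
SideInfo = List[List[int]]


def encode_mixture(sources: Sequence[Spectrogram]) -> Tuple[Spectrogram, SideInfo]:
    """Build one mono mixture by keeping, per time-frequency bin, only the
    source with the largest magnitude (an assumed, perception-free rule).
    The side information records which source owns each bin."""
    n_frames = len(sources[0])
    n_bins = len(sources[0][0])
    mixture: Spectrogram = []
    side_info: SideInfo = []
    for t in range(n_frames):
        mix_row: List[complex] = []
        idx_row: List[int] = []
        for f in range(n_bins):
            # Index of the dominant source in this bin (ties go to the lowest index).
            active = max(range(len(sources)), key=lambda s: abs(sources[s][t][f]))
            idx_row.append(active)
            mix_row.append(sources[active][t][f])
        mixture.append(mix_row)
        side_info.append(idx_row)
    return mixture, side_info


def decode_source(mixture: Spectrogram, side_info: SideInfo, source_index: int) -> Spectrogram:
    """Recover one source by keeping only the mixture bins assigned to it."""
    return [
        [coef if owner == source_index else 0j for coef, owner in zip(mix_row, idx_row)]
        for mix_row, idx_row in zip(mixture, side_info)
    ]


def preserved_energy(original: Spectrogram, decoded: Spectrogram) -> float:
    """Fraction of the original source energy that survives in the decoded source."""
    total = sum(abs(c) ** 2 for row in original for c in row)
    kept = sum(abs(c) ** 2 for row in decoded for c in row)
    return kept / total if total > 0 else 0.0
```

In this sketch, a listener who wants only one talker calls `decode_source` with that talker's index, while summing the decoded sources reproduces the mixture. The plain energy ratio in `preserved_energy` stands in for the perceptually weighted measure described in the abstract, which this sketch does not model.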

Bibliographic details

Source: Audio, Speech, and Language Processing, IEEE Transactions on, 2013, Issue 1, pp. 27-36 (10 pages)
Authors: Zheng X.; Ritz C.; Xi J.
Affiliation: ICT Research Institute and School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Australia
Original format: PDF
Language: English
Keywords: Multichannel speech coding; soundfield navigation; spatial teleconferencing

Similar literature

1. A new structural approach in system identification with generalized analysis-by-synthesis for robust speech coding [J]. Joon-Hyuk Chang, Nam Soo Kim. IEEE Transactions on Audio, Speech and Language Processing. 2006, Issue 3
2. Encoding and communicating navigable speech soundfields [J]. Zheng Xiguang, Ritz Christian, Xi Jiangtao. Multimedia Tools and Applications. 2016, Issue 9
3. Separation of Audio-Visual Speech Sources: A New Approach Exploiting the Audio-Visual Coherence of Speech Stimuli [J]. David Sodoyer, Jean-Luc Schwartz, Laurent Girin. EURASIP Journal on Advances in Signal Processing. 2002, Issue 11
4. Encoding navigable speech sources: An analysis by synthesis approach [C]. Zheng Xiguang. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2012
5. Trellis Encoding for Sources and Channels (Data Compression, Quantization, Speech Coding, Joint Source and Channel Coding) [D]. Ayanoglu, Ender. 1986
6. The socially weighted encoding of spoken words: a dual-route approach to speech perception [O]. Meghan Sumner, Seung Kyung Kim, Ed King. 2013
7. Encoding and communicating navigable speech soundfields [O]. Zheng, Xiguang; Ritz, Christian H; Xi, Jiangtao. 2016
8. Source Encoding for Good Quality Speech Communications Systems [R]. O'Neal, J. B. 1970
