Foreground Speech Segmentation and Enhancement Using Glottal Closure Instants and Mel Cepstral Coefficients

K. T. Deepak; S. R. Mahadeva Prasanna

Abstract

In this paper, the speech signal recorded from the desired speaker close to the microphone in a natural environment is regarded as foreground speech, and the remaining interfering sources as background noise. The proposed method exploits speech production features, namely glottal closure instants in the time domain and vocal tract information in the spectral domain, to segment the desired speaker's speech and then enhance it. The foreground speech is further enhanced perceptually in the mel-frequency domain using mel-cepstral coefficients (MCC), which are inverted with a mel log spectrum approximation (MLSA) filter. The focus is on enhancing the production and perceptual features of the foreground speech rather than on modeling the interfering sources. To evaluate the proposed method, speech data were collected from different speakers in different natural environments. The enhanced speech signals obtained at three different stages of the method are compared against state-of-the-art methods using subjective and objective measures, and the proposed method performs better than the methods considered. In terms of the proposed objective measure, the enhancement approach gives an average improvement of 12 dB, whereas an existing spectral subtraction-based method provides 3 dB. A subjective evaluation with 24 different subjects corroborates the objective test results.
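To put the reported 12 dB and 3 dB figures in perspective, the short sketch below converts decibel values to linear ratios. It assumes the objective measure is a power ratio expressed as 10·log10, which the abstract does not state; the function and variable names are illustrative only and do not come from the paper.

```python
import math


def db_to_power_ratio(db: float) -> float:
    """Convert decibels to a linear power ratio (assumes the 10*log10 convention)."""
    return 10.0 ** (db / 10.0)


def power_ratio_to_db(ratio: float) -> float:
    """Convert a linear power ratio back to decibels."""
    return 10.0 * math.log10(ratio)


# Average improvements quoted in the abstract.
proposed_gain_db = 12.0   # proposed enhancement approach
baseline_gain_db = 3.0    # spectral subtraction-based method

proposed_ratio = db_to_power_ratio(proposed_gain_db)
baseline_ratio = db_to_power_ratio(baseline_gain_db)

# Gap between the two methods, again as a linear power ratio.
gap_ratio = db_to_power_ratio(proposed_gain_db - baseline_gain_db)

print(f"proposed: x{proposed_ratio:.2f}, baseline: x{baseline_ratio:.2f}, gap: x{gap_ratio:.2f}")
```

Under that assumption, 12 dB corresponds to roughly a 16-fold power ratio and 3 dB to roughly a 2-fold one, so the 9 dB gap between the two methods is close to a factor of 8.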

Bibliographic details

Source: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016, no. 7, pp. 1204-1218 (15 pages)
Author affiliation: Department of Electronics and Electrical Engineering, Indian Institute of Technology Guwahati, India
Original format: PDF
Language: English
Keywords: Foreground segmentation; MCC; MLSA; formant peaks; glottal closure instants (GCI); speech enhancement; zero band filter (ZBF)
