EURASIP Journal on Advances in Signal Processing

Subband-Based Group Delay Segmentation of Spontaneous Speech into Syllable-Like Units

Abstract

In the development of a syllable-centric automatic speech recognition (ASR) system, segmenting the acoustic signal into syllabic units is an important stage. Although the short-term energy (STE) function contains useful information about syllable segment boundaries, it must be processed before the boundaries can be extracted. This paper presents a subband-based group delay approach to segmenting spontaneous speech into syllable-like units. The technique exploits the additive property of the Fourier transform phase and the deconvolution property of the cepstrum to smooth the STE function of the speech signal and make it suitable for syllable boundary detection. By treating the STE function as the magnitude spectrum of an arbitrary signal, a minimum-phase group delay function is derived. This group delay function is found to be a better representation of the STE function for syllable boundary detection. Although the group delay function derived from the STE function of the speech signal contains the segment boundaries, they are difficult to determine in the presence of long silences, semivowels, and fricatives. This paper specifically addresses these issues and develops algorithms to improve segmentation performance. The speech signal is first passed through a bank of three filters covering three different spectral bands, and the STE function of each filtered signal is computed. From these three STE functions, three minimum-phase group delay functions are derived, and the syllable boundaries are detected by combining the evidence from all three. Further, a multiresolution-based technique is presented to counter the shift in segment boundaries introduced by smoothing. Experiments on the Switchboard and OGI-MLTS corpora show that the segmentation error is at most 25 milliseconds for 67% of the syllable segments on Switchboard and 76.6% on OGI-MLTS.
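The abstract states the steps but not their parameters, so the following is only a minimal pure-Python sketch of the core idea under stated assumptions, not the authors' implementation. It treats an STE contour as a magnitude spectrum on [0, π], computes its real cepstrum, keeps the first `n_cep` coefficients as a smoothing lifter, and evaluates the minimum-phase group delay τ(ω) = Σ_{q≥1} 2q·c[q]·cos(qω). Interior valleys of τ are then taken as candidate syllable boundaries. Everything named here is hypothetical: the function names, the rectangular STE frames, the natural-log cepstrum, the valley-picking rule, and `combine_subbands`, which simply averages peak-normalised group delays from the three band-limited STE contours because the abstract does not say how the evidence is combined. The three-band filter bank and the multiresolution step are omitted.

```python
import math
from typing import List, Sequence

# Hypothetical helper names: an illustrative sketch, not the paper's code.


def short_term_energy(x: Sequence[float], frame_len: int, hop: int) -> List[float]:
    """STE per frame: sum of squared samples under a rectangular window (assumed)."""
    return [
        sum(s * s for s in x[start:start + frame_len])
        for start in range(0, len(x) - frame_len + 1, hop)
    ]


def min_phase_group_delay(ste: Sequence[float], n_cep: int,
                          floor: float = 1e-10) -> List[float]:
    """Treat `ste` as a magnitude spectrum sampled on [0, pi] and return the
    group delay of the matching minimum-phase signal, smoothed by keeping only
    cepstral coefficients 1..n_cep (a low-quefrency lifter)."""
    n = len(ste)
    if not 0 < n_cep < n - 1:
        raise ValueError("need 0 < n_cep < len(ste) - 1")
    # Even extension to a full 0..2*pi spectrum of length 2n - 2.
    mag = list(ste) + list(ste[-2:0:-1])
    size = len(mag)
    # Assumption: natural-log magnitude, floored to avoid log(0).
    log_mag = [math.log(max(v, floor)) for v in mag]
    # Real cepstrum c[q]: the inverse DFT of an even, real log spectrum is real
    # and even, so only the cosine terms remain.
    cep = [
        sum(lm * math.cos(2 * math.pi * k * q / size)
            for k, lm in enumerate(log_mag)) / size
        for q in range(n_cep + 1)
    ]
    # Minimum phase: complex cepstrum = 2 c[q] for q > 0, so the group delay
    # is tau(w) = sum_q 2 q c[q] cos(q w), evaluated at w_k = 2 pi k / size.
    return [
        sum(2 * q * cep[q] * math.cos(2 * math.pi * k * q / size)
            for q in range(1, n_cep + 1))
        for k in range(n)
    ]


def valleys(curve: Sequence[float]) -> List[int]:
    """Interior local minima, used here (by assumption) as boundary candidates."""
    return [
        k for k in range(1, len(curve) - 1)
        if curve[k] < curve[k - 1] and curve[k] <= curve[k + 1]
    ]


def combine_subbands(ste_bands: Sequence[Sequence[float]], n_cep: int) -> List[float]:
    """Hypothetical evidence combination: mean of peak-normalised group delays."""
    combined = [0.0] * len(ste_bands[0])
    for ste in ste_bands:
        gd = min_phase_group_delay(ste, n_cep)
        scale = max(abs(v) for v in gd) or 1.0
        for k, v in enumerate(gd):
            combined[k] += v / scale
    return [v / len(ste_bands) for v in combined]


if __name__ == "__main__":
    # Synthetic contour: energy maxima at frames 25 and 75, a dip at frame 50.
    contour = [math.exp(-math.cos(math.pi * k / 25)) for k in range(101)]
    print(valleys(min_phase_group_delay(contour, n_cep=8)))  # -> [50]
```

In the usage example, the logarithm of the synthetic contour is a single cosine, so only c[4] = -1/2 is non-zero and the group delay reduces to -4·cos(πk/25): peaks at the energy maxima (frames 25 and 75) and a single interior valley at frame 50. On real speech, a smaller `n_cep` would be expected to smooth more at the cost of blurring closely spaced boundaries. The direct DFT sums cost O(N·n_cep) per contour, so an FFT would be preferable for long recordings.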