Journal: Communications in Nonlinear Science and Numerical Simulation

Precise detection of speech endpoints dynamically: A wavelet convolution based approach



Abstract

Precise detection of speech endpoints is an important factor affecting the performance of systems that must extract speech utterances from a speech signal, such as Automatic Speech Recognition (ASR) systems. Existing endpoint detection (EPD) methods mostly use Short-Term Energy (STE) and Zero-Crossing Rate (ZCR) based approaches and their variants. However, STE and ZCR based EPD algorithms often fail in the presence of Non-speech Sound Artifacts (NSAs) produced by speakers. Pattern recognition and classification techniques have also been applied, but those methods require labeled data for training. This article proposes a novel approach to extracting speech endpoints, termed Wavelet Convolution based Speech Endpoint Detection (WCSED). WCSED decomposes the speech signal into high-frequency and low-frequency components using wavelet convolution and then computes information-entropy based thresholds for the two frequency components. The low-frequency thresholds are used to extract voiced speech segments, whereas the high-frequency thresholds are used to extract unvoiced speech segments while filtering out the NSAs. WCSED does not require any labeled data for training and can extract speech segments automatically. Experiments on two speech databases give promising results even in the presence of NSAs. © 2018 Elsevier B.V. All rights reserved.
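
The abstract gives no implementation details, so the following is only a rough sketch of the decompose-then-threshold idea, not the WCSED algorithm itself. It assumes a one-level Haar wavelet, fixed-length non-overlapping frames, and Kapur's maximum-entropy thresholding (a technique borrowed from image segmentation) applied to a histogram of frame energies. The function names `haar_split`, `frame_energies`, `kapur_threshold`, and `detect_frames`, and the parameters `frame_len` and `bins`, are hypothetical and do not come from the paper.

```python
import math


def haar_split(signal):
    """One-level Haar wavelet convolution with downsampling by 2.

    Returns (low, high): the approximation (low-frequency) and
    detail (high-frequency) sub-bands. Assumption: Haar is used only
    because it is the simplest wavelet; the paper's choice is unknown.
    """
    s = 1.0 / math.sqrt(2.0)
    low, high = [], []
    for i in range(0, len(signal) - 1, 2):
        a, b = signal[i], signal[i + 1]
        low.append(s * (a + b))
        high.append(s * (a - b))
    return low, high


def frame_energies(band, frame_len):
    """Sum of squares over consecutive non-overlapping frames."""
    return [sum(x * x for x in band[i:i + frame_len])
            for i in range(0, len(band) - frame_len + 1, frame_len)]


def kapur_threshold(values, bins=16):
    """Kapur's maximum-entropy threshold over a histogram of `values`.

    Assumption: this stands in for the paper's unspecified
    information-entropy based threshold rule.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    probs = [c / len(values) for c in counts]

    def entropy(ps, total):
        return -sum((p / total) * math.log(p / total) for p in ps if p > 0)

    best_t, best_h = 0, -1.0
    for t in range(bins - 1):
        p0 = sum(probs[:t + 1])
        p1 = 1.0 - p0
        if p0 <= 0 or p1 <= 0:
            continue
        # Total entropy of the two classes split after bin t.
        h = entropy(probs[:t + 1], p0) + entropy(probs[t + 1:], p1)
        if h > best_h:
            best_t, best_h = t, h
    # Upper edge of the last bin assigned to the "below" class.
    return lo + (best_t + 1) * width


def detect_frames(signal, frame_len=80):
    """Flag frames whose sub-band energy exceeds that band's threshold.

    The low band is mapped to "voiced" and the high band to "unvoiced",
    mirroring the roles the abstract assigns to the two components.
    """
    low, high = haar_split(signal)
    result = {}
    for name, band in (("voiced", low), ("unvoiced", high)):
        energies = frame_energies(band, frame_len)
        thr = kapur_threshold(energies)
        result[name] = [i for i, e in enumerate(energies) if e > thr]
    return result
```

Calling `detect_frames(samples)` on a list of floats returns a dictionary of frame indices per band. The sketch makes no attempt to reject NSAs or merge frames into endpoints; those steps would depend on details that the abstract does not state.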
