Journal: IEEE Transactions on Audio, Speech, and Language Processing

An Effective Algorithm for Automatic Detection and Exact Demarcation of Breath Sounds in Speech and Song Signals



Abstract

Automatic detection of predefined events in speech and audio signals is a challenging and promising subject in signal processing. One important application of such detection is the removal or suppression of unwanted sounds in audio recordings, for instance in the professional music industry, where the demand for quality is very high. Breath sounds, which are present in most song recordings and often degrade the aesthetic quality of the voice, are an example of such unwanted sounds. Another example is bad pronunciation of certain phonemes. In this paper, we present an automatic algorithm for accurate detection of breaths in speech or song signals. The algorithm is based on a template matching approach and consists of three phases. In the first phase, a template is constructed from the mel-frequency cepstral coefficient (MFCC) matrices of several breath examples and their singular value decompositions, to capture the characteristics of a typical breath event. Next, in the initial processing phase, each short-time frame is compared to the breath template and marked as breathy or nonbreathy according to predefined thresholds. Finally, an edge detection algorithm, based on various time-domain and frequency-domain parameters, is applied to demarcate the exact boundaries of each breath event and to eliminate possible false detections. Evaluation of the algorithm on a database of speech and songs containing several hundred breath sounds yielded a correct identification rate of 98% with a specificity of 96%.
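The frame-marking idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the similarity measure, the threshold values, or the details of the SVD-based template, so this example swaps in a much simpler template (per-coefficient mean and standard deviation) and a variance-normalized distance. It assumes per-frame feature vectors (for example MFCCs) have already been computed elsewhere. All function names, the threshold of 0.5, and the minimum run length are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def build_template(examples: List[Vector]) -> Tuple[List[float], List[float]]:
    """Per-coefficient mean and standard deviation over example breath frames.

    Assumption: a simplified stand-in for the paper's MFCC/SVD template.
    """
    n = len(examples)
    dim = len(examples[0])
    mean = [sum(v[i] for v in examples) / n for i in range(dim)]
    # Fall back to 1.0 when a coefficient has zero spread, to avoid dividing by zero.
    std = [
        math.sqrt(sum((v[i] - mean[i]) ** 2 for v in examples) / n) or 1.0
        for i in range(dim)
    ]
    return mean, std


def similarity(frame: Vector, mean: Vector, std: Vector) -> float:
    """Map a variance-normalized squared distance to a score in (0, 1]."""
    d2 = sum(((x - m) / s) ** 2 for x, m, s in zip(frame, mean, std)) / len(mean)
    return 1.0 / (1.0 + d2)


def mark_frames(frames: List[Vector], mean: Vector, std: Vector,
                threshold: float = 0.5) -> List[bool]:
    """Mark each frame as breathy (True) or nonbreathy (False).

    The threshold is a hypothetical value, not one taken from the paper.
    """
    return [similarity(f, mean, std) >= threshold for f in frames]


def group_events(flags: Sequence[bool], min_frames: int = 3) -> List[Tuple[int, int]]:
    """Group consecutive breathy frames into (start, end) pairs, end exclusive.

    Runs shorter than min_frames are dropped as a crude false-detection filter.
    """
    events = []
    start = None
    # A trailing False closes any run that reaches the last frame.
    for i, flag in enumerate(list(flags) + [False]):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_frames:
                events.append((start, i))
            start = None
    return events
```

In this sketch the grouping step only discards very short runs. The edge detection phase described in the abstract relies on additional time-domain and frequency-domain parameters to refine the boundaries, which this example does not attempt.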
