
Blind Speech Segmentation using Spectrogram Image-based Features and Mel Cepstral Coefficients



Abstract

This paper introduces a novel method for blind speech segmentation at the phone level based on image processing. We treat the spectrogram of an utterance's waveform as an image and hypothesize that its striping defects, i.e., discontinuities, arise at phone boundaries. A simple image destriping algorithm locates these discontinuities. To discover phone transitions that are less salient in the image, we compute spectral changes derived from the time evolution of the Mel cepstral parametrisation of speech. These so-called image-based and acoustic features are then combined to form a mixed probability function, whose values indicate the likelihood that a phone boundary is located at the corresponding time frame. The method is completely unsupervised and achieves an accuracy of 75.59% at an over-segmentation rate of -3.26%, yielding an F-measure of 0.76 and an R-value of 0.80 on the TIMIT dataset.
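The abstract reports its scores without defining them. The sketch below uses the definitions commonly found in the speech segmentation literature, which is an assumption about what the authors used: hit rate as the percentage of reference boundaries that were detected, over-segmentation as the relative excess of detected boundaries over reference boundaries, and the R-value as a distance-based combination of the two. It also assumes that the "accuracy" in the abstract is the hit rate. The function names are illustrative and do not come from the paper.

```python
import math


def over_segmentation(n_detected: int, n_reference: int) -> float:
    """Over-segmentation in percent; negative means fewer boundaries were detected than exist."""
    return (n_detected / n_reference - 1.0) * 100.0


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def r_value(hit_rate: float, over_seg: float) -> float:
    """R-value from hit rate and over-segmentation, both given in percent.

    r1 is the distance to the ideal point (hit rate 100, over-segmentation 0);
    r2 penalises hit rate gained through over-segmentation.
    """
    r1 = math.sqrt((100.0 - hit_rate) ** 2 + over_seg ** 2)
    r2 = (-over_seg + hit_rate - 100.0) / math.sqrt(2.0)
    return 1.0 - (abs(r1) + abs(r2)) / 200.0


# Figures from the abstract, with "accuracy" read as the hit rate (an assumption).
reported_r = r_value(hit_rate=75.59, over_seg=-3.26)
print(f"R-value: {reported_r:.2f}")
```

Under these assumptions, the abstract's hit rate and over-segmentation give an R-value that rounds to the reported 0.80. The F-measure additionally depends on how precision was computed, which the abstract does not state, so it is left as a plain helper here.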
