IEEE Transactions on Audio, Speech, and Language Processing

Advances in unsupervised audio classification and segmentation for the broadcast news and NGSW corpora


Abstract

Unsupervised audio classification and segmentation remains a challenging research problem that significantly affects automatic speech recognition (ASR) and spoken document retrieval (SDR) performance. This paper presents advances in 1) audio classification for speech recognition and 2) audio segmentation for unsupervised multispeaker change detection. A new audio classification algorithm based on weighted GMM networks (WGN) is proposed. Two new extended-time features, the variance of the spectrum flux (VSF) and the variance of the zero-crossing rate (VZCR), are used to preclassify the audio and to supply weights to the output probabilities of the GMM networks; classification is then performed with the weighted GMM networks. Since historically no features have been designed specifically for audio segmentation, we evaluate 16 candidate features in 14 noisy environments to determine which are the most robust on average across these conditions. These include three newly proposed features: perceptual minimum variance distortionless response (PMVDR), smoothed zero-crossing rate (SZCR), and filterbank log energy coefficients (FBLC). Next, a new distance metric, T2-mean, is proposed to improve segmentation for short segment turns (i.e., 1-5 s). A new false alarm compensation procedure is also implemented, which significantly reduces the false alarm rate at little cost to the miss rate. Evaluations on a standard data set, the Defense Advanced Research Projects Agency (DARPA) Hub4 Broadcast News 1997 evaluation data, show that the WGN classification algorithm achieves over a 50% improvement versus the GMM network baseline, and that the proposed compound segmentation algorithm achieves a 10%-23% improvement in all metrics versus the baseline of Mel-frequency cepstral coefficients (MFCC) with the traditional Bayesian information criterion (BIC) algorithm.
The new classification and segmentation algorithms also obtain very satisfactory results on the more diverse and challenging National Gallery of the Spoken Word (NGSW) corpus.
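The abstract names VSF and VZCR but gives no formulas for them. A minimal sketch of one plausible reading follows, assuming the common definitions: a per-frame zero-crossing rate, a per-frame spectral flux computed as the mean squared change of the log-magnitude spectrum between consecutive frames, and the variance of each taken over a longer analysis window (e.g., 1 s). The helper names, the 25 ms / 10 ms framing at 16 kHz, the Hann window, and the flux definition are all assumptions for illustration, not the paper's exact parameters.

```python
import numpy as np


def frame_signal(x: np.ndarray, frame_len: int, hop: int) -> np.ndarray:
    """Split a 1-D signal into overlapping frames (one frame per row).

    Assumes len(x) >= frame_len; trailing samples that do not fill a frame are dropped.
    """
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def zero_crossing_rate(frames: np.ndarray) -> np.ndarray:
    """Per-frame fraction of adjacent sample pairs whose sign differs."""
    negative = np.signbit(frames)
    return np.mean(negative[:, 1:] != negative[:, :-1], axis=1)


def spectral_flux(frames: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    """Assumed flux definition: mean squared change in log-magnitude spectrum
    between each pair of consecutive (Hann-windowed) frames."""
    windowed = frames * np.hanning(frames.shape[1])
    log_mag = np.log(np.abs(np.fft.rfft(windowed, axis=1)) + eps)
    return np.mean(np.diff(log_mag, axis=0) ** 2, axis=1)


def extended_time_features(x, frame_len: int = 400, hop: int = 160):
    """Hypothetical helper returning (VZCR, VSF) for one analysis window x.

    Defaults assume 16 kHz audio (25 ms frames, 10 ms hop); the window x
    itself plays the role of the longer "extended-time" span, e.g. 1 s.
    """
    frames = frame_signal(np.asarray(x, dtype=float), frame_len, hop)
    vzcr = float(np.var(zero_crossing_rate(frames)))
    vsf = float(np.var(spectral_flux(frames)))
    return vzcr, vsf
```

For example, on a 1 s window at 16 kHz, a steady 440 Hz tone gives a VZCR close to zero, while a window that switches from that tone to white noise halfway through gives a much larger VZCR, because the per-frame ZCR jumps from about 0.055 to about 0.5.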
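The segmentation baseline named in the abstract is MFCC features with the traditional BIC algorithm. For reference, a minimal sketch of the conventional single-change delta-BIC test follows, in the commonly used form delta-BIC(i) = (N/2) log|S| - (N1/2) log|S1| - (N2/2) log|S2| - lam * (1/2)(d + d(d+1)/2) log N, where S, S1, S2 are full covariances of the whole window and of the two halves split at frame i, and a positive value favours a change. It does not implement the proposed T2-mean metric, the false alarm compensation, or the compound algorithm, whose details are not given here. The function names, the penalty weight `lam`, and the `min_frames` guard are assumptions for illustration; `features` stands for an (N, d) matrix of per-frame feature vectors such as MFCCs.

```python
import numpy as np


def _logdet_ml_cov(x: np.ndarray) -> float:
    """Log-determinant of the maximum-likelihood covariance of the rows of x."""
    cov = np.atleast_2d(np.cov(x, rowvar=False, bias=True))
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("singular covariance: segment too short or degenerate")
    return float(logdet)


def delta_bic(features: np.ndarray, i: int, lam: float = 1.0) -> float:
    """Delta-BIC for a change at frame i of an (N, d) window; > 0 favours a change.

    Compares one full-covariance Gaussian for the whole window against two
    Gaussians for features[:i] and features[i:], with a lam-weighted
    penalty for the extra mean and covariance parameters.
    """
    n, d = features.shape
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (0.5 * n * _logdet_ml_cov(features)
            - 0.5 * i * _logdet_ml_cov(features[:i])
            - 0.5 * (n - i) * _logdet_ml_cov(features[i:])
            - lam * penalty)


def detect_change(features: np.ndarray, lam: float = 1.0, min_frames: int = 20):
    """Hypothetical helper: frame index with the largest positive delta-BIC,
    or None if no candidate split beats the penalty."""
    n, d = features.shape
    if min_frames <= d:
        raise ValueError("min_frames must exceed the feature dimension")
    best_i, best_score = None, 0.0
    # Keep at least min_frames frames on each side so both covariances are estimable.
    for i in range(min_frames, n - min_frames + 1):
        score = delta_bic(features, i, lam)
        if score > best_score:
            best_i, best_score = i, score
    return best_i
```

In a full system this test is typically applied to sliding or growing windows across a recording, with `lam` tuned on development data; short speaker turns leave few frames per side, which is the regime the abstract's T2-mean metric targets.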
