Journal: IEEE Transactions on Audio, Speech, and Language Processing

Unsupervised Motif Acquisition in Speech via Seeded Discovery and Template Matching Combination



Abstract

This paper describes and evaluates a computational architecture that discovers and collects occurrences of speech repetitions, or motifs, in a fully unsupervised fashion, that is, in the absence of acoustic, lexical, or pronunciation modeling and of training material. In the last few years, this task has attracted increasing interest from the speech community because of a) its potential applicability to spoken document processing (as a preliminary step to summarization, topic clustering, etc.) and b) its novel methodology, which defines a new paradigm for speech processing that circumvents the issues common to all supervised, trained technologies. The contributions of the proposed system are twofold: 1) the design of a discovery strategy that detects repetitions by extending matches of motif fragments, called seeds; 2) the implementation of template matching techniques to detect acoustically close segments, based on dynamic time warping (DTW) and self-similarity matrix (SSM) comparison of speech templates, in contrast to the decoding procedures of model-based recognition systems. The architecture is thoroughly evaluated on several hours of French broadcast news shows under various parameter settings and acoustic features, namely mel-frequency cepstral coefficients (MFCCs) and different types of posteriorgrams: Gaussian mixture model (GMM)-based and phone-based posteriors, in both language-matched and mismatched conditions. The evaluation highlights a) the improved robustness of the system that jointly employs DTW and SSM and b) the relevant impact of language-specific features on acoustic similarity detection based on template matching.
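The two template matching ingredients named in the abstract, DTW and the self-similarity matrix, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names `dtw_distance` and `self_similarity`, the Euclidean frame distance, the unconstrained step pattern, and the length normalization are all assumptions chosen for illustration, and the abstract does not say how two SSMs are compared, so that step is left out.

```python
import numpy as np


def dtw_distance(a, b):
    """Length-normalized DTW cost between two feature sequences.

    a: array of shape (n, d), b: array of shape (m, d), e.g. MFCC frames.
    Euclidean frame distance and the (n + m) normalization are assumptions.
    """
    n, m = len(a), len(b)
    # Pairwise frame-to-frame distances, shape (n, m).
    local = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    # Accumulated cost with an extra border row/column of infinities.
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = local[i - 1, j - 1] + best_prev
    return acc[n, m] / (n + m)


def self_similarity(x):
    """Self-similarity matrix: distances between all frame pairs of one segment."""
    return np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)


if __name__ == "__main__":
    # Toy one-dimensional "features": b is a time-stretched copy of a.
    a = np.array([[0.0], [1.0], [2.0]])
    b = np.array([[0.0], [0.0], [1.0], [1.0], [2.0], [2.0]])
    print(dtw_distance(a, b))
    print(self_similarity(a))
```

DTW absorbs differences in speaking rate by warping the time axis, while an SSM describes the internal structure of a single segment independently of the absolute feature values, which is one plausible reason for using the two jointly.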
