Conference paper: Automatic Speech Recognition & Understanding, 2009 (ASRU 2009)

Kernel metric learning for phonetic classification


Abstract

While a spoken sound is described by a handful of frame-level spectral vectors, not all frames contribute equally to either human perception or machine classification. In this paper, we introduce a novel framework that automatically emphasizes the speech frames most relevant to phonetic information. We learn the importance of speech frames jointly through a distance metric across phone classes, attempting to satisfy a large-margin constraint: the distance from a segment to its correct phone class should be smaller than its distance to any other phone class by the largest possible margin. Furthermore, we propose a universal background model (UBM) structure that establishes a correspondence between the statistical models of phone types and those of phone tokens, allowing statistical models of individual phone tokens to be used in a large-margin speech recognition framework. Experiments on the TIMIT database demonstrate the effectiveness of our framework.
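The large-margin constraint on frame-weighted distances can be illustrated with a small sketch. This is not the authors' kernel metric learning method and it omits the UBM entirely. It assumes, for illustration only, that each segment has been resampled to a fixed number of frames, that each phone class is represented by a single prototype vector, and that the segment-to-class distance is a weighted average of squared Euclidean frame distances with one nonnegative weight per frame position. The names `segment_class_distance`, `large_margin_loss`, and the toy prototypes are hypothetical.

```python
"""Hypothetical sketch: hinge loss for a large-margin, frame-weighted distance.

Assumptions (not from the paper): fixed-length segments, one prototype
vector per phone class, squared Euclidean frame distances, and one
nonnegative importance weight per frame position.
"""
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def segment_class_distance(frames: List[Vector], prototype: Vector,
                           weights: Sequence[float]) -> float:
    """Weighted average of frame-to-prototype distances."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("frame weights must have a positive sum")
    return sum(w * sq_dist(f, prototype) for w, f in zip(weights, frames)) / total


def large_margin_loss(frames: List[Vector], label: str,
                      prototypes: Dict[str, Vector],
                      weights: Sequence[float], margin: float = 1.0) -> float:
    """Sum of hinge violations of d(x, correct) + margin <= d(x, other)."""
    d_true = segment_class_distance(frames, prototypes[label], weights)
    loss = 0.0
    for name, proto in prototypes.items():
        if name == label:
            continue
        d_other = segment_class_distance(frames, proto, weights)
        loss += max(0.0, margin + d_true - d_other)
    return loss


if __name__ == "__main__":
    # Toy 2-D "spectral" prototypes for two phone classes.
    prototypes = {"aa": [0.0, 0.0], "iy": [2.0, 0.0]}
    # An "aa" segment whose first frame is transitional (close to "iy").
    frames = [[1.8, 0.0], [0.4, 0.0], [0.6, 0.0]]
    uniform = large_margin_loss(frames, "aa", prototypes, [1.0, 1.0, 1.0])
    emphasized = large_margin_loss(frames, "aa", prototypes, [0.0, 1.0, 1.0])
    print(round(uniform, 4))  # 0.7333: margin violated
    print(emphasized)         # 0.0: margin satisfied
```

With uniform weights, the transitional first frame pulls the segment toward the wrong class and the constraint is violated; down-weighting that frame satisfies the margin. In a learning setting, the frame weights (and the metric) would be fitted by minimizing such a loss over training segments.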



