Annual Conference of the International Speech Communication Association (INTERSPEECH 2011)

Rapid Evaluation of Speech Representations for Spoken Term Discovery

Abstract

Acoustic front-ends are typically developed for supervised learning tasks and are thus optimized to minimize word error rate, phone error rate, etc. However, in recent efforts to develop zero-resource speech technologies, the goal is not to use transcribed speech to train systems but instead to discover the acoustic structure of the spoken language automatically. For this new setting, we require a framework for evaluating the quality of speech representations without coupling to a particular recognition architecture. Motivated by the spoken term discovery task, we present a dynamic time warping-based framework for quantifying how well a representation can associate words of the same type spoken by different speakers. We benchmark the quality of a wide range of speech representations using multiple frame-level distance metrics and demonstrate that our performance metrics can also accurately predict phone recognition accuracies.
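To make the idea of a dynamic time warping (DTW) comparison with a frame-level distance concrete, here is a minimal sketch. It is not the authors' implementation: the function names, the choice of cosine distance as the frame-level metric, and the normalization by the summed sequence lengths are all assumptions made for illustration. Each input is assumed to be an array of shape (frames, dimensions) with no all-zero frames.

```python
import numpy as np


def cosine_distance_matrix(x, y):
    """Pairwise cosine distance between frames of x (n, d) and y (m, d).

    Assumes no frame is an all-zero vector (hypothetical helper).
    """
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    yn = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - xn @ yn.T


def dtw_distance(x, y):
    """Length-normalized DTW alignment cost between two feature sequences.

    Uses the standard three-way step pattern (insert, delete, match).
    Dividing by (n + m) is an illustrative normalization choice.
    """
    cost = cosine_distance_matrix(x, y)
    n, m = cost.shape
    # acc[i, j] holds the cheapest cost of aligning x[:i] with y[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_prev
    return acc[n, m] / (n + m)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    word_a = rng.normal(size=(20, 13))          # stand-in for one word token
    word_a_slow = np.repeat(word_a, 2, axis=0)  # same frames, spoken "slower"
    word_b = rng.normal(size=(25, 13))          # stand-in for a different word
    print("same word, different tempo:", dtw_distance(word_a, word_a_slow))
    print("different words:", dtw_distance(word_a, word_b))
```

Under this sketch, a representation is better for term discovery when pairs of the same word type spoken by different speakers receive lower DTW costs than pairs of different word types. Swapping `cosine_distance_matrix` for another frame-level metric is how different distance measures would be compared.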