Rapid Evaluation of Speech Representations for Spoken Term Discovery

Abstract

Acoustic front-ends are typically developed for supervised learning tasks and are thus optimized to minimize word error rate, phone error rate, etc. However, in recent efforts to develop zero-resource speech technologies, the goal is not to use transcribed speech to train systems but instead to discover the acoustic structure of the spoken language automatically. For this new setting, we require a framework for evaluating the quality of speech representations without coupling to a particular recognition architecture. Motivated by the spoken term discovery task, we present a dynamic time warping-based framework for quantifying how well a representation can associate words of the same type spoken by different speakers. We benchmark the quality of a wide range of speech representations using multiple frame-level distance metrics and demonstrate that our performance metrics can also accurately predict phone recognition accuracies.
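The kind of comparison the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes cosine distance as the frame-level metric (the abstract mentions several metrics without naming them), a basic three-step DTW recursion, and normalization by the summed sequence lengths. The function names `cosine_distance_matrix` and `dtw_distance` are hypothetical.

```python
import numpy as np


def cosine_distance_matrix(x, y):
    """Pairwise cosine distance between frames of x (n, d) and y (m, d).

    Assumes no frame is an all-zero vector.
    """
    x_unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    y_unit = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x_unit @ y_unit.T


def dtw_distance(x, y):
    """Length-normalized DTW alignment cost between two feature sequences.

    The normalization by (n + m) is an assumption made for this sketch.
    """
    cost = cosine_distance_matrix(x, y)
    n, m = cost.shape
    # acc[i, j] holds the cheapest cumulative cost of aligning x[:i] with y[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_previous = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = cost[i - 1, j - 1] + best_previous
    return acc[n, m] / (n + m)


# Toy feature sequences standing in for word examples (synthetic, not real speech).
word_a = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
word_a_slow = np.repeat(word_a, 2, axis=0)  # same "word", spoken at half speed
word_b = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, -1.0]])

same_type_cost = dtw_distance(word_a, word_a_slow)
different_type_cost = dtw_distance(word_a, word_b)
```

Under these assumptions, a representation is better when word pairs of the same type receive lower alignment costs than pairs of different types, so ranking pairs by `dtw_distance` and checking how well that ranking separates the two groups gives a recognizer-independent quality score.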

Bibliographic details

Source
Annual Conference of the International Speech Communication Association (INTERSPEECH 2011) | 2011 | pp. 828-831 | 4 pages
Authors
Michael A. Carlin; Samuel Thomas; Aren Jansen; Hynek Hermansky
Original format
PDF
Chinese Library Classification
Communications
Keywords
evaluation methods; acoustic front-end; spoken term discovery; zero resource

Similar literature

1. Resource2Vec: Linked Data distributed representations for term discovery in automatic speech recognition [J]. Alejandro Coucheiro-Limeres, Javier Ferreiros-López, Rubén San-Segundo. Expert Systems with Applications. 2018, no. DECa
2. Towards spoken clinical-question answering: evaluating and adapting automatic speech-recognition systems for spoken clinical questions [J]. Liu F, Tur G, Hakkani Tur D. Journal of the American Medical Informatics Association. 2011, no. 5
3. Sounds of Speech Based Spoken Document Categorization: A Subword Representation Method [J]. Weidong QU, Katsuhiko SHIRAI. IEICE Transactions on Information and Systems. 2004, no. 5
4. An iterative deep learning framework for unsupervised discovery of speech features and linguistic units with applications on spoken term detection [C]. Cheng-Tao Chung, Cheng-Yu Tsai, Hsiang-Hung Lu. IEEE Workshop on Automatic Speech Recognition and Understanding. 2015
5. Audio parsing and rapid speaker adaptation in speech recognition for spoken document retrieval [D]. Zhou, Bowen. 2003
6. Towards spoken clinical-question answering: evaluating and adapting automatic speech-recognition systems for spoken clinical questions [O]. Feifan Liu, Gokhan Tur, Dilek Hakkani-Tür. 2011
7. Search on speech from spoken queries: the Multi-domain International ALBAYZIN 2018 Query-by-Example Spoken Term Detection Evaluation [O]. Javier Tejedor, Doroteo T. Toledano, Paula Lopez-Otero. 2019
