Workshop on Automatic Speech Recognition and Understanding

FIXED-DIMENSIONAL ACOUSTIC EMBEDDINGS OF VARIABLE-LENGTH SEGMENTS IN LOW-RESOURCE SETTINGS

Abstract

Measures of acoustic similarity between words or other units are critical for segmental exemplar-based acoustic models, spoken term discovery, and query-by-example search. Dynamic time warping (DTW) alignment cost has been the most commonly used measure, but it has well-known inadequacies. Some recently proposed alternatives require large amounts of training data. In the interest of finding more efficient, accurate, and low-resource alternatives, we consider the problem of embedding speech segments of arbitrary length into fixed-dimensional spaces in which simple distances (such as cosine or Euclidean) serve as a proxy for linguistically meaningful (phonetic, lexical, etc.) dissimilarities. Such embeddings would enable efficient audio indexing and permit application of standard distance learning techniques to segmental acoustic modeling. In this paper, we explore several supervised and unsupervised approaches to this problem and evaluate them on an acoustic word discrimination task. We identify several embedding algorithms that match or improve upon the DTW baseline in low-resource settings.
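
To make the contrast in the abstract concrete, the following is a minimal sketch of the two kinds of comparison it mentions: a DTW alignment cost computed directly on two variable-length segments, and a cosine distance computed on fixed-dimensional embeddings of those segments. The abstract does not say which embedding algorithms the paper evaluates, so the uniform-downsampling embedding here is only an illustrative stand-in, and all names (`dtw_cost`, `downsample_embedding`, `seg_a`, `seg_b`) and the toy two-dimensional frames are assumptions, not taken from the paper.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def dtw_cost(x, y):
    """DTW alignment cost between two frame sequences of possibly different length.

    Assumption: frame-level cost is cosine distance and the allowed steps are
    (i-1, j), (i, j-1), and (i-1, j-1); other variants exist.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # d[i][j] holds the best cost of aligning x[:i] with y[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cosine_distance(x[i - 1], y[j - 1])
            d[i][j] = c + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def downsample_embedding(frames, k):
    """Map a variable-length segment to a fixed-dimensional vector.

    Hypothetical stand-in embedding: pick k frames at uniformly spaced
    positions and concatenate them, giving k * frame_dim dimensions.
    """
    n = len(frames)
    out = []
    for i in range(k):
        idx = round(i * (n - 1) / (k - 1)) if k > 1 else 0
        out.extend(frames[idx])
    return out


# Toy segments: the same "trajectory" spoken at two different speeds.
seg_a = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
seg_b = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.1], [0.1, 1.0], [0.0, 1.0]]

emb_a = downsample_embedding(seg_a, 3)
emb_b = downsample_embedding(seg_b, 3)

print("DTW cost:", dtw_cost(seg_a, seg_b))
print("Embedding cosine distance:", cosine_distance(emb_a, emb_b))
```

The DTW cost requires a dynamic program over every pair of segments being compared, whereas each embedding is computed once per segment and can then be compared with a single vector distance, which is what makes indexing over fixed-dimensional embeddings attractive.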
