Published in: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Low Resource Keyword Search With Synthesized Crosslingual Exemplars


Abstract

The transfer of acoustic data across languages has been shown to improve keyword search (KWS) performance in data-scarce settings. In this paper, we propose a way of performing this transfer that reduces the impact of the prevalence of out-of-vocabulary (OOV) terms on KWS in such a setting. We investigate a novel usage of multilingual features for KWS with very little training data in the target languages. The crux of our approach is the use of synthetic phone exemplars to convert the search into a query-by-example task, which we solve with the dynamic time warping algorithm. Using bottleneck features obtained from a network trained multilingually on a set of (source) languages, we train an extended distance metric learner (EDML) for four target languages from the IARPA Babel program (which are distinct from the source languages). Compared with a baseline system that is based on automatic speech recognition (ASR) with a multilingual acoustic model, we observe an average term weighted value improvement of 0.0603 absolute (74% relative) in a setting with only 1 h of training data in the target language. When the data scarcity is relaxed to 10 h, we find that phone posteriors obtained by fine-tuning the multilingual network give better EDML systems. In this relaxed setting, the EDML systems still perform better than the baseline on OOV terms. Given their complementary natures, combining the EDML and the ASR-based baseline results in even further performance improvements in all settings.
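The query-by-example step described above can be illustrated with a minimal subsequence dynamic time warping sketch. It is not the authors' implementation: the function names (`cosine_distance`, `subsequence_dtw`) are hypothetical, the toy two-dimensional vectors stand in for bottleneck features, and plain cosine distance is used where the paper would use its learned EDML distance.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two feature vectors (stand-in for a learned metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def subsequence_dtw(query, utterance, dist=cosine_distance):
    """Find the best match of `query` anywhere inside `utterance`.

    Both arguments are sequences of feature vectors (one per frame).
    Returns (cost normalized by query length, index of the last matched frame).
    """
    n, m = len(query), len(utterance)
    if n == 0 or m == 0:
        raise ValueError("query and utterance must be non-empty")

    # First query frame may align to any utterance frame at no extra cost,
    # which lets the match start anywhere in the utterance.
    prev = [dist(query[0], utterance[j]) for j in range(m)]

    for i in range(1, n):
        cur = [math.inf] * m
        cur[0] = prev[0] + dist(query[i], utterance[0])
        for j in range(1, m):
            # Standard DTW steps: vertical, horizontal, diagonal.
            best = min(prev[j], cur[j - 1], prev[j - 1])
            cur[j] = best + dist(query[i], utterance[j])
        prev = cur

    # The match may also end at any utterance frame.
    end = min(range(m), key=prev.__getitem__)
    return prev[end] / n, end


# Toy example: the two-frame query occurs at frames 1-2 of the utterance.
query = [[1.0, 0.0], [0.0, 1.0]]
utterance = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cost, end_frame = subsequence_dtw(query, utterance)
```

In a keyword search setting, the query would be built by concatenating exemplars for the keyword's phones, and a detection would be reported when the normalized cost falls below a tuned threshold. The threshold and the exemplar construction are left out here because the abstract does not specify them.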
