Journal: Computer Speech and Language

Feature learning for efficient ASR-free keyword spotting in low-resource languages

Abstract

We consider feature learning for a computationally efficient method of keyword spotting that can be applied in severely under-resourced settings. The objective is to support humanitarian relief programmes by the United Nations (UN) in parts of Africa where almost no language resources are available. To allow a keyword spotting system to be developed rapidly in such a language, we rely on a small and easily compiled set of isolated keywords. Using the isolated keywords as templates, we apply dynamic time warping (DTW) to a much larger corpus of in-domain but untranscribed speech. The resulting DTW alignment scores are used to train a convolutional neural network (CNN), which is orders of magnitude more computationally efficient than DTW and is therefore suitable for real-time application. We optimise this ASR-free neural network keyword spotting procedure by identifying acoustic features that provide robust performance in this almost zero-resource setting. First, we consider the benefits of incorporating information from well-resourced but unrelated languages by using a multilingual bottleneck feature (BNF) extractor. Next, we consider features extracted from an autoencoder (AE) trained on in-domain but untranscribed data. Finally, we consider features obtained from a correspondence autoencoder (CAE), which is initialised with the AE and subsequently fine-tuned on the small set of in-domain labelled data. Experiments in South African English and Luganda, a low-resource language, demonstrate that, on their own, both the BNF and CAE features can achieve a 5% relative performance improvement over baseline MFCCs. Using BNFs as input to the CAE gives even better performance: a relative improvement of more than 27% over MFCCs in area under the ROC curve (AUC), and more than twice as many top-10 retrievals.
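The template-matching step can be illustrated with subsequence DTW. A keyword template is aligned against every stretch of a longer untranscribed utterance, and the lowest length-normalised alignment cost becomes that utterance's score. The sketch below is a minimal pure-Python illustration under stated assumptions, not the authors' implementation. The frame distance (cosine), the step pattern, the path-length normalisation, and the hypothetical `dtw_target` mapping from cost to a [0, 1] CNN training target were all chosen for illustration only.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]  # one acoustic feature vector (e.g. MFCC, BNF or CAE frame)


def cosine_distance(a: Frame, b: Frame) -> float:
    """1 - cosine similarity (range [0, 2]); zero vectors get distance 1.0."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - dot / (na * nb)


def subsequence_dtw_cost(template: List[Frame], utterance: List[Frame]) -> float:
    """Lowest length-normalised cost of aligning the whole template to any
    contiguous stretch of the utterance (assumed step pattern: vertical,
    diagonal, horizontal; assumed normalisation: divide by path length)."""
    n, m = len(template), len(utterance)
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]  # accumulated path cost
    steps = [[0] * m for _ in range(n)]   # path length, for normalisation
    for j in range(m):
        # Free start: the match may begin at any utterance frame.
        cost[0][j] = cosine_distance(template[0], utterance[j])
        steps[0][j] = 1
    for i in range(1, n):
        for j in range(m):
            d = cosine_distance(template[i], utterance[j])
            candidates = [(cost[i - 1][j], steps[i - 1][j])]
            if j > 0:
                candidates.append((cost[i - 1][j - 1], steps[i - 1][j - 1]))
                candidates.append((cost[i][j - 1], steps[i][j - 1]))
            prev_cost, prev_steps = min(candidates)  # cheapest predecessor
            cost[i][j] = prev_cost + d
            steps[i][j] = prev_steps + 1
    # Free end: the match may finish at any utterance frame.
    return min(cost[n - 1][j] / steps[n - 1][j] for j in range(m))


def dtw_target(cost: float) -> float:
    """Hypothetical mapping of a cosine DTW cost to a [0, 1] training target."""
    return min(1.0, max(0.0, 1.0 - cost / 2.0))


if __name__ == "__main__":
    template = [[1.0, 0.0], [0.0, 1.0]]
    hit = [[0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    miss = [[1.0, 0.0]] * 5
    for name, utt in (("hit", hit), ("miss", miss)):
        c = subsequence_dtw_cost(template, utt)
        print(name, round(c, 3), round(dtw_target(c), 3))
```

A low cost means the template found a close match somewhere in the utterance. In the CNN-DTW setup described in the abstract, scores of this kind are computed offline for each keyword and utterance and used to train the CNN. The CNN then replaces the quadratic-time DTW search at test time.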
We also show that, using these features, the CNN-DTW keyword spotter performs almost as well as the DTW keyword spotter while comfortably outperforming a baseline CNN trained only on the keyword templates. We conclude that a CNN-DTW keyword spotter using BNF-derived CAE features represents a computationally efficient approach with very competitive performance that is suited to rapid deployment in a severely under-resourced scenario.
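The two evaluation measures quoted in the abstract, ROC AUC and top-10 retrievals, can both be computed from per-utterance keyword scores. The sketch below assumes binary labels (1 if the utterance contains the keyword). It computes ROC AUC as the probability that a positive utterance outscores a negative one, with ties counted as half. As a stand-in for the top-10 figure, it counts the positives among the k highest-scoring utterances. The function names and this exact retrieval definition are assumptions, not taken from the paper, and the example data are toy values.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise ranking (Mann-Whitney form):
    fraction of (positive, negative) pairs where the positive scores higher,
    with ties counting as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hits_in_top_k(scores: Sequence[float], labels: Sequence[int], k: int = 10) -> int:
    """Number of true keyword occurrences among the k highest-scoring utterances."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k])


if __name__ == "__main__":
    # Toy scores for one keyword over six utterances (illustrative only).
    scores = [0.92, 0.40, 0.55, 0.10, 0.81, 0.33]
    labels = [1, 1, 0, 0, 1, 0]
    print("AUC:", roc_auc(scores, labels))
    print("hits in top 3:", hits_in_top_k(scores, labels, k=3))
```

Because AUC depends only on the ranking of scores, it can compare the DTW, CNN-DTW and baseline CNN spotters even though their raw outputs lie on different scales.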