Journal: IEEE Journal of Selected Topics in Signal Processing

End-to-End ASR-Free Keyword Search From Speech

Abstract

Conventional keyword search (KWS) systems for speech databases match the input text query to the set of word hypotheses generated by an automatic speech recognition (ASR) system from utterances in the database. Hence, such KWS systems attempt to solve the complex problem of ASR as a precursor. Training an ASR system itself is a time-consuming process requiring transcribed speech data. Our prior work presented an ASR-free end-to-end system that needed minimal supervision and trained significantly faster than an ASR-based KWS system. The ASR-free KWS system consisted of three subsystems. The first subsystem was a recurrent neural network based acoustic encoder that extracted a finite-dimensional embedding of the speech utterance. The second subsystem was a query encoder that produced an embedding of the input text query. The acoustic and query embeddings were input to a feedforward neural network that predicted whether the query occurred in the acoustic utterance or not. This paper extends our prior work in several ways. First, we significantly improve upon our previous ASR-free KWS results by nearly 20% relative through improvements to the acoustic encoder. Next, we show that it is possible to train the acoustic encoder on languages other than the language of interest with only a small drop in KWS performance. Finally, we attempt to predict the location of the detected keywords by training a location-sensitive KWS network.
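
The three-subsystem layout described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the plain tanh recurrence, the use of the final hidden state as the embedding, the character-level recurrent query encoder, the layer sizes, and the names `ToyKWS`, `rnn_encode`, `encode_audio`, `encode_query`, and `score` are all assumptions made for illustration. The weights are random and untrained, so the output shows only how the data flows, not a meaningful detection score.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix (untrained; illustration only)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rnn_encode(sequence, w_in, w_rec):
    """Run a plain tanh RNN over a sequence of vectors.

    Returns the final hidden state as a fixed-size embedding
    (assumption: the abstract does not say how the embedding is pooled).
    """
    h = [0.0] * len(w_rec)
    for x in sequence:
        a = matvec(w_in, x)
        b = matvec(w_rec, h)
        h = [math.tanh(ai + bi) for ai, bi in zip(a, b)]
    return h


def one_hot(index, size):
    v = [0.0] * size
    v[index] = 1.0
    return v


class ToyKWS:
    """Hypothetical ASR-free KWS model with three subsystems."""

    def __init__(self, feat_dim, alphabet, hidden=8, seed=0):
        rng = random.Random(seed)
        self.alphabet = alphabet
        # Subsystem 1: recurrent acoustic encoder.
        self.ac_in = make_matrix(hidden, feat_dim, rng)
        self.ac_rec = make_matrix(hidden, hidden, rng)
        # Subsystem 2: query encoder (character-level RNN is an assumption).
        self.q_in = make_matrix(hidden, len(alphabet), rng)
        self.q_rec = make_matrix(hidden, hidden, rng)
        # Subsystem 3: feedforward network over both embeddings.
        self.ff_hidden = make_matrix(hidden, 2 * hidden, rng)
        self.ff_out = make_matrix(1, hidden, rng)

    def encode_audio(self, frames):
        return rnn_encode(frames, self.ac_in, self.ac_rec)

    def encode_query(self, text):
        size = len(self.alphabet)
        chars = [one_hot(self.alphabet.index(c), size) for c in text]
        return rnn_encode(chars, self.q_in, self.q_rec)

    def score(self, frames, text):
        """Probability-like score that the query occurs in the utterance."""
        joint = self.encode_audio(frames) + self.encode_query(text)  # concatenation
        hidden = [math.tanh(z) for z in matvec(self.ff_hidden, joint)]
        logit = matvec(self.ff_out, hidden)[0]
        return 1.0 / (1.0 + math.exp(-logit))


# Usage: 20 frames of made-up 4-dimensional acoustic features and a text query.
rng = random.Random(1)
frames = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(20)]
model = ToyKWS(feat_dim=4, alphabet="abcdefghijklmnopqrstuvwxyz")
p = model.score(frames, "hello")
print(p)
```

In a trained system of this kind, the score would be compared against a threshold to decide whether the keyword is present. The location-sensitive variant mentioned in the abstract would need an output per time step instead of a single score, which this sketch does not attempt.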
