End-to-End ASR-Free Keyword Search From Speech

Kartik Audhkhasi; Andrew Rosenberg; Abhinav Sethy; Bhuvana Ramabhadran; Brian Kingsbury

Abstract

Conventional keyword search (KWS) systems for speech databases match the input text query to the set of word hypotheses generated by an automatic speech recognition (ASR) system from utterances in the database. Hence, such KWS systems attempt to solve the complex problem of ASR as a precursor. Training an ASR system itself is a time-consuming process requiring transcribed speech data. Our prior work presented an ASR-free end-to-end system that needed minimal supervision and trained significantly faster than an ASR-based KWS system. The ASR-free KWS system consisted of three subsystems. The first subsystem was a recurrent neural network based acoustic encoder that extracted a finite-dimensional embedding of the speech utterance. The second subsystem was a query encoder that produced an embedding of the input text query. The acoustic and query embeddings were input to a feedforward neural network that predicted whether the query occurred in the acoustic utterance or not. This paper extends our prior work in several ways. First, we significantly improve upon our previous ASR-free KWS results by nearly 20% relative through improvements to the acoustic encoder. Next, we show that it is possible to train the acoustic encoder on languages other than the language of interest with only a small drop in KWS performance. Finally, we attempt to predict the location of the detected keywords by training a location-sensitive KWS network.
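The three-subsystem design described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the plain tanh recurrent cell, the character-averaging query encoder, the layer sizes, the random (untrained) weights, and all function names are hypothetical. Only the overall data flow follows the abstract: an utterance embedding and a query embedding are fed to a feedforward network that outputs the probability that the query occurs in the utterance.

```python
import math
import random

def init_matrix(rows, cols, rng, scale=0.1):
    """Random weight matrix as a list of rows (untrained, illustrative only)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def acoustic_encoder(frames, w_in, w_rec):
    """Assumed simple tanh RNN; the final hidden state is the utterance embedding."""
    h = [0.0] * len(w_rec)
    for x in frames:
        a = matvec(w_in, x)
        b = matvec(w_rec, h)
        h = [math.tanh(ai + bi) for ai, bi in zip(a, b)]
    return h

def query_encoder(query, char_table):
    """Hypothetical query encoder: mean of per-character embeddings."""
    dim = len(next(iter(char_table.values())))
    vecs = [char_table[c] for c in query if c in char_table]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def kws_score(acoustic_emb, query_emb, w_hidden, w_out):
    """Feedforward net on the concatenated embeddings; returns P(query in utterance)."""
    z = acoustic_emb + query_emb  # list concatenation
    hidden = [math.tanh(v) for v in matvec(w_hidden, z)]
    logit = sum(w * h for w, h in zip(w_out, hidden))
    return 1.0 / (1.0 + math.exp(-logit))

rng = random.Random(0)
FEAT_DIM, EMB_DIM, HID_DIM = 4, 8, 6  # assumed toy sizes

w_in = init_matrix(EMB_DIM, FEAT_DIM, rng)
w_rec = init_matrix(EMB_DIM, EMB_DIM, rng)
char_table = {c: [rng.uniform(-0.1, 0.1) for _ in range(EMB_DIM)]
              for c in "abcdefghijklmnopqrstuvwxyz "}
w_hidden = init_matrix(HID_DIM, 2 * EMB_DIM, rng)
w_out = [rng.uniform(-0.1, 0.1) for _ in range(HID_DIM)]

# Stand-in for 20 frames of acoustic features from one utterance.
frames = [[rng.gauss(0.0, 1.0) for _ in range(FEAT_DIM)] for _ in range(20)]

a_emb = acoustic_encoder(frames, w_in, w_rec)
q_emb = query_encoder("keyword", char_table)
score = kws_score(a_emb, q_emb, w_hidden, w_out)
print(f"P(query occurs in utterance) = {score:.3f}")
```

With untrained weights the score carries no meaning; in a trained system the three parts would presumably be optimized together on utterance/query pairs labeled with whether the query occurs, which is the minimal supervision the abstract refers to.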

Bibliographic details

Source
Selected Topics in Signal Processing, IEEE Journal of | 2017, Issue 8 | pp. 1351-1359 | 9 pages
Authors
Kartik Audhkhasi; Andrew Rosenberg; Abhinav Sethy; Bhuvana Ramabhadran; Brian Kingsbury
Original format: PDF
Language: English
Keywords
Neural networks; Automatic speech recognition; Training; Hidden Markov models; Speech processing; Recurrent neural networks; Keyword search

Related literature

1. Multi-keyword spotting of telephone speech using a fuzzy search algorithm and keyword-driven two-level CBSM [J]. Chung-Hsien Wu, Yeou-Jiunn Chen. Speech Communication, 2001, Issue 3.
2. Feature learning for efficient ASR-free keyword spotting in low-resource languages [J]. Ewald van der Westhuizen, Herman Kamper, Raghav Menon. Computer Speech and Language, 2022, Issue Jana.
3. Cross-language phoneme mapping for phonetic search keyword spotting in continuous speech of under-resourced languages [J]. Ella Tetariy, Yossi Bar-Yosef, Vered Silber-Varod. Artificial Intelligence Research, 2015, Issue 2.
4. End-to-end ASR-free keyword search from speech [C]. Kartik Audhkhasi, Andrew Rosenberg, Abhinav Sethy. IEEE International Conference on Acoustics, Speech and Signal Processing, 2017.
5. Improving Keywords Spotting Performance in Noise with Augmented Dataset from Vocoded Speech and Speech Denoising [D]. Li, Ruohao. 2021.
6. A data-sharing scheme that supports multi-keyword search for electronic medical records [O]. Shufen Niu, Wenke Liu, Song Han. 2021.
7. End-to-End ASR-free Keyword Search from Speech [O]. Audhkhasi, Kartik; Rosenberg, Andrew; Sethy, Abhinav. 2017.
