SEGMENTAL AUDIO WORD2VEC: REPRESENTING UTTERANCES AS SEQUENCES OF VECTORS WITH APPLICATIONS IN SPOKEN TERM DETECTION

Abstract

While Word2Vec represents words (in text) as vectors carrying semantic information, audio Word2Vec has been shown to represent signal segments of spoken words as vectors carrying phonetic structure information. Audio Word2Vec can be trained in an unsupervised way from an unlabeled corpus, except that word boundaries are needed. In this paper, we extend audio Word2Vec from word level to utterance level by proposing a new segmental audio Word2Vec, in which unsupervised spoken word boundary segmentation and audio Word2Vec are jointly learned and mutually enhanced, so that an utterance can be directly represented as a sequence of vectors carrying phonetic structure information. This is achieved by a segmental sequence-to-sequence autoencoder (SSAE), in which a segmentation gate trained with reinforcement learning is inserted in the encoder. Experiments on English, Czech, French and German show very good performance in both unsupervised spoken word segmentation and spoken term detection applications (significantly better than frame-based DTW).
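
For context, the frame-based DTW baseline mentioned in the abstract compares two sequences of acoustic feature frames directly, frame by frame. The following is a minimal sketch of classic dynamic time warping, not the authors' implementation: the function name `dtw_distance`, the Euclidean frame distance, and the three-way step pattern are all assumptions made for illustration.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature frames.

    `a` and `b` are lists of equal-dimension frames (lists of floats).
    Hypothetical helper for illustration; not taken from the paper.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            # Extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = d + min(cost[i - 1][j],      # advance in a only
                                 cost[i][j - 1],      # advance in b only
                                 cost[i - 1][j - 1])  # advance in both
    return cost[n][m]


# Toy 1-D "frames": the second sequence is a time-stretched copy of the first
query = [[0.0], [1.0], [2.0]]
candidate = [[0.0], [1.0], [1.0], [2.0]]
score = dtw_distance(query, candidate)
```

The cost of this comparison grows with the product of the two frame counts, which is one reason a representation with one vector per segment, rather than one per frame, is attractive for spoken term detection.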

Bibliographic details

Source: IEEE International Conference on Acoustics, Speech and Signal Processing | 2018 | pp. 5739-6377 | 5 pages
Authors: Yu-Hsuan Wang; Hung-yi Lee; Lin-shan Lee
Original format: PDF
Chinese Library Classification: TN912-53
Keywords: recurrent neural network; autoencoder; reinforcement learning; policy gradient

Similar literature

1. Detection of misunderstandings in spoken dialogue system using system-user utterance sequence [J]. Hirasawa Jun-ichi, Miyazaki Noboru, Aikawa Kiyoaki. 電子情報通信学会技術研究報告. 音声. Speech. 2000, No. 523

2. Detection of misunderstandings in spoken dialogue system using system-user utterance sequence [J]. Hirasawa Jun-ichi, Miyazaki Noboru, Aikawa Kiyoaki. 電子情報通信学会技術研究報告. 言語理解とコミュニケーション. Natural Language Understanding and Models of Communication. 2000, No. 521

3. SEGMENTAL AUDIO WORD2VEC: REPRESENTING UTTERANCES AS SEQUENCES OF VECTORS WITH APPLICATIONS IN SPOKEN TERM DETECTION [C]. Yu-Hsuan Wang, Hung-yi Lee, Lin-shan Lee. IEEE International Conference on Acoustics, Speech and Signal Processing. 2018

4. Discriminative Articulatory Feature-based Pronunciation Models with Application to Spoken Term Detection [D]. Prabhavalkar, Rohit. 2013

5. Using Complexity-Identical Human- and Machine-Directed Utterances to Investigate Addressee Detection for Spoken Dialogue Systems [O]. Oleg Akhtiamov, Ingo Siegert, Alexey Karpov. 2020

6. Segmental Audio Word2Vec: Representing Utterances as Sequences of Vectors with Applications in Spoken Term Detection [O]. Yu-Hsuan Wang, Hung-Yi Lee, Lin-Shan Lee. 2018
