Conference: IEEE International Conference on Acoustics, Speech and Signal Processing

SEGMENTAL AUDIO WORD2VEC: REPRESENTING UTTERANCES AS SEQUENCES OF VECTORS WITH APPLICATIONS IN SPOKEN TERM DETECTION


Abstract

While Word2Vec represents words (in text) as vectors carrying semantic information, audio Word2Vec was shown to represent signal segments of spoken words as vectors carrying phonetic structure information. Audio Word2Vec can be trained in an unsupervised way from an unlabeled corpus, except that word boundaries are needed. In this paper, we extend audio Word2Vec from the word level to the utterance level by proposing a new segmental audio Word2Vec, in which unsupervised spoken word boundary segmentation and audio Word2Vec are jointly learned and mutually enhanced, so that an utterance can be directly represented as a sequence of vectors carrying phonetic structure information. This is achieved by a segmental sequence-to-sequence autoencoder (SSAE), in which a segmentation gate trained with reinforcement learning is inserted in the encoder. Experiments on English, Czech, French and German show very good performance in both unsupervised spoken word segmentation and spoken term detection (significantly better than frame-based DTW).
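The idea of representing an utterance as a sequence of segment vectors can be illustrated with a minimal sketch. This is not the authors' SSAE: the abstract does not specify how the gate interacts with the encoder, so the sketch assumes that the gate emits a binary decision per frame, that a value of 1 closes the current segment, and that the encoder starts fresh after each boundary. The RNN encoder is replaced by simple mean pooling, and the names `mean_pool` and `segment_and_embed` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def mean_pool(frames: Sequence[Sequence[float]]) -> Vector:
    """Toy stand-in for the RNN encoder: average the frames of one segment."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def segment_and_embed(
    frames: Sequence[Sequence[float]],
    gate: Sequence[int],
) -> List[Vector]:
    """Turn one utterance into a sequence of segment vectors.

    Assumption: gate[t] == 1 means "a segment ends at frame t". The frames
    accumulated so far are embedded and the buffer is reset.
    """
    if len(frames) != len(gate):
        raise ValueError("frames and gate must have the same length")
    vectors: List[Vector] = []
    buffer: List[Sequence[float]] = []
    for frame, boundary in zip(frames, gate):
        buffer.append(frame)
        if boundary == 1:
            vectors.append(mean_pool(buffer))
            buffer = []
    if buffer:  # flush trailing frames not closed by a boundary
        vectors.append(mean_pool(buffer))
    return vectors


# Five 2-dimensional acoustic frames (made-up values) and made-up gate outputs.
utterance = [[0.0, 1.0], [2.0, 3.0], [4.0, 4.0], [6.0, 8.0], [1.0, 1.0]]
gate_decisions = [0, 1, 0, 1, 0]
segment_vectors = segment_and_embed(utterance, gate_decisions)
print(len(segment_vectors), "segment vectors")
```

In the paper's setting the gate decisions would come from a policy trained with reinforcement learning rather than being given, and each segment vector would come from the learned encoder. The sketch only shows how boundary decisions turn a frame sequence into a shorter sequence of fixed-size vectors.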
