Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Unsupervised Word Segmentation and Lexicon Discovery Using Acoustic Word Embeddings


Abstract

In settings where only unlabeled speech data is available, speech technology needs to be developed without transcriptions, pronunciation dictionaries, or language modeling text. A similar problem is faced when modeling infant language acquisition. In these cases, categorical linguistic structure needs to be discovered directly from speech audio. We present a novel unsupervised Bayesian model that segments unlabeled speech and clusters the segments into hypothesized word groupings. The result is a complete unsupervised tokenization of the input speech in terms of discovered word types. In our approach, a potential word segment (of arbitrary length) is embedded in a fixed-dimensional acoustic vector space. The model, implemented as a Gibbs sampler, then builds a whole-word acoustic model in this space while jointly performing segmentation. We report word error rates in a small-vocabulary connected digit recognition task by mapping the unsupervised decoded output to ground truth transcriptions. The model achieves around 20% error rate, outperforming a previous HMM-based system by about 10% absolute. Moreover, in contrast to the baseline, our model does not require a pre-specified vocabulary size.
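
The step of embedding a segment of arbitrary length in a fixed-dimensional space can be illustrated with a minimal sketch. It assumes one simple embedding scheme, sampling the segment's feature frames at equally spaced positions and flattening them, and it is not necessarily the embedding the authors use. The function name `embed_segment` and the parameter `n_samples` are hypothetical and chosen only for illustration.

```python
def embed_segment(frames, n_samples=10):
    """Map a variable-length segment to a fixed-length vector.

    `frames` is a list of equal-length feature vectors (one per time frame).
    Assumption for illustration: the embedding samples the segment at
    `n_samples` equally spaced positions (with linear interpolation between
    neighbouring frames) and concatenates the sampled frames.
    """
    if not frames:
        raise ValueError("segment must contain at least one frame")
    last = len(frames) - 1
    embedding = []
    for k in range(n_samples):
        # Fractional frame index of the k-th sample point.
        pos = last * k / (n_samples - 1) if n_samples > 1 else 0.0
        lo = int(pos)
        hi = min(lo + 1, last)
        w = pos - lo
        # Interpolate each feature dimension between frames lo and hi.
        embedding.extend((1 - w) * a + w * b
                         for a, b in zip(frames[lo], frames[hi]))
    return embedding


# Two segments of different duration (2 and 7 frames, 2 features per frame).
short_seg = [[0.0, 1.0], [2.0, 3.0]]
long_seg = [[float(i), float(i) + 1.0] for i in range(7)]

# Both map to vectors of the same dimensionality: n_samples * feature_dim.
print(len(embed_segment(short_seg, 4)), len(embed_segment(long_seg, 4)))
```

Because every candidate segment ends up with the same dimensionality, segments of different durations can be compared and clustered in one vector space, which is what allows a whole-word acoustic model to be built over the embeddings.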
