Conference: Joint IEEE International Conference on Development and Learning and Epigenetic Robotics

Comparative study of feature extraction methods for direct word discovery with NPB-DAA from natural speech signals

Abstract

Human infants can discover words directly from unsegmented speech signals produced by their mothers and other people, without any explicitly labeled data. Developing a computational model and a machine learning method that enable an artificial system to acquire words and phonemes from speech signals automatically is an important challenge. Such a model also provides a hypothesis that can explain the dynamic process performed by infants, i.e., word discovery and phoneme acquisition from daily experiences. The nonparametric Bayesian double articulation analyzer (NPB-DAA) is an unsupervised machine learning method that can automatically discover word-like and phoneme-like units directly from speech signals. However, its performance has not yet been evaluated on natural spoken language that includes consonants. To handle natural speech signals that include consonants, a comparative study of methods for extracting features from speech signals is crucially important. This paper provides a comparative study of feature extraction methods for direct word discovery with NPB-DAA from natural speech signals. We examined six types of feature extraction methods that use mel-frequency cepstral coefficients (MFCCs) and a deep sparse autoencoder (DSAE), with several ways of incorporating dynamic features, on the TIDIGITS corpus, which contains utterances of connected digit sequences. The results showed that 1) NPB-DAA with or without DSAE can extract words and phonemes from natural speech signals containing consonants to a certain extent, 2) naive introduction of dynamic features can even harm word discovery performance, and 3) DSAE consistently increases the correlation between the log-likelihood and the performance measure of word discovery.
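
The abstract does not say how the dynamic features were computed. As a rough illustration of what appending dynamic features to static cepstral frames can look like, the sketch below uses the common regression-based delta formula, with the first and last frames repeated at the edges. The function names `delta_features` and `append_deltas`, the window size, and the edge handling are assumptions made for this example, not details taken from the paper.

```python
from typing import List

Frame = List[float]


def delta_features(frames: List[Frame], window: int = 2) -> List[Frame]:
    """Regression-based delta coefficients over +/- `window` frames.

    Hypothetical helper: edges are handled by repeating the first and
    last frame, which is one common convention among several.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    if not frames:
        return []
    last = len(frames) - 1
    # Normalizer of the regression formula: 2 * sum(n^2) for n = 1..window.
    denom = 2.0 * sum(n * n for n in range(1, window + 1))
    deltas: List[Frame] = []
    for t in range(len(frames)):
        row: Frame = []
        for k in range(len(frames[t])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, last)][k]
                behind = frames[max(t - n, 0)][k]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas


def append_deltas(frames: List[Frame], window: int = 2) -> List[Frame]:
    """Concatenate each static frame with its delta coefficients."""
    return [s + d for s, d in zip(frames, delta_features(frames, window))]
```

Concatenating deltas in this way doubles the dimensionality of each frame, which is one plausible reason why a naive introduction of dynamic features might not help an unsupervised model.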
