Published in: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference

Phoneme Embeddings on Predicting Fundamental Frequency Pattern for Electrolaryngeal Speech



Abstract

Electrolaryngeal (EL) speech has a robotic quality owing to its constant fundamental frequency (F0) patterns. In existing F0 pattern prediction frameworks, acoustic models are trained on spectral features of a large corpus of healthy speech. However, the spectrogram of EL speech carries no useful information about F0. Moreover, creating datasets with a reasonably large number of EL utterances for training neural networks is very time-consuming. Hence, F0 prediction based on other features that can be shared between EL and normal speech must be investigated. In this study, we investigate F0 prediction based on clustering of phoneme embeddings. Phoneme labels are extracted for a dataset consisting of utterances of both speech types. These phoneme labels are then used to learn phoneme embeddings in a common 2-D space. Clustering the learned phoneme embeddings yields new one-hot features for F0 prediction. Experimental results show that, for training sets consisting of mixed EL and normal-speech utterances, the new features improve F0 prediction accuracy. Moreover, accurate F0 patterns can be predicted even from lower-dimensional features corresponding to a small number of clusters. This could simplify the structure of the recognition system required to extract phoneme labels from EL speech.
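The feature-construction step described in the abstract (cluster 2-D phoneme embeddings, then encode each phoneme by the one-hot index of its cluster) might look roughly like the sketch below. The abstract does not say which clustering algorithm is used or how the embeddings are learned, so several parts are assumptions made only for illustration: k-means stands in for the clustering method, the phoneme inventory is hypothetical, and random 2-D points stand in for the learned embeddings. The function names `kmeans` and `one_hot_features` are likewise invented here.

```python
import numpy as np


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the paper's clustering)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct data points.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance of every point to every centroid: shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Final assignment against the final centroids.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)


def one_hot_features(phoneme_sequence, phoneme_to_cluster, n_clusters):
    """Map each phoneme to the one-hot vector of its cluster index."""
    ids = np.array([phoneme_to_cluster[p] for p in phoneme_sequence])
    return np.eye(n_clusters)[ids]


# Hypothetical phoneme inventory; random points replace the learned 2-D embeddings.
phonemes = ["a", "i", "u", "e", "o", "k", "s", "t", "n", "m"]
embeddings = np.random.default_rng(1).normal(size=(len(phonemes), 2))

n_clusters = 4  # a small number of clusters gives low-dimensional features
_, cluster_ids = kmeans(embeddings, n_clusters)
phoneme_to_cluster = dict(zip(phonemes, cluster_ids.tolist()))

# One row per phoneme in the utterance, one column per cluster.
features = one_hot_features(["k", "a", "s", "a"], phoneme_to_cluster, n_clusters)
```

In this sketch the feature dimension equals the number of clusters rather than the number of phonemes, which is the sense in which a small cluster count gives lower-dimensional input to an F0 predictor.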
