Phoneme Embeddings on Predicting Fundamental Frequency Pattern for Electrolaryngeal Speech

Abstract

Electrolaryngeal (EL) speech has a robotic quality owing to its constant fundamental frequency (F0) pattern. In existing F0 pattern prediction frameworks, acoustic models are trained on spectral features of a large corpus of healthy speech. However, the spectrogram of EL speech carries no useful information about F0. Moreover, creating datasets with a reasonably large number of EL utterances for training neural networks is very time-consuming. Hence, F0 prediction based on other features that can be shared between EL and normal speech must be investigated. In this study, we investigate F0 prediction based on clustering of phoneme embeddings. For a dataset consisting of utterances of both speech types, phoneme labels are extracted. These phoneme labels are then used to learn phoneme embeddings in a common 2-D space. By clustering the learned phoneme embeddings, new one-hot features are created for F0 prediction. Experimental results show that, for training sets consisting of mixed EL and normal utterances, the new features improve F0 prediction accuracy. Moreover, accurate F0 patterns can be predicted even from lower-dimensional features corresponding to a small number of clusters. This could simplify the structure of the recognition system required to extract phoneme labels from EL speech.
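
The feature-construction step described in the abstract (cluster 2-D phoneme embeddings, then encode each phoneme by its cluster as a one-hot vector) might look roughly like the sketch below. This is an illustration only, not the authors' implementation: the abstract does not name a clustering algorithm, so plain k-means is an assumption, and the embedding values, the phoneme set, and the names `kmeans`, `one_hot`, and `cluster_features` are all hypothetical.

```python
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means on 2-D points; returns one cluster index per point.

    Assumption: k-means stands in for whatever clustering the paper uses.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        assign = [
            min(
                range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2
                + (p[1] - centers[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                new_centers.append(
                    (
                        sum(m[0] for m in members) / len(members),
                        sum(m[1] for m in members) / len(members),
                    )
                )
            else:
                # Keep an empty cluster's center where it was.
                new_centers.append(centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return assign


def one_hot(index, size):
    """Return a list of length `size` with a single 1 at `index`."""
    return [1 if i == index else 0 for i in range(size)]


# Hypothetical 2-D embeddings (made-up values, not learned from any data).
EMBEDDINGS = {
    "a": (0.9, 1.0), "i": (1.1, 0.8), "u": (1.0, 1.2),
    "k": (-1.0, -0.9), "t": (-1.2, -1.1), "s": (-0.8, -1.0),
}


def cluster_features(embeddings, k, seed=0):
    """Map each phoneme to a k-dimensional one-hot cluster feature."""
    phonemes = sorted(embeddings)
    assign = kmeans([embeddings[p] for p in phonemes], k, seed=seed)
    return {p: one_hot(a, k) for p, a in zip(phonemes, assign)}


FEATURES = cluster_features(EMBEDDINGS, k=2)
for phoneme, feature in sorted(FEATURES.items()):
    print(phoneme, feature)
```

With `k=2` every phoneme is described by a 2-dimensional vector instead of a one-hot vector over the full phoneme inventory, which is the kind of dimensionality reduction the abstract refers to when it mentions small numbers of clusters.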

Bibliographic record

Source: Asia-Pacific Signal and Information Processing Association Annual Summit and Conference, 2020, pp. 572-577 (6 pages)
Authors: Mohammad Eshghi; Kazuhiro Kobayashi; Kou Tanaka; Hirokazu Kameoka; Tomoki Toda
Original format: PDF
Keywords: Feature extraction; Training; Speech recognition; Speech enhancement; Speech coding; Training data; Predictive models

Similar literature

1. Speech tempo and fundamental frequency patterns; A case study of male monozygotic twins and an age- and sex-matched sibling [J]. Sandra P. Whiteside, Emma Rixon. Logopedics, phoniatrics, vocology. 2014, No. 3a4

2. Speech tempo and fundamental frequency patterns; A case study of male monozygotic twins and an age- and sex-matched sibling [J]. Sandra P. Whiteside, Emma Rixon. Logopedics, phoniatrics, vocology. 2013, No. 3a4

3. Phoneme recognition using zerocrossing interval distribution of speech patterns and ANN [J]. R.K. Sunil Kumar, V.L. Lajish. International journal of speech technology. 2013, No. 1

4. An Investigation of Fundamental Frequency Pattern Prediction in Electrolaryngeal Speech Enhancement [C]. Mohammad Eshghi, Kou Tanaka, Kazuhiro Kobayashi, 日本音響学会研究発表会. 2019

5. Investigating the patterns of text-to-speech software use by adolescent struggling readers: An embedded multiple case study [D]. Takahashi, Kiriko. 2015

6. Classifying acoustic signals into phoneme categories: average and dyslexic readers make use of complex dynamical patterns and multifractal scaling properties of the speech signal [O]. Fred Hasselman -1

7. Influence of different phoneme mappings on the recognition accuracy of electrolaryngeal speech [O]. Stanislav, Petr. 2012
