A HMM-based Mandarin Chinese Singing Voice Synthesis System

Xian Li; Zengfu Wang

Abstract

We propose a Mandarin Chinese singing voice synthesis system based on hidden Markov model (HMM) speech synthesis techniques. A Mandarin Chinese singing voice corpus is recorded, and musical contextual features are designed for training. The F0 and spectrum of the singing voice are modeled simultaneously with context-dependent HMMs. This raises a new problem: the F0 data of singing voice are always sparse because of the large number of contexts, i.e., tempo and pitch of the note, key, time signature, etc. As a result, features that rarely appear in the training data cannot be modeled well. To address this problem, the difference between the F0 of the singing voice and that of the musical score (DF0) is modeled by a single Viterbi training. To overcome the over-smoothing of the generated F0 contour, a syllable-level F0 model based on discrete cosine transforms (DCT) is applied, and the F0 contour is generated by integrating the two-level statistical models. The experimental results demonstrate that the proposed system outperforms the baseline system in both objective and subjective evaluations. The proposed system can generate a more natural F0 contour. Furthermore, the syllable-level F0 model can make the singing voice more expressive.
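The two F0 representations mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the score is given as MIDI note numbers, the difference is taken in the log-frequency domain, the DCT is the orthonormal DCT-II, and the function names (`midi_to_hz`, `df0`, `dct`, `idct`) are hypothetical.

```python
import math


def midi_to_hz(note):
    """Convert a MIDI note number to Hz (assumes A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def df0(sung_hz, score_midi):
    """Frame-wise difference between sung F0 and the score's note F0.

    Assumption: the difference is taken between log frequencies; the
    abstract does not state which domain the paper uses.
    """
    return [math.log(f) - math.log(midi_to_hz(n))
            for f, n in zip(sung_hz, score_midi)]


def dct(x, k):
    """First k coefficients of the orthonormal DCT-II of the sequence x."""
    n = len(x)
    coeffs = []
    for j in range(k):
        scale = math.sqrt(1.0 / n) if j == 0 else math.sqrt(2.0 / n)
        total = sum(v * math.cos(math.pi * (i + 0.5) * j / n)
                    for i, v in enumerate(x))
        coeffs.append(scale * total)
    return coeffs


def idct(coeffs, n):
    """Rebuild n samples from (possibly truncated) DCT-II coefficients."""
    samples = []
    for i in range(n):
        value = coeffs[0] * math.sqrt(1.0 / n)
        for j in range(1, len(coeffs)):
            value += (coeffs[j] * math.sqrt(2.0 / n)
                      * math.cos(math.pi * (i + 0.5) * j / n))
        samples.append(value)
    return samples


# Hypothetical data: five voiced frames of one syllable sung on note 69.
sung = [438.0, 441.0, 446.0, 443.0, 440.0]
score = [69] * 5

contour = df0(sung, score)          # deviation from the score per frame
syllable_params = dct(contour, 3)   # compact syllable-level description
smoothed = idct(syllable_params, len(contour))
```

Keeping only a few low-order DCT coefficients gives a fixed-length description of a syllable's F0 shape regardless of its duration, which is one plausible reason such a representation suits a syllable-level model.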

