IEEE International Conference on Acoustics, Speech and Signal Processing

FastPitch: Parallel Text-to-Speech with Pitch Prediction


Abstract

We present FastPitch, a fully parallel text-to-speech model based on FastSpeech and conditioned on fundamental frequency contours. The model predicts pitch contours during inference. By altering these predictions, the generated speech can be made more expressive, better matched to the semantics of the utterance, and ultimately more engaging to the listener. Uniformly increasing or decreasing pitch with FastPitch generates speech that resembles the voluntary modulation of voice. Conditioning on frequency contours improves the overall quality of synthesized speech, making it comparable to the state of the art. This conditioning introduces no overhead, and FastPitch retains the favorable, fully parallel Transformer architecture, with a real-time factor of over 900× for mel-spectrogram synthesis of a typical utterance.
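
To illustrate what "uniformly increasing or decreasing pitch" could look like in practice, here is a minimal Python sketch. It is not FastPitch code: the function names `shift_pitch` and `real_time_factor`, the convention that unvoiced frames are stored as 0 Hz, and the sample contour values are all assumptions made for illustration. The sketch also assumes that the "real-time factor" in the abstract means seconds of generated audio per second of synthesis time, which matches the "over 900×" phrasing.

```python
from typing import List


def shift_pitch(f0_hz: List[float], semitones: float) -> List[float]:
    """Uniformly shift a pitch contour by a number of semitones.

    Assumption: unvoiced positions are encoded as 0.0 Hz and are left
    unchanged, so only voiced positions are scaled.
    """
    ratio = 2.0 ** (semitones / 12.0)  # one semitone is a factor of 2^(1/12)
    return [f * ratio if f > 0.0 else 0.0 for f in f0_hz]


def real_time_factor(audio_seconds: float, synthesis_seconds: float) -> float:
    """Seconds of audio produced per second of compute (assumed definition)."""
    return audio_seconds / synthesis_seconds


# Hypothetical predicted contour (made-up values, not model output).
predicted = [0.0, 110.0, 220.0, 0.0]
raised = shift_pitch(predicted, 12.0)    # one octave up
lowered = shift_pitch(predicted, -12.0)  # one octave down
```

In a real system the shifted contour would be passed back to the model in place of its own prediction before the spectrogram is generated; how that hand-off works depends on the implementation and is not described in the abstract.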
