Journal: Quality Control, Transactions
Creating Song From Lip and Tongue Videos With a Convolutional Vocoder


Abstract

A convolutional neural network and a deep autoencoder are used to predict Line Spectral Frequencies (LSFs), F0, and a voiced/unvoiced flag in singing data, using only ultrasound images of the tongue and visual images of the lips as input. A novel convolutional vocoder that transforms the learned parameters into an audio signal is also presented. The Spectral Distortion of the predicted LSFs is lower than in an earlier study that used handcrafted features and multilayer perceptrons on the same data set, and the predicted F0 and voiced/unvoiced flag are found to be highly correlated with their ground-truth values. The convolutional vocoder is also compared with standard vocoders. The results may be of interest for the study of singing articulation as well as for silent speech interface research. Sample predicted audio files are available online. Source code: https://github.com/TjuJianyu/SSI_DL
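The abstract reports Spectral Distortion between predicted and ground-truth Line Spectral Frequencies (LSFs) without stating the formula. As a rough illustration only, the sketch below uses one common definition: the RMS difference, in dB, between the two LPC spectral envelopes implied by the LSFs. The function names, the FFT size, and the example values are assumptions made for this sketch and are not taken from the paper or its repository; the paper's exact definition may differ.

```python
import numpy as np

def lsf_to_lpc(lsf):
    """Convert sorted LSFs (radians in (0, pi), even order p) to LPC
    coefficients [1, a1, ..., ap]. Hypothetical helper for illustration."""
    lsf = np.asarray(lsf, dtype=float)
    p_poly = np.array([1.0, 1.0])    # symmetric polynomial, fixed root at z = -1
    q_poly = np.array([1.0, -1.0])   # antisymmetric polynomial, fixed root at z = +1
    for w in lsf[0::2]:              # 1st, 3rd, ... LSFs are roots of P(z)
        p_poly = np.convolve(p_poly, [1.0, -2.0 * np.cos(w), 1.0])
    for w in lsf[1::2]:              # 2nd, 4th, ... LSFs are roots of Q(z)
        q_poly = np.convolve(q_poly, [1.0, -2.0 * np.cos(w), 1.0])
    a = 0.5 * (p_poly + q_poly)      # A(z) = (P(z) + Q(z)) / 2
    return a[:-1]                    # the z^-(p+1) terms of P and Q cancel

def log_envelope_db(lsf, n_fft=512):
    """Log-magnitude of the all-pole envelope 1/|A(e^jw)| in dB."""
    a = lsf_to_lpc(lsf)
    return -20.0 * np.log10(np.abs(np.fft.rfft(a, n_fft)))

def spectral_distortion_db(lsf_ref, lsf_pred, n_fft=512):
    """RMS difference in dB between two LPC envelopes (assumed definition)."""
    diff = log_envelope_db(lsf_ref, n_fft) - log_envelope_db(lsf_pred, n_fft)
    return float(np.sqrt(np.mean(diff ** 2)))

# Made-up second-order example frames, not data from the study.
reference_lsf = [np.arccos(0.85), np.arccos(0.05)]
predicted_lsf = [0.60, 1.40]
print(spectral_distortion_db(reference_lsf, predicted_lsf))
```

In practice the value would be averaged over all frames of an utterance. F0 and the voiced/unvoiced flag would be evaluated separately, for example with a correlation coefficient against the ground truth as the abstract describes.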
