Creating Song From Lip and Tongue Videos With a Convolutional Vocoder

Jianyu Zhang; Pierre Roussel; Bruce Denby

Abstract

A convolutional neural network and a deep autoencoder are used to predict Line Spectral Frequencies, F0, and a voiced/unvoiced flag in singing data, using as input only ultrasound images of the tongue and visual images of the lips. A novel convolutional vocoder that transforms the learned parameters into an audio signal is also presented. The Spectral Distortion of the predicted Line Spectral Frequencies is reduced compared to that in an earlier study using handcrafted features and multilayer perceptrons on the same data set, while the predicted F0 and voiced/unvoiced flag are found to be highly correlated with their ground truth values. The convolutional vocoder is also compared with standard vocoders. The results may be of interest in the study of singing articulation as well as in silent speech interface research. Sample predicted audio files are available online. Source code: https://github.com/TjuJianyu/SSI_DL
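
The abstract reports Spectral Distortion between predicted and ground-truth Line Spectral Frequencies (LSFs) but does not give the formula. The sketch below shows one common way such a figure can be computed, as an RMS log-spectral distance in dB between the two all-pole envelopes. It is an illustration only and is not taken from the paper or its repository: the function names `lsf_to_lpc` and `spectral_distortion_db`, the FFT size, the assumption of an even LPC order with LSFs in radians, and the omission of the gain term are all assumptions made here.

```python
import numpy as np


def lsf_to_lpc(lsf):
    """Convert LSFs (radians, strictly increasing in (0, pi)) to LPC coefficients.

    Assumes an even LPC order. Returns [1, a_1, ..., a_p].
    """
    lsf = np.asarray(lsf, dtype=float)
    if lsf.size % 2 != 0:
        raise ValueError("this sketch assumes an even LPC order")
    p_poly = np.array([1.0, 1.0])    # symmetric polynomial, root at z = -1
    q_poly = np.array([1.0, -1.0])   # antisymmetric polynomial, root at z = +1
    for w in lsf[0::2]:              # 1st, 3rd, ... LSFs belong to P(z)
        p_poly = np.convolve(p_poly, [1.0, -2.0 * np.cos(w), 1.0])
    for w in lsf[1::2]:              # 2nd, 4th, ... LSFs belong to Q(z)
        q_poly = np.convolve(q_poly, [1.0, -2.0 * np.cos(w), 1.0])
    a = 0.5 * (p_poly + q_poly)      # A(z) = (P(z) + Q(z)) / 2
    return a[:-1]                    # the highest-order term cancels to zero


def spectral_distortion_db(lsf_ref, lsf_pred, n_fft=512):
    """RMS difference in dB between two all-pole envelopes 1/|A(e^jw)|^2.

    Gain is ignored (assumption), so only the envelope shape is compared.
    """
    mag_ref = np.abs(np.fft.rfft(lsf_to_lpc(lsf_ref), n_fft))
    mag_pred = np.abs(np.fft.rfft(lsf_to_lpc(lsf_pred), n_fft))
    diff_db = 20.0 * np.log10(mag_ref) - 20.0 * np.log10(mag_pred)
    return float(np.sqrt(np.mean(diff_db ** 2)))


if __name__ == "__main__":
    # Hypothetical order-4 frames, for illustration only.
    reference = np.array([0.4, 0.9, 1.6, 2.4])
    predicted = np.array([0.45, 0.85, 1.7, 2.3])
    print(spectral_distortion_db(reference, predicted))
```

In practice a value like this would be computed per frame and averaged over the test set; whether the paper averages in the same way, or restricts the frequency band, is not stated in the abstract.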


Bibliographic details

Source
Quality Control, Transactions | 2021, Issue 1 | pp. 13076-13082 | 7 pages
Authors
Jianyu Zhang; Pierre Roussel; Bruce Denby
Author affiliations

Institut Langevin (ESPCI Paris, PSL University, CNRS, Sorbonne Université), Paris, France (all three authors)

Original format: PDF
Language: English
Keywords
Vocoders; Lips; Tongue; Ultrasonic imaging; Training; Acoustics; Speech recognition

