Published in: Computer analysis of images and patterns (conference proceedings)

Minimized Database of Unit Selection in Visual Speech Synthesis without Loss of Naturalness



Abstract

Image-based modeling is very successful in creating realistic facial animations. Applications with dialog systems, such as e-learning and customer information services, can integrate facial animations with synthesized speech in websites to improve human-machine communication. However, downloading the database of 11,594 mouth images (about 120 MB in JPEG format) used by a talking head takes about 15 minutes at 150 kBps. This paper presents a prototype framework for two-step database minimization. First, key mouth images are identified by clustering algorithms, and similar mouth images are discarded. Second, the clustered key mouth images are further compressed with JPEG. MST (Minimum Spanning Tree), RSST (Recursive Shortest Spanning Tree), and LBG-based clustering algorithms are developed and evaluated. Our experiments demonstrate that the LBG-based clustering algorithm reduces the number of mouth images and that JPEG compression further shrinks the database to 8 MB. The minimized database generates facial animations in GIF format without loss of naturalness and fulfills the needs of a talking head for Internet applications.
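The first step (clustering, then keeping one key image per cluster) can be illustrated with a minimal sketch of LBG (Linde-Buzo-Gray) codebook training by codeword splitting. This is not the authors' implementation: the abstract does not say how mouth images are turned into feature vectors, so the 2-D toy vectors, the function names (`lbg`, `key_images`), the perturbation `eps`, and the stopping tolerance `tol` are all assumptions made for illustration. The second step (JPEG recompression of the retained images) is not shown.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def assign(vectors, codebook):
    """Index of the nearest codeword for every vector."""
    return [min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k]))
            for v in vectors]


def lbg(vectors, size, eps=0.01, tol=1e-6, max_iter=100):
    """Grow a codebook by repeated splitting until it holds `size` codewords.

    `size` is assumed to be a power of two (classic splitting scheme).
    """
    codebook = [mean(vectors)]
    while len(codebook) < size:
        # Split every codeword into two slightly perturbed copies.
        codebook = [[c * (1 + s * eps) + s * eps for c in cw]
                    for cw in codebook for s in (1, -1)]
        prev = float("inf")
        for _ in range(max_iter):
            labels = assign(vectors, codebook)
            # Move each codeword to the mean of its cell; empty cells stay put.
            for k in range(len(codebook)):
                cell = [v for v, lab in zip(vectors, labels) if lab == k]
                if cell:
                    codebook[k] = mean(cell)
            distortion = sum(sq_dist(v, codebook[lab])
                             for v, lab in zip(vectors, labels)) / len(vectors)
            # Stop refining once the average distortion no longer improves.
            if prev - distortion <= tol * max(distortion, 1e-12):
                break
            prev = distortion
    return codebook


def key_images(vectors, codebook):
    """Indices of the real vectors closest to each codeword (the kept images)."""
    return sorted({min(range(len(vectors)), key=lambda i: sq_dist(vectors[i], cw))
                   for cw in codebook})


# Toy data (hypothetical): 20 "mouth image" feature vectors in 4 tight groups.
vectors = [[c + d, c] for c in (0, 10, 20, 30) for d in (-0.2, -0.1, 0, 0.1, 0.2)]
codebook = lbg(vectors, size=4)
keys = key_images(vectors, codebook)
print(keys)  # indices of the retained key images; all others would be discarded
```

In this sketch the database shrinks from 20 vectors to one representative per codeword. Picking the real image nearest to each codeword, rather than the codeword itself, is a design choice made here so that only original images are stored; the paper may select key images differently.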
