
A framework for automatic creation of talking heads for multimedia applications.



Abstract

In this dissertation, a framework for the automatic creation of talking heads for various multimedia applications is presented. Within this framework, we present a new audio-to-visual conversion algorithm that uses a constrained optimization approach to take advantage of the dynamics of mouth movements. Based on facial muscle analysis, the dynamics of mouth movements are modeled and constraints are derived from that model. The obtained constraints are used to estimate visual parameters from speech within an HMM-based visual parameter estimation framework. The proposed constrained optimization approach finds visual parameters that satisfy the given constraints and maximize the auxiliary function used to train the audio-visual HMMs. This approach enables the algorithm to produce reliable visual parameters even in noisy environments. Experimental results demonstrate that the proposed audio-to-visual conversion method can track the true visual parameters robustly in various noisy environments.

In addition to the constrained optimization approach for robust audio-to-visual conversion, an automatic scheme for creating a 3D head model is presented. In this scheme, a probabilistic approach is presented to decide whether extracted facial features are suitable for creating a 3D face model. 2D facial features automatically extracted from a video sequence are fed into the proposed probabilistic framework before the corresponding 3D face model is built, to avoid generating an unnatural or unrealistic model. We also present a face shape extractor, based on an ellipse model controlled by three anchor points, that is accurate and computationally cheap. To create a 3D face model, a least-squares approach is used to find the coefficient vector needed to adapt a generic 3D model to the extracted facial features. Experimental results show that the proposed scheme can efficiently build a 3D face model from a video sequence without any user intervention for various Internet applications, such as virtual conferencing and virtual storytelling, that do not require much head movement or high-quality facial animation.
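To illustrate the constrained estimation idea, the following is a minimal Python sketch, not the dissertation's actual formulation: it assumes the HMM auxiliary function reduces to per-frame Gaussian terms for a single visual parameter (e.g., mouth opening) under a fixed state sequence, and it stands in for the muscle-based dynamics model with a hypothetical bound on frame-to-frame change.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical per-frame Gaussian statistics (means, variances) of one visual
# parameter, taken from the most likely audio-visual HMM state sequence.
means = np.array([0.10, 0.45, 0.85, 0.60, 0.20])
variances = np.full_like(means, 0.05)

def neg_auxiliary(v):
    # Negative of a simplified auxiliary function: sum of per-frame Gaussian
    # log-likelihood terms (constants dropped) for the visual parameters v.
    return np.sum((v - means) ** 2 / (2.0 * variances))

# Assumed dynamics constraint: the parameter may not change by more than
# MAX_STEP between consecutive frames (a stand-in for the mouth dynamics).
MAX_STEP = 0.3
constraints = []
for t in range(len(means) - 1):
    constraints.append({"type": "ineq",
                        "fun": lambda v, t=t: MAX_STEP - (v[t + 1] - v[t])})
    constraints.append({"type": "ineq",
                        "fun": lambda v, t=t: MAX_STEP + (v[t + 1] - v[t])})

result = minimize(neg_auxiliary, x0=means.copy(),
                  method="SLSQP", constraints=constraints)
print("constrained visual trajectory:", np.round(result.x, 3))
```

With noisy audio, the unconstrained maximizer would simply reproduce the noisy per-frame means; the dynamics constraints pull the trajectory toward physically plausible mouth movements, which is the intuition behind the robustness claim above.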
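The 3D model adaptation step can be sketched under similarly simple assumptions: a generic face model represented as a mean landmark shape plus a linear deformation basis, and an orthographic projection of the 3D landmarks onto the image plane. Neither the basis nor the projection here comes from the dissertation; they only illustrate the least-squares fit of the coefficient vector.

```python
import numpy as np

rng = np.random.default_rng(0)
n_landmarks, n_modes = 6, 3

# Hypothetical generic 3D model: mean landmark shape plus a linear deformation
# basis, both flattened to length 3 * n_landmarks.
mean_shape = rng.normal(size=3 * n_landmarks)
basis = rng.normal(size=(3 * n_landmarks, n_modes))

# Assumed orthographic projection keeping x and y of each landmark: (2n, 3n).
P = np.kron(np.eye(n_landmarks), np.array([[1.0, 0.0, 0.0],
                                           [0.0, 1.0, 0.0]]))

# Synthetic "extracted" 2D facial features generated from a known coefficient
# vector, standing in for features tracked in a video sequence.
true_c = np.array([0.5, -0.2, 0.1])
features_2d = P @ (mean_shape + basis @ true_c)

# Least-squares fit: choose c minimizing ||P (mean_shape + basis c) - f||^2.
A = P @ basis
b = features_2d - P @ mean_shape
c_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
print("recovered coefficient vector:", np.round(c_hat, 3))
```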

Bibliographic Record

  • Author

    Choi, KyoungHo.

  • Author Affiliation

    University of Washington.

  • Degree-Granting Institution: University of Washington.
  • Subject: Engineering, Electronics and Electrical.
  • Degree: Ph.D.
  • Year: 2002
  • Pagination: 95 p.
  • Total Pages: 95
  • Original Format: PDF
  • Language: eng
  • Chinese Library Classification: Radio electronics, telecommunication technology
  • Keywords:
  • Date Added: 2022-08-17 11:46:39

