
Audio-Visual Asynchrony Modeling and Analysis for Speech Alignment and Recognition.



Abstract

This work investigates perceived audio-visual asynchrony, specifically anticipatory coarticulation, in which the visual cues of a speech sound (e.g. lip rounding) may occur before the acoustic cues. This phenomenon often gives the impression that the visual and acoustic signals are asynchronous. The effect can be accounted for using models based on multiple hidden Markov models with synchrony constraints linking states in the different modalities, though generally only within phones and not across phone boundaries. In this work, we consider several such models, implemented as dynamic Bayesian networks (DBNs). We study the models' ability to accurately locate phone and viseme (the audio and video sub-word units, respectively) boundaries in the audio and video signals, and compare the resulting alignments with human labels of these boundaries. This alignment task is important in its own right, as it can serve as both an analysis tool and a convenience tool for linguists. Furthermore, advances in alignment systems can carry over into the speech recognition domain.

This thesis makes several contributions. First, it presents a new set of manually labeled phonetic boundary data for words expected to display asynchrony; analysis of the data confirms our expectations about this phenomenon. Second, it presents a new software program, AVDDisplay, which allows audio, video, and alignment data to be viewed simultaneously and in sync; this tool is essential to the alignment analysis detailed in this work. Third, new DBN-based models of audio-visual asynchrony are presented; the newly proposed models incorporate linguistic context into the asynchrony model. Fourth, alignment experiments compare system performance against the hand-labeled ground truth. Finally, the performance of these models in a speech recognition context is examined.

This work finds that the newly proposed models outperform previously proposed asynchrony models on both the alignment and recognition tasks.
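To make the "synchrony constraints linking states in different modalities" concrete, the idea can be sketched as a joint Viterbi alignment over two observation streams, where the audio and video streams each traverse the same sequence of sub-word units but may be offset from one another by at most a fixed number of units. This is an illustrative sketch only, not the thesis's actual DBN models: the function name `asynchrony_viterbi`, the per-frame log-likelihood inputs, and the simple stay-or-advance topology are all assumptions made for the example.

```python
import numpy as np

def asynchrony_viterbi(log_a, log_v, max_async=1):
    """Jointly align two streams over the same N sub-word units.

    Illustrative sketch (not the thesis's model): each stream either
    stays in its current unit or advances by one per frame, and the two
    streams may differ by at most `max_async` units at any time.

    log_a, log_v: (T, N) arrays of per-frame log-likelihoods for the
    audio and video streams under each of the N units.
    Returns the best joint path as a list of (audio_unit, video_unit).
    """
    T, N = log_a.shape
    NEG = -np.inf
    # dp[i, j]: best log-prob with audio in unit i, video in unit j
    dp = np.full((N, N), NEG)
    back = np.zeros((T, N, N, 2), dtype=int)
    dp[0, 0] = log_a[0, 0] + log_v[0, 0]  # both streams start in unit 0
    for t in range(1, T):
        new = np.full((N, N), NEG)
        for i in range(N):
            for j in range(N):
                if abs(i - j) > max_async:
                    continue  # prune joint states violating the constraint
                best, arg = NEG, (i, j)
                # each stream either stays put or advances by one unit
                for pi in (i - 1, i):
                    for pj in (j - 1, j):
                        if pi < 0 or pj < 0 or abs(pi - pj) > max_async:
                            continue
                        if dp[pi, pj] > best:
                            best, arg = dp[pi, pj], (pi, pj)
                if best > NEG:
                    new[i, j] = best + log_a[t, i] + log_v[t, j]
                    back[t, i, j] = arg
        dp = new
    # both streams must finish in the final unit; trace back the path
    path = [(N - 1, N - 1)]
    for t in range(T - 1, 0, -1):
        i, j = path[-1]
        path.append(tuple(back[t, i, j]))
    return path[::-1]
```

With anticipatory coarticulation, the video stream can enter the next unit a frame or two before the audio stream does; a synchronous model (`max_async=0`) would be forced to place both boundaries at the same frame, while this constrained joint search lets the streams drift apart within the allowed bound and recovers the earlier visual boundary.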

Bibliographic record

  • Author: Terry, Louis
  • Affiliation: Northwestern University
  • Degree-granting institution: Northwestern University
  • Subjects: Speech Communication; Engineering, Electronics and Electrical
  • Degree: Ph.D.
  • Year: 2011
  • Pagination: 153 p.
  • Total pages: 153
  • Format: PDF
  • Language: eng
