
Kinematic measurement and feature sets for automatic speech recognition.


Abstract

This thesis examines the use of measured and inferred kinematic information in automatic speech recognition and lipreading, and investigates the relative information content and recognition performance of vowels and consonants. The kinematic information describes the motions of the organs of speech, the articulators. The contributions of this thesis include a new device and set of algorithms for lipreading (their design, construction, implementation, and testing); incorporation of direct articulator-position measurements into a speech recognizer; and reevaluation of some assumptions regarding vowels and consonants.

The motivation for including articulatory information is to improve modeling of coarticulation and reconcile multiple input modalities for lipreading. Coarticulation, a ubiquitous phenomenon, is the process by which speech sounds are modified by preceding and following sounds.

To be useful in practice, a recognizer will have to infer articulatory information from sound, video, or both. Previous work made progress towards recovery of articulation from sound. The present project assumes that such recovery is possible; it examines the advantage of joint acoustic-articulatory representations over acoustic-only. Also reported is an approach to recovery from video in which camera placement (side view, head-mounted) and lighting are chosen to robustly obtain lip-motion information.

Joint acoustic-articulatory recognition experiments were performed using the University of Wisconsin X-ray Microbeam Speech Production Database. Speaker-dependent monophone recognizers, based on hidden Markov models, were tested on paragraphs each lasting about 20 seconds. Results were evaluated at the phone level and tabulated by several classes (vowel, stop, and fricative). Measured articulator coordinates were transformed by principal components analysis, and velocity and acceleration were appended. Concatenating the transformed articulatory information to a standard acoustic (cepstral) representation reduced the error rate by 7.4%, demonstrating across-speaker statistical significance (p = 0.018). Articulation improved recognition of male speakers more than female, and recognition of vowels more than fricatives or stops.

The analysis of vowels, stops, and fricatives included both the articulatory recognizer of chapter 3 and other recognizers for comparison. The information content of the different classes was also estimated. Previous assumptions about recognition performance are false, and findings of information content require consonants to be defined to include vowel-like sounds.
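
The joint feature construction described in the abstract (principal components analysis of articulator coordinates, appended velocity and acceleration, concatenation with cepstral features) might be sketched roughly as follows. This is only an illustration and not the thesis's implementation: the function names, the number of retained components, the use of finite differences for velocity and acceleration, the assumption that both streams share one frame rate, and the synthetic data are all assumptions made here.

```python
import numpy as np

def pca_transform(x, n_components):
    """Project frames (rows of x) onto the top principal components."""
    centered = x - x.mean(axis=0)
    # SVD of the centered data; rows of vt are the principal directions,
    # ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def add_deltas(feats):
    """Append finite-difference velocity and acceleration along time."""
    velocity = np.gradient(feats, axis=0)
    acceleration = np.gradient(velocity, axis=0)
    return np.hstack([feats, velocity, acceleration])

def joint_features(cepstra, articulators, n_components=6):
    """Concatenate acoustic frames with transformed articulatory frames.

    Assumes both inputs have one row per frame at the same frame rate.
    n_components=6 is an arbitrary illustrative choice.
    """
    artic = add_deltas(pca_transform(articulators, n_components))
    return np.hstack([cepstra, artic])

# Synthetic stand-in data (not from the X-ray Microbeam database):
# 200 frames, 13 cepstral coefficients, 16 articulator coordinates.
rng = np.random.default_rng(0)
cepstra = rng.normal(size=(200, 13))
articulators = rng.normal(size=(200, 16))

features = joint_features(cepstra, articulators)
print(features.shape)  # (200, 31): 13 acoustic + 6 * 3 articulatory
```

In a setup like this, each row of `features` would serve as the observation vector for one frame of a hidden Markov model recognizer, so the acoustic-only baseline corresponds to using `cepstra` alone.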
