Perception & Psychophysics

Similarity structure in visual speech perception and optical phonetic signals


Abstract

A complete understanding of visual phonetic perception (lipreading) requires linking perceptual effects to physical stimulus properties. However, the talking face is a highly complex stimulus that affords innumerable possible physical measurements. In the search for isomorphism between stimulus properties and phonetic effects, second-order isomorphism was examined between the perceptual similarities among video-recorded speech syllables, derived from participants' identification responses, and the physical similarities among the same stimuli. Four talkers produced the stimulus syllables, each consisting of one of 23 initial consonants followed by one of three vowels. Six normal-hearing participants identified the syllables in a visual-only condition. Perceptual stimulus dissimilarity was quantified as the Euclidean distances between stimuli in perceptual spaces obtained via multidimensional scaling. Physical stimulus dissimilarity was quantified using face points recorded in three dimensions by an optical motion capture system. The variance accounted for in the relationship between the perceptual and the physical dissimilarities was evaluated using both raw and weighted dissimilarities. With weighting and the full set of 3-D optical data, the variance accounted for ranged from 46% to 66% across talkers and from 49% to 64% across vowels. The robust second-order relationship between the sparse 3-D point representation of visible speech and the perceptual effects suggests that the 3-D point representation is a viable basis for controlled studies of first-order relationships between visual phonetic perception and physical stimulus attributes.
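The core computation described above, comparing pairwise perceptual distances with pairwise physical distances rather than comparing stimulus properties to percepts directly, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure. Classical (Torgerson) MDS stands in for whatever scaling method was actually used; each syllable's physical description is assumed to be its time-normalized 3-D face-point coordinates flattened into one feature vector; "variance accounted for" is taken as a squared Pearson correlation for raw dissimilarities and as a regression R² for weighted ones; and the weighting scheme (one least-squares weight per physical feature, via the hypothetical function `vaf_weighted`) is an assumption, since the abstract does not specify it. The demo uses synthetic data only.

```python
import numpy as np


def classical_mds(D, n_dims):
    """Classical (Torgerson) MDS: coordinates whose Euclidean distances approximate D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered squared dissimilarities
    vals, vecs = np.linalg.eigh(B)               # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:n_dims]        # indices of the n_dims largest eigenvalues
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


def pairwise_euclidean(X):
    """Euclidean distance matrix between the rows of X (stimuli x features)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def upper(D):
    """Upper-triangle entries: each stimulus pair once, diagonal excluded."""
    i, j = np.triu_indices(D.shape[0], k=1)
    return D[i, j]


def vaf_raw(perceptual_D, physical_D):
    """Second-order fit on raw dissimilarities: squared Pearson r across stimulus pairs."""
    r = np.corrcoef(upper(perceptual_D), upper(physical_D))[0, 1]
    return r ** 2


def vaf_weighted(perceptual_D, X_physical):
    """Hypothetical weighting: regress perceptual distances on per-feature absolute
    differences (one least-squares weight per physical feature plus an intercept)
    and return the R^2 of that fit."""
    i, j = np.triu_indices(X_physical.shape[0], k=1)
    A = np.abs(X_physical[i] - X_physical[j])
    A = np.column_stack([A, np.ones(len(i))])
    y = perceptual_D[i, j]
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ w
    return 1.0 - (resid ** 2).sum() / ((y - y.mean()) ** 2).sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_syll, n_feat = 23, 12                       # 23 syllables; 12 features is an arbitrary demo size
    X_phys = rng.normal(size=(n_syll, n_feat))    # synthetic stand-in for flattened 3-D point data
    # Synthetic "perceptual" dissimilarities, loosely driven by three physical features plus noise.
    latent = X_phys[:, :3] + 0.5 * rng.normal(size=(n_syll, 3))
    D_obs = pairwise_euclidean(latent)
    D_perc = pairwise_euclidean(classical_mds(D_obs, n_dims=3))
    D_phys = pairwise_euclidean(X_phys)
    print(f"raw VAF:      {vaf_raw(D_perc, D_phys):.2f}")
    print(f"weighted VAF: {vaf_weighted(D_perc, X_phys):.2f}")
```

Because 23 syllables yield only 253 stimulus pairs, a weighted fit with one free weight per feature can overfit badly when the physical representation has many features (many points times many frames); a real analysis would need far fewer weights than pairs, or cross-validation, before reading the weighted R² as evidence of a relationship.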