Frontiers in Psychology

Visual speech discrimination and identification of natural and synthetic consonant stimuli



Abstract

From phonetic features to connected discourse, every level of psycholinguistic structure, including prosody, can be perceived through viewing the talking face. Yet a longstanding notion in the literature is that visual speech perceptual categories comprise groups of phonemes (referred to as visemes), such as /p, b, m/ and /f, v/, whose internal structure is not informative to the visual speech perceiver. To our knowledge, this conclusion has not been evaluated using a psychophysical discrimination paradigm. We hypothesized that perceivers can discriminate the phonemes within typical viseme groups, and that discrimination measured with d-prime (d') and response latency is related to visual stimulus dissimilarities between consonant segments. In Experiment 1, participants performed speeded discrimination for pairs of consonant-vowel spoken nonsense syllables that were predicted to be same, near, or far in their perceptual distances, and that were presented as natural or synthesized video. Near pairs were within-viseme consonants. Natural within-viseme stimulus pairs were discriminated significantly above chance (except for /k/-/h/). Sensitivity (d') increased and response times decreased with distance. Discrimination and identification were superior with natural stimuli, which comprised more phonetic information. We suggest that the notion of the viseme as a unitary perceptual category is incorrect. Experiment 2 probed the perceptual basis for visual speech discrimination by inverting the stimuli. An overall reduction in d' with inverted stimuli, together with a persistent pattern of larger d' for far than for near stimulus pairs, is interpreted as evidence that visual speech is represented by both its motion and its configural attributes. The methods and results of this investigation open up avenues for understanding the neural and perceptual bases of visual and audiovisual speech perception and for the development of practical applications such as visible speech synthesis for lipreading/speechreading.
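The abstract reports sensitivity as d-prime (d'), the standard signal-detection measure. Below is a minimal sketch of how d' can be computed from trial counts, assuming the basic yes/no (independent-observation) formula d' = z(hit rate) - z(false-alarm rate); same-different paradigms like the one described here often apply a differencing-model correction instead, which is not reproduced in this sketch. The function name and the log-linear rate correction are illustrative choices, not taken from the paper.

```python
# A hedged sketch of d' (d-prime) computation, assuming the simple
# independent-observation formula; the study's same-different design
# may have used a different (e.g., differencing-model) computation.
from scipy.stats import norm

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Compute d' from raw trial counts. A log-linear correction
    (add 0.5 to each count) avoids infinite z-scores when an
    observed rate is exactly 0 or 1."""
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    # d' is the distance between the z-transformed hit and false-alarm rates.
    return norm.ppf(hit_rate) - norm.ppf(fa_rate)

# Hypothetical example counts: a "far" pair discriminated well versus a
# "near" (within-viseme) pair discriminated above chance but less well.
print(d_prime(45, 5, 8, 42))    # ~2.2, high sensitivity
print(d_prime(30, 20, 18, 32))  # ~0.6, lower sensitivity
```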
