
Accounting for the phonetic value of nonspeech sounds



Abstract

The nature of the process by which listeners parse auditory inputs into the phonetic percepts necessary for speech understanding is still only partially understood. Different theoretical stances frame the process as either the action of ordinary auditory processes or as the workings of a specialized speech perception system or module. Evidence that speech perception is special, at least on some level, can be found in perceptual phenomena that are associated with speech processing but not observed with other auditory stimuli. These include effects known to be related to top-down linguistic influence or even to the listener's parsing of the speaker's articulatory gestures.

There is mounting evidence, however, that these phenomena are not always restricted to speech stimuli: some nonspeech sounds, under certain presentation conditions, participate in these phonetic processes as well. These findings are enormously relevant to the theory of speech perception, as they suggest that a sharp speech/nonspeech dichotomy is untenable. Even more promising, they offer a way of reverse-engineering those aspects of speech perception that lack a simple psychophysical explanation, by observing how they react to stimuli that are carefully controlled and may even lack elements that are always present in speech. Experimental work that has attempted to do so is reviewed and discussed.

Original work extending these findings to two types of nonspeech stimuli is also presented. In the first set of experiments, compensation for coarticulation is tested on a speech fricative target with a nonspeech context vowel (a synthesized glottal source with a single formant resonance). Results show that this nonspeech context does induce a reliable context effect, one which cannot be due to auditory contrast.
This effect is weaker than that induced by speech vowels, suggesting that listeners apply phonetic processing to a degree influenced by the plausibility of an acoustic event.

In the second set, listeners matched frequency-modulated tones to time-aligned visual CV syllables in which rounding on the consonant and vowel varied independently. Results are consistent with those obtained in previous experiments with non-modulated tones: high tones are paired with high front vowel articulation, low tones with (back) rounded articulation. This pitch-vowel correspondence is shown to extend to contexts that include spectrotemporal modulation at rates similar to speech. These findings support treating the effect as a product of ordinary speech production rather than an unexplained idiosyncrasy of the auditory system.

The correspondences between nonspeech and speech sounds, as reviewed and as observed in the above experiments, were further evaluated at the spectral level. Much research has modeled how listeners categorize speech spectra, and some of this research has identified certain cues as critical to phonetic categorization. Several of these models are further evaluated on nonspeech sounds: a processing strategy that is genuinely similar to human processing should predict, even on nonspeech, the same phonetic categorizations that human listeners make. A comparison of full-spectrum versus formant-based models shows that the former capture human judgments of the vowel quality of pure tones much more accurately, and are also fairly effective at classifying formant-derived sine wave speech. Derived spectral measures, such as formants and cepstra, are well tuned for speech but generally unable to imitate human performance on nonspeech.

All of these experiments support the notion that phonetic categorization for vowels and similar sounds operates by comparing spectral templates rather than highly derived spectral features such as formants.
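The contrast between full-spectrum and formant-based models can be illustrated with a toy sketch. This is not the dissertation's actual model: the templates, bandwidths, and frequency grid below are invented for illustration. The idea is simply that a classifier comparing whole spectral shapes can assign a vowel quality to a pure tone, which a formant extractor (expecting two or more resonances) cannot do gracefully.

```python
import numpy as np

# Hypothetical full-spectrum template matcher (illustrative only, not the
# dissertation's model). Each vowel is stored as a smooth spectral template;
# an input is classified by whole-spectrum similarity, with no formant
# extraction step.

freqs = np.linspace(0, 4000, 512)  # frequency grid in Hz

def gaussian_template(center_freqs, freqs, bw=200.0):
    """Build a unit-norm spectrum with Gaussian energy peaks at the centers."""
    spec = np.zeros_like(freqs)
    for f0 in center_freqs:
        spec += np.exp(-0.5 * ((freqs - f0) / bw) ** 2)
    return spec / np.linalg.norm(spec)

# Toy vowel templates with illustrative formant positions (F1, F2).
templates = {
    "i": gaussian_template([300, 2300], freqs),  # high front vowel
    "a": gaussian_template([700, 1200], freqs),  # low vowel
    "u": gaussian_template([300, 800], freqs),   # high back rounded vowel
}

def classify_full_spectrum(input_spec):
    """Return the vowel whose whole-spectrum template best matches the input."""
    input_spec = input_spec / np.linalg.norm(input_spec)
    return max(templates, key=lambda v: float(input_spec @ templates[v]))

# A pure tone near 2300 Hz shares energy with the F2 region of /i/, so a
# full-spectrum matcher maps high tones to high front vowels.
tone = gaussian_template([2300], freqs)
print(classify_full_spectrum(tone))  # -> "i" in this toy setup
```

A formant-based model, by contrast, would first have to fit resonance frequencies to the tone's spectrum, a step that is ill-defined when the input has only one spectral peak.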
The observed correspondences between speech and nonspeech can be explained by spectral similarity, which depends on both where spectral energy is present and where it is absent. More generally, the results support an inference-based understanding of speech perception in which listeners categorize by maximizing the likelihood of an uttered phone given the auditory input and scene analysis.
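One plausible formalization of this inference-based view (the notation here is mine, not the dissertation's) is that the listener selects the phone maximizing its posterior probability given the auditory input $A$ and the scene analysis $S$:

$$
\hat{\varphi} \;=\; \operatorname*{arg\,max}_{\varphi}\; P(\varphi \mid A, S)
\;=\; \operatorname*{arg\,max}_{\varphi}\; P(A \mid \varphi, S)\, P(\varphi \mid S),
$$

so that categorization reflects both how well a candidate phone accounts for the observed spectrum and how plausible that phone is as the source of the acoustic event.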

Record

  • Author: Finley, Gregory Peter
  • Affiliation: University of California, Berkeley
  • Degree-granting institution: University of California, Berkeley
  • Subjects: Linguistics; Psychology
  • Degree: Ph.D.
  • Year: 2015
  • Pages: 143
  • Format: PDF
  • Language: English
