Journal: The Journal of the Acoustical Society of America

A model of acoustic interspeaker variability based on the concept of formant–cavity affiliation


Abstract

A method is proposed to model the interspeaker variability of formant patterns for oral vowels. It is assumed that this variability originates in differences among speakers in the respective lengths of their front and back vocal-tract cavities. To characterize these vocal-tract differences from the spectral description of the acoustic speech signal, each formant is interpreted, according to the concept of formant–cavity affiliation, as a resonance of a specific vocal-tract cavity. Its frequency can thus be directly related to the corresponding cavity length, and a transformation model from a speaker A to a speaker B can be proposed on the basis of the frequency ratios of the formants corresponding to the same resonances. To minimize the number of sounds that must be recorded for each speaker to carry out this transformation, the frequency ratios are computed exactly only for the three extreme cardinal vowels [i, a, u] and are approximated for the remaining vowels through an interpolation function. The method is evaluated through its capacity to transform the (F1,F2) formant patterns of eight oral vowels pronounced by five male speakers into the (F1,F2) patterns of the corresponding vowels generated by an articulatory model of the vocal tract. The resulting formant patterns are compared to those provided by normalization techniques published in the literature. The proposed method is found to be efficient, but a number of limitations are also observed and discussed. These limitations can be associated with the formant–cavity affiliation model itself or with a possible influence of speaker-specific vocal-tract geometry in the cross-sectional direction, which the model might not have taken into account.
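
The ratio-based transformation described in the abstract can be illustrated with a minimal Python sketch. Several things here are assumptions rather than details from the paper. The abstract does not specify the interpolation function, so inverse-distance weighting in the source speaker's (F1,F2) plane is swapped in as a placeholder. The sketch also assumes that, for a given vowel, the same formant index corresponds to the same cavity resonance in both speakers. The formant values and all function names are hypothetical and chosen only for illustration.

```python
import math

# Illustrative (F1, F2) values in Hz for the three cardinal vowels.
# These numbers are made up for the example; they are not data from the paper.
SPEAKER_A = {"i": (280.0, 2250.0), "a": (750.0, 1300.0), "u": (310.0, 800.0)}
SPEAKER_B = {"i": (300.0, 2100.0), "a": (700.0, 1250.0), "u": (320.0, 760.0)}
CARDINALS = ("i", "a", "u")


def cardinal_ratios(src, dst):
    """Per-formant frequency ratios dst/src, computed exactly at [i, a, u]."""
    return {v: tuple(d / s for s, d in zip(src[v], dst[v])) for v in CARDINALS}


def interpolated_ratios(formants, src, ratios, eps=1e-9):
    """Blend the cardinal ratios for an arbitrary vowel.

    Assumption: inverse-distance weighting in the (F1, F2) plane of the
    source speaker. The paper's actual interpolation function may differ.
    """
    weights = {}
    for v in CARDINALS:
        dist = math.dist(formants, src[v])
        if dist < eps:
            # The vowel coincides with a cardinal vowel: use its exact ratios.
            return ratios[v]
        weights[v] = 1.0 / dist
    total = sum(weights.values())
    return tuple(
        sum(weights[v] * ratios[v][k] for v in CARDINALS) / total
        for k in range(len(formants))
    )


def transform(formants, src, dst):
    """Map one (F1, F2) pattern of the source speaker toward the target."""
    ratios = cardinal_ratios(src, dst)
    blended = interpolated_ratios(formants, src, ratios)
    return tuple(f * r for f, r in zip(formants, blended))
```

For example, `transform((450.0, 1900.0), SPEAKER_A, SPEAKER_B)` scales a non-cardinal vowel of speaker A by a blend of the three cardinal ratios, while a cardinal vowel of speaker A maps onto the corresponding vowel of speaker B.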
