Journal: Speech Communication

Multi-speaker articulatory trajectory formation based on speaker-independent articulatory HMMs

Abstract

Inter-speaker variability in the speech spectrum domain has been modeled using speaker-adaptive training (SAT), in which speaker-independent phoneme-specific hidden Markov models (HMMs) were used along with a speaker-adaptive matrix. In this paper, multi-speaker articulatory trajectory formation based on this method is presented. Both speaker-independent and speaker-specific features are statistically separated from a multi-speaker articulatory database, which consists of the mid-sagittal motion data of the lips, incisor, and tongue measured with an electro-magnetic articulographic (EMA) system. We evaluated the proposed method in terms of the RMS error between the measured and estimated articulatory parameters. When multi-speaker models of articulatory parameters with two speaker-adaptive matrices for each speaker were used, the average RMS error of articulatory parameters was 1.29 mm and showed no statistically significant difference from that for speaker-dependent models (1.22 mm). For comparison, multi-speaker models of the conventional speech spectrum were also constructed using a multi-speaker spectrum database, which consists of speech data simultaneously recorded during the articulatory measurements. The average spectral distance between the vocal-tract and estimated spectrum from two-matrix models was 4.19 dB and showed a statistically significant difference from that for speaker-dependent models (3.97 dB). These results indicate that modeling of inter-speaker variability in the articulatory parameter domain with a small number of matrices for each speaker almost perfectly approximates the speaker dependency of articulation and is better than that in the speech spectrum domain.
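The two quantities the abstract relies on, a speaker-adaptive matrix applied to a speaker-independent model mean and the RMS error between measured and estimated articulatory parameters, can be illustrated with a minimal sketch. The abstract does not give the form of the transform, so the affine (MLLR-style) form below is an assumption, and the function names `adapt_mean` and `rms_error` and all numeric values are hypothetical rather than taken from the paper.

```python
import math


def adapt_mean(matrix, mean):
    """Map a speaker-independent mean to a speaker-specific one.

    Assumption: the speaker-adaptive matrix W has shape d x (d + 1) and acts
    on the mean extended with a leading 1, so the first column is a bias.
    """
    extended = [1.0] + list(mean)
    return [sum(w * x for w, x in zip(row, extended)) for row in matrix]


def rms_error(measured, estimated):
    """Root-mean-square error between two equal-length sequences (e.g. in mm)."""
    if len(measured) != len(estimated) or not measured:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(m - e) ** 2 for m, e in zip(measured, estimated)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical 2-D articulatory mean (mm) and speaker-adaptive matrix.
si_mean = [10.0, -2.0]
W = [
    [0.5, 1.0, 0.0],   # bias +0.5, identity on the first coordinate
    [-0.5, 0.0, 1.0],  # bias -0.5, identity on the second coordinate
]

estimated = adapt_mean(W, si_mean)
measured = [11.5, -2.5]  # hypothetical EMA measurement (mm)
error = rms_error(measured, estimated)
print(estimated, round(error, 4))
```

In this toy setup the speaker-specific information lives entirely in `W`, while `si_mean` is shared across speakers, which mirrors the separation of speaker-independent and speaker-specific features described in the abstract.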