VOCAL TRACT NORMALIZATION BASED ON SPECTRAL WARPING

Abstract

Two techniques for speaker adaptation based on frequency scale modifications are described and evaluated. In the first method, minimum mean square error matching is performed between a spectral template for each speaker and a "typical speaker" spectral template. A single parameter, a warping factor, controls the spectral matching. In the second method, a neural network classifier is used to adjust the frequency warping factor for each speaker so as to maximize that speaker's vowel classification performance. A vowel classifier trained only with normalized female speech and tested only with normalized male speech, or vice versa, is nearly as accurate as one trained and tested on unnormalized speech with matched speaker genders. The improvement due to normalization is much smaller if training and test data are matched. The normalization based on classification performance is superior to that based on minimizing mean square error.
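
The first method can be illustrated with a minimal sketch. The abstract does not specify the form of the warping function, how the templates are built, or how the warping factor is searched, so everything below is an assumption made for illustration: a linear warp of the frequency axis, templates given as magnitude spectra on a uniform bin grid, and an exhaustive search over candidate factors. The names `warp_spectrum` and `best_warp_factor` are hypothetical and do not come from the paper.

```python
import numpy as np


def warp_spectrum(spectrum, alpha):
    """Resample a spectrum on a linearly warped frequency axis (assumed warp form).

    Output bin k takes the input value at position alpha * k, using linear
    interpolation. Positions beyond the last bin are clamped to the last bin.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    n = len(spectrum)
    bins = np.arange(n)
    source = np.clip(alpha * bins, 0, n - 1)
    return np.interp(source, bins, spectrum)


def best_warp_factor(speaker_template, typical_template, alphas):
    """Return the candidate factor minimizing the mean square error between
    the warped speaker template and the typical-speaker template.

    Exhaustive grid search is an illustrative choice, not the paper's method.
    """
    typical_template = np.asarray(typical_template, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    errors = [
        np.mean((warp_spectrum(speaker_template, a) - typical_template) ** 2)
        for a in alphas
    ]
    return float(alphas[int(np.argmin(errors))])


if __name__ == "__main__":
    # Synthetic templates: one spectral peak, with the speaker's peak placed
    # 20% higher in frequency than the typical speaker's.
    k = np.arange(128)
    typical = np.exp(-((k - 40.0) ** 2) / (2 * 6.0 ** 2))
    speaker = np.exp(-((k / 1.2 - 40.0) ** 2) / (2 * 6.0 ** 2))
    print(best_warp_factor(speaker, typical, np.linspace(0.7, 1.3, 61)))
```

In this sketch a factor above 1 compresses the speaker's spectrum toward lower frequencies and a factor below 1 stretches it. The synthetic data in the demo only exercises the search and does not represent real speech.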
