IEEE International Conference on Acoustics, Speech, and Signal Processing

MULTI-VIEW CCA-BASED ACOUSTIC FEATURES FOR PHONETIC RECOGNITION ACROSS SPEAKERS AND DOMAINS



Abstract

Canonical correlation analysis (CCA) and kernel CCA can be used for unsupervised learning of acoustic features when a second view (e.g., articulatory measurements) is available for some training data, and such projections have been used to improve phonetic frame classification. Here we study the behavior of CCA-based acoustic features on the task of phonetic recognition, and investigate to what extent they are speaker-independent or domain-independent. The acoustic features are learned using data drawn from the University of Wisconsin X-ray Microbeam Database (XRMB). The features are evaluated within and across speakers on XRMB data, as well as on out-of-domain TIMIT and MOCHA-TIMIT data. Experimental results show consistent improvement with the learned acoustic features over baseline MFCCs and PCA projections. In both speaker-dependent and cross-speaker experiments, phonetic error rates are improved by 4-9% absolute (10-23% relative) using CCA-based features over baseline MFCCs. In cross-domain phonetic recognition (training on XRMB and testing on MOCHA or TIMIT), the learned projections provide smaller improvements.
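To make the feature-learning step concrete, here is a minimal sketch of linear CCA in NumPy. It is not the authors' implementation and covers only the linear case, not kernel CCA. The function names (`fit_linear_cca`, `project`), the ridge term `reg`, and the choice of `k` are assumptions made for illustration. `X` stands for stacked acoustic frames (for example MFCC windows) and `Y` for the paired second view (for example articulatory measurements). Only `X` is needed when the learned projection is applied at test time.

```python
import numpy as np


def fit_linear_cca(X, Y, k, reg=1e-4):
    """Fit linear CCA and return the projection for the first view.

    X: (n, dx) array, first view (e.g. acoustic frames).
    Y: (n, dy) array, second view (e.g. articulatory measurements).
    k: number of canonical directions to keep.
    reg: ridge term added to both covariances (an assumed value).
    """
    n = X.shape[0]
    mean_x, mean_y = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mean_x, Y - mean_y

    # Regularized within-view covariances and the cross-covariance.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    # Whiten with Cholesky factors: T = Lx^{-1} Sxy Ly^{-T}.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    T = np.linalg.solve(Lx, Sxy)
    T = np.linalg.solve(Ly, T.T).T

    # Singular values of T are the (regularized) canonical correlations.
    U, s, _ = np.linalg.svd(T, full_matrices=False)

    # Map the whitened directions back to the original X coordinates.
    A = np.linalg.solve(Lx.T, U[:, :k])
    return mean_x, A, s[:k]


def project(X, mean_x, A):
    """Apply the learned projection to first-view data only."""
    return (X - mean_x) @ A
```

At test time, `project` needs no second-view data, which is why features learned this way can be applied to speakers or corpora that lack articulatory recordings.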
