
Speaker verification based on the fusion of speech acoustics and inverted articulatory signals



Abstract

We propose a practical feature-level and score-level fusion approach that combines acoustic and estimated articulatory information for both text-independent and text-dependent speaker verification. From a practical point of view, we study how to improve speaker verification performance by combining dynamic articulatory information with conventional acoustic features. For text-independent speaker verification, we find that concatenating articulatory features obtained from measured speech-production data with conventional Mel-frequency cepstral coefficients (MFCCs) improves performance dramatically. However, since directly measuring articulatory data is not feasible in many real-world applications, we also experiment with estimated articulatory features obtained through acoustic-to-articulatory inversion. We explore both feature-level and score-level fusion methods and find that overall system performance is significantly enhanced even with estimated articulatory features. This performance boost may be due to the inter-speaker variation information embedded in the estimated articulatory features. Since the dynamics of articulation carry important information, we also include inverted articulatory trajectories in text-dependent speaker verification. We demonstrate that the articulatory constraints introduced by inverted articulatory features help reject wrong-password trials and improve performance after score-level fusion. We evaluate the proposed methods on the X-ray Microbeam database and the RSR2015 database for the two tasks, respectively. Experimental results show a relative equal error rate reduction of more than 15% for both speaker verification tasks.
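The abstract contrasts two fusion strategies: feature-level fusion (frame-wise concatenation of MFCCs with articulatory features) and score-level fusion (combining the verification scores of two subsystems). A minimal sketch of both, assuming per-frame feature matrices with matching frame counts; the function names, dimensions, and fusion weight are illustrative, not taken from the paper:

```python
import numpy as np

def feature_level_fusion(mfcc: np.ndarray, artic: np.ndarray) -> np.ndarray:
    """Concatenate per-frame MFCCs with (estimated) articulatory features.

    mfcc:  (n_frames, n_mfcc) acoustic features
    artic: (n_frames, n_artic) articulatory trajectories, e.g. obtained
           via acoustic-to-articulatory inversion
    Returns a (n_frames, n_mfcc + n_artic) fused feature matrix.
    """
    assert mfcc.shape[0] == artic.shape[0], "frame counts must match"
    return np.hstack([mfcc, artic])

def score_level_fusion(score_acoustic: float, score_artic: float,
                       w: float = 0.5) -> float:
    """Convex combination of the two subsystems' verification scores.

    w weights the acoustic subsystem; (1 - w) weights the articulatory one.
    """
    return w * score_acoustic + (1.0 - w) * score_artic
```

In the feature-level case the concatenated vectors feed a single verification back end; in the score-level case each subsystem is trained and scored separately, and only the final log-likelihood-ratio-style scores are combined, which lets the weight `w` be tuned on a development set.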

Bibliographic information

  • Source
    Computer Speech and Language, 2016, Issue 3, pp. 196–211 (16 pages)
  • Author affiliations

    Sun Yat-Sen University – Carnegie Mellon University Joint Institute of Engineering, Sun Yat-Sen University, China; Sun Yat-Sen University – Carnegie Mellon University Shunde International Joint Research Institute, Shunde, China; School of Mobile Information Engineering, Sun Yat-Sen University, China;

    Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, USA;

    Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, USA;

    Department of Electrical Engineering, Indian Institute of Science (IISc), Bangalore, India;

    Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, USA;

    Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, USA;

  • Indexed in: Science Citation Index (SCI); Engineering Index (EI)
  • Original format: PDF
  • Language: English
  • Keywords

    Text independent speaker verification; Text dependent speaker verification; Speech production; Articulatory features; Acoustic-to-articulatory inversion;


