Journal: Circuits, Systems, and Signal Processing

Investigating Text-Independent Speaker Verification Systems Under Varied Data Conditions



Abstract

This work investigates speaker verification (SV) from the viewpoint of practical systems. SV with limited speech data is preferred because it improves user comfort and delivers decisions quickly in regular use. However, reducing the amount of speech data degrades SV performance, which becomes a concern for field deployment. This work explores varied data conditions for SV and presents sufficient training data combined with limited test data as the preferable configuration for practical systems. Several approaches are explored to improve performance under varied data conditions. These include a vocal tract constriction feature that captures speaker-specific acoustic-phonetic information, and different attributes of voice source features that carry alternative or complementary information to that carried by conventional mel-frequency cepstral coefficient (MFCC) features. Further, kernel discriminant analysis is applied at the back end of i-vector-based speaker modeling for channel/session compensation, and is found to work well across varied data conditions. Finally, a framework combining these explorations is proposed to achieve better speaker characterization; it is most effective in the sufficient-train, limited-test scenario. For sufficient training data with a 2-s test segment, the proposed framework roughly halves both error measures relative to the baseline [equal error rate (EER): 11.20% vs. 22.31%; detection cost function (DCF): 0.1990 vs. 0.4128], showing scope for application-oriented systems.
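
For readers unfamiliar with the equal error rate reported above, the following is a minimal sketch of how an EER can be estimated from verification trial scores. It is not the authors' evaluation code: the function name `equal_error_rate`, the convention that a trial is accepted when its score is at or above the threshold, and the toy scores are all assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER from genuine (target) and impostor (nontarget) scores.

    Hypothetical helper for illustration only. A trial is accepted when
    its score is >= the threshold (an assumed convention).
    Returns (eer, threshold) at the point where the false acceptance
    rate and false rejection rate are closest.
    """
    target = np.asarray(target_scores, dtype=float)
    nontarget = np.asarray(nontarget_scores, dtype=float)

    # Every observed score is a candidate decision threshold.
    thresholds = np.sort(np.concatenate([target, nontarget]))

    # False acceptance rate: fraction of impostor trials accepted.
    far = np.array([(nontarget >= t).mean() for t in thresholds])
    # False rejection rate: fraction of genuine trials rejected.
    frr = np.array([(target < t).mean() for t in thresholds])

    # The EER is where the two error rates cross; with finitely many
    # trials we take the closest point and average the two rates.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return float(eer), float(thresholds[idx])


if __name__ == "__main__":
    # Toy scores (made up): one genuine trial scores below one impostor.
    genuine = [0.9, 0.8, 0.7, 0.4]
    impostor = [0.6, 0.3, 0.2, 0.1]
    eer, thr = equal_error_rate(genuine, impostor)
    print(f"EER = {eer:.2%} at threshold {thr}")
```

An EER of 11.20% therefore means that, at the threshold where the two error types balance, about 11% of impostor trials are accepted and about 11% of genuine trials are rejected. The DCF additionally weights the two error types by costs and a target prior, which the abstract does not state, so it is not sketched here.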
