Journal: IEEE Transactions on Audio, Speech and Language Processing

Discriminative In-Set/Out-of-Set Speaker Recognition


Abstract

In this paper, the problem of identifying in-set versus out-of-set speakers for limited training/test data durations is addressed. The recognition objective is to form a decision regarding an input speaker as being a legitimate member of a set of enrolled speakers or outside speakers. The general goal is to perform rapid speaker model construction from limited enrollment and test size resources for in-set testing for input audio streams. In-set detection can help ensure security and proper access to private information, as well as detecting and tracking input speakers. Areas of applications of these concepts include rapid speaker tagging and tracking for information retrieval, communication networks, personal device assistants, and location access. We propose an integrated system with emphasis on short-enrollment data (about 5 s of speech for each enrolled speaker) and test data (2-8 s) within a text-independent mode. We present a simple and yet powerful decision rule to accept or reject speakers using a discriminative vector in the decision score space, together with statistical hypothesis testing based on the conventional likelihood ratio test. Discriminative training is introduced to further improve system performance for both decision techniques, by employing minimum classification error and minimum verification error frameworks. Experiments are performed using three separate corpora. Using the YOHO speaker recognition database, the alternative decision rule achieves measurable improvement over the likelihood ratio test, and discriminative training consistently enhances overall system performance with relative improvements ranging from 11.26%-28.68%. A further extended evaluation using the TIMIT (CORPUS1) and actual noisy aircraft communications data (CORPUS2) shows measurable improvement over the traditional MAP based scheme using the likelihood ratio test (MAP-LRT), with average EERs of 9%-23% for TIMIT and 13%-32% for noisy aircraft communications. 
The results confirm that an effective in-set/out-of-set speaker recognition system can be formulated using discriminative training for rapid tagging of input speakers from limited training and test data sizes.
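
The conventional likelihood ratio test mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: it assumes that per-frame log-likelihoods under each enrolled speaker model and under a single background model are already available, that the in-set score is the best enrolled model's average log-likelihood minus the background average, and that the threshold is tuned elsewhere. The function names `in_set_llr_score` and `accept_in_set` are hypothetical.

```python
import numpy as np


def in_set_llr_score(enrolled_loglik, background_loglik):
    """Return (score, best_speaker_index) for one test utterance.

    enrolled_loglik: shape (n_speakers, n_frames), per-frame log-likelihoods
        under each enrolled speaker model (assumed precomputed).
    background_loglik: shape (n_frames,), per-frame log-likelihoods under a
        background model (assumed precomputed).
    """
    enrolled = np.asarray(enrolled_loglik, dtype=float)
    background = np.asarray(background_loglik, dtype=float)
    # Average over frames so utterances of different length are comparable.
    per_speaker = enrolled.mean(axis=1)
    best = int(np.argmax(per_speaker))
    # Log-likelihood ratio: best enrolled model versus background model.
    score = float(per_speaker[best] - background.mean())
    return score, best


def accept_in_set(score, threshold=0.0):
    """Accept the speaker as in-set when the ratio reaches the threshold."""
    return score >= threshold


# Toy values, invented purely for illustration.
score, best = in_set_llr_score([[-1.0, -1.0], [-3.0, -3.0]], [-2.0, -2.0])
decision = accept_in_set(score)
```

Raising the threshold trades false acceptances for false rejections; the equal error rate (EER) quoted in the abstract is the operating point at which the two error rates coincide.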
