Signal Processing Magazine, IEEE

Speaker Recognition by Machines and Humans: A tutorial review


Abstract

Identifying a person by his or her voice is an important human trait most take for granted in natural human-to-human interaction/communication. Speaking to someone over the telephone usually begins by identifying who is speaking and, at least in cases of familiar speakers, a subjective verification by the listener that the identity is correct and the conversation can proceed. Automatic speaker-recognition systems have emerged as an important means of verifying identity in many e-commerce applications as well as in general business interactions, forensics, and law enforcement. Human experts trained in forensic speaker recognition can perform this task even better by examining a set of acoustic, prosodic, and linguistic characteristics of speech in a general approach referred to as structured listening. Techniques in forensic speaker recognition have been developed for many years by forensic speech scientists and linguists to help reduce any potential bias or preconceived understanding as to the validity of an unknown audio sample and a reference template from a potential suspect. Experienced researchers in signal processing and machine learning continue to develop automatic algorithms to effectively perform speaker recognition, with ever-improving performance, to the point where automatic systems start to perform on par with human listeners. In this article, we review the literature on speaker recognition by machines and humans, with an emphasis on prominent speaker-modeling techniques that have emerged in the last decade for automatic systems. We discuss different aspects of automatic systems, including voice-activity detection (VAD), features, speaker models, standard evaluation data sets, and performance metrics. Human speaker recognition is discussed in two parts: the first part involves forensic speaker-recognition methods, and the second illustrates how a naïve listener performs this task from a neuroscience perspective.
We conclude this review with a comparative study of human versus machine speaker recognition and attempt to point out strengths and weaknesses of each.