Speaker Recognition by Machines and Humans: A tutorial review

Hansen John H.L.; Hasan Taufiq

Abstract

Identifying a person by his or her voice is an important human trait most take for granted in natural human-to-human interaction/communication. Speaking to someone over the telephone usually begins by identifying who is speaking and, at least in cases of familiar speakers, a subjective verification by the listener that the identity is correct and the conversation can proceed. Automatic speaker-recognition systems have emerged as an important means of verifying identity in many e-commerce applications as well as in general business interactions, forensics, and law enforcement. Human experts trained in forensic speaker recognition can perform this task even better by examining a set of acoustic, prosodic, and linguistic characteristics of speech in a general approach referred to as structured listening. Techniques in forensic speaker recognition have been developed for many years by forensic speech scientists and linguists to help reduce any potential bias or preconceived understanding as to the validity of an unknown audio sample and a reference template from a potential suspect. Experienced researchers in signal processing and machine learning continue to develop automatic algorithms to effectively perform speaker recognition, with ever-improving performance, to the point where automatic systems start to perform on par with human listeners. In this article, we review the literature on speaker recognition by machines and humans, with an emphasis on prominent speaker-modeling techniques that have emerged in the last decade for automatic systems. We discuss different aspects of automatic systems, including voice-activity detection (VAD), features, speaker models, standard evaluation data sets, and performance metrics. Human speaker recognition is discussed in two parts: the first part involves forensic speaker-recognition methods, and the second illustrates how a naïve listener performs this task from a neuroscience perspective. We conclude this review with a comparative study of human versus machine speaker recognition and attempt to point out strengths and weaknesses of each.


Bibliographic details

Source
Signal Processing Magazine, IEEE | 2015, Issue 6 | pp. 74-99 | 26 pages
Authors
Hansen John H.L.; Hasan Taufiq
Author affiliation

CRSS: Center for Robust Speech Systems, University of Texas at Dallas, Richardson, Texas 75083-0688 United States

Original format: PDF
Language: English

