CN-Celeb: A Challenging Chinese Speaker Recognition Dataset

Abstract

Recently, researchers set an ambitious goal of conducting speaker recognition in unconstrained conditions, where the variations in ambient environment, channel, and emotion could be arbitrary. However, most publicly available datasets are collected under constrained environments, i.e., with little noise and limited channel variation. These datasets tend to deliver over-optimistic performance and do not meet the needs of research on speaker recognition in unconstrained conditions. In this paper, we present CN-Celeb, a large-scale speaker recognition dataset collected ‘in the wild’. This dataset contains more than 130,000 utterances from 1,000 Chinese celebrities and covers 11 different genres in the real world. Experiments conducted with two state-of-the-art speaker recognition approaches (i-vector and x-vector) show that the performance on CN-Celeb is far inferior to that obtained on VoxCeleb, a widely used speaker recognition dataset. This result demonstrates that in real-life conditions, the performance of existing techniques might be much worse than previously thought. Our database is free for researchers and can be downloaded from http://project.cslt.org.

Bibliographic record

Source
IEEE International Conference on Acoustics, Speech and Signal Processing | 2020 | pp. 7604-7608 | 5 pages
Authors
Y. Fan; J.W. Kang; L.T. Li; K.C. Li; H.L. Chen; S.T. Cheng; P.Y. Zhang; Z.Y. Zhou; Y.Q. Cai; D. Wang
Original format: PDF
Keywords
speaker recognition; Chinese; dataset
