CN-Celeb: A Challenging Chinese Speaker Recognition Dataset

Abstract

Recently, researchers set an ambitious goal of conducting speaker recognition in unconstrained conditions, where the variations in ambient conditions, channel, and emotion could be arbitrary. However, most publicly available datasets are collected under constrained environments, i.e., with little noise and limited channel variation. These datasets tend to deliver over-optimistic performance and do not meet the needs of research on speaker recognition in unconstrained conditions. In this paper, we present CN-Celeb, a large-scale speaker recognition dataset collected "in the wild". This dataset contains more than 130,000 utterances from 1,000 Chinese celebrities and covers 11 different genres in the real world. Experiments conducted with two state-of-the-art speaker recognition approaches (i-vector and x-vector) show that the performance on CN-Celeb is far inferior to that obtained on VoxCeleb, a widely used speaker recognition dataset. This result demonstrates that in real-life conditions, the performance of existing techniques might be much worse than previously thought. Our database is free for researchers and can be downloaded from http://project.cslt.org.

Bibliographic details

Source: IEEE International Conference on Acoustics, Speech and Signal Processing | 2020 | pp. 7444-8063 | 5 pages
Authors: Y. Fan; J.W. Kang; L.T. Li; K.C. Li; H.L. Chen; S.T. Cheng; P.Y. Zhang; Z.Y. Zhou; Y.Q. Cai; D. Wang
Original format: PDF
Chinese Library Classification: TN912-53
Keywords: speaker recognition; Chinese; dataset
