Published in: IEEE International Conference on Acoustics, Speech and Signal Processing
CN-Celeb: A Challenging Chinese Speaker Recognition Dataset

Abstract

Recently, researchers have set the ambitious goal of performing speaker recognition in unconstrained conditions, where variations in ambient environment, channel, and emotion can be arbitrary. However, most publicly available datasets are collected in constrained environments, i.e., with little noise and limited channel variation. These datasets tend to yield over-optimistic performance and do not meet the needs of research on speaker recognition in unconstrained conditions.

In this paper, we present CN-Celeb, a large-scale speaker recognition dataset collected ‘in the wild’. The dataset contains more than 130,000 utterances from 1,000 Chinese celebrities and covers 11 different genres of real-world speech. Experiments with two state-of-the-art speaker recognition approaches (i-vector and x-vector) show that performance on CN-Celeb is far inferior to that obtained on Vox-Celeb, a widely used speaker recognition dataset. This result demonstrates that, in real-life conditions, the performance of existing techniques might be much worse than previously thought. Our database is free for researchers and can be downloaded from http://project.cslt.org.
