Journal: IEEE Transactions on Speech and Audio Processing

Singing voice identification using spectral envelope estimation

Abstract

In this paper, we present a spectrum-based system for singer identification that operates for the ideal case in which audio samples contain only the singer's voice. Our method begins with the computation of a robust estimate of the spectral envelope called the composite transfer function (CTF). The CTF is derived from the instantaneous amplitude and frequency of the sinusoidal partials which make up the vocal signal. Unlike traditional source-filter theory, the CTF does not explicitly separate the spectral characteristics of the vocal source and the vocal tract filter. The principal components of the CTFs are used as features for a quadratic classifier to identify singers. The approach is validated on a database containing samples from twelve classically trained singers. In cross-validation experiments, test set accuracies of approximately 95% are found for a baseline case. The classifier's performance is not degraded when different vowels are included in classifier training and evaluation. Restricting the frequency range of the CTFs and using a test set containing samples extracted from solo performances of Italian arias reduces the test set accuracy to 70-80%.
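
The classification stage described in the abstract (principal components of spectral envelopes fed to a quadratic classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the CTF estimation itself is not shown, the envelopes are replaced by synthetic random vectors, and every name and parameter here (`fit_pca`, `fit_quadratic`, `demo`, the 64 bins, the 5 components, the regularization term `reg`) is a hypothetical choice made only for illustration. A simple held-out split stands in for the cross-validation used in the paper.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return (mean, components) for the top principal directions of X."""
    mean = X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(X, mean, components):
    """Map envelopes onto the retained principal components."""
    return (X - mean) @ components.T

def fit_quadratic(Z, y, reg=1e-6):
    """Fit one Gaussian per class; reg is an assumed stabilizing term."""
    model = {}
    for label in np.unique(y):
        Zc = Z[y == label]
        cov = np.cov(Zc, rowvar=False) + reg * np.eye(Z.shape[1])
        model[label] = (
            Zc.mean(axis=0),             # class mean
            np.linalg.inv(cov),          # precision matrix
            np.linalg.slogdet(cov)[1],   # log-determinant of the covariance
            np.log(len(Zc) / len(Z)),    # log prior from class frequency
        )
    return model

def predict_quadratic(model, Z):
    """Pick the class with the highest Gaussian log-posterior (up to a constant)."""
    labels = list(model)
    scores = []
    for label in labels:
        mu, prec, logdet, logprior = model[label]
        d = Z - mu
        maha = np.einsum("ij,jk,ik->i", d, prec, d)  # squared Mahalanobis distance
        scores.append(-0.5 * (maha + logdet) + logprior)
    return np.array(labels)[np.argmax(scores, axis=0)]

def demo(seed=0):
    """Train and test on synthetic stand-ins for CTFs; returns test accuracy."""
    rng = np.random.default_rng(seed)
    n_singers, n_bins, n_per = 3, 64, 40  # assumed sizes, not from the paper
    centers = 3.0 * rng.normal(size=(n_singers, n_bins))
    X = np.vstack([c + 0.5 * rng.normal(size=(n_per, n_bins)) for c in centers])
    y = np.repeat(np.arange(n_singers), n_per)
    train = np.arange(len(y)) % 2 == 0  # even rows train, odd rows test
    mean, comps = fit_pca(X[train], n_components=5)
    model = fit_quadratic(project(X[train], mean, comps), y[train])
    pred = predict_quadratic(model, project(X[~train], mean, comps))
    return float(np.mean(pred == y[~train]))
```

Calling `demo()` returns the fraction of held-out synthetic samples assigned to the correct class. The classifier is "quadratic" because each class keeps its own covariance matrix, which makes the decision boundaries quadratic surfaces in the principal-component space. The accuracy on this toy data says nothing about the figures reported in the abstract.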