A Joint Approach for Single-Channel Speaker Identification and Speech Separation

Abstract

In this paper, we present a novel system for joint speaker identification and speech separation. For speaker identification, we propose a single-channel algorithm that provides an estimate of the signal-to-signal ratio (SSR) as a by-product. For speech separation, we propose a sinusoidal model-based algorithm consisting of a double-talk/single-talk detector followed by a minimum mean square error estimator of sinusoidal parameters, which finds optimal codevectors from pre-trained speaker codebooks. In evaluating the proposed system, we start from a situation where the codebook indices, speaker identities, and SSR level are known a priori; then, by relaxing these assumptions one by one, we demonstrate the efficiency of the proposed fully blind system. In contrast to previous studies, which mostly focus on automatic speech recognition (ASR) accuracy, we also report objective and subjective results. The results show that the proposed system performs as well as the best state-of-the-art systems in terms of perceived quality, while its speaker identification and ASR results are generally lower. It outperforms the state of the art in terms of intelligibility, showing that the ASR results are not conclusive. On average, the proposed method achieves 52.3% ASR accuracy, 41.2 points in MUSHRA, and 85.9% in speech intelligibility.
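
As a rough illustration of the signal-to-signal ratio (SSR) mentioned in the abstract, the sketch below assumes the common definition of SSR as the energy ratio, in dB, between the target speaker's signal and the interfering speaker's signal. The function names `energy`, `ssr_db`, and `mix_at_ssr` are hypothetical and do not come from the paper. The code only shows how a two-talker mixture at a chosen SSR might be constructed; it is not the paper's SSR estimator.

```python
import math

def energy(signal):
    """Sum of squared samples of a signal given as a list of floats."""
    return sum(x * x for x in signal)

def ssr_db(target, interferer):
    """SSR in dB, assuming SSR = 10*log10(E_target / E_interferer)."""
    return 10.0 * math.log10(energy(target) / energy(interferer))

def mix_at_ssr(target, interferer, desired_ssr_db):
    """Scale the interferer so that target vs. scaled interferer has the
    desired SSR, then add the two sample by sample."""
    ratio = 10.0 ** (desired_ssr_db / 10.0)
    gain = math.sqrt(energy(target) / (energy(interferer) * ratio))
    scaled = [gain * x for x in interferer]
    mixture = [t + s for t, s in zip(target, scaled)]
    return mixture, scaled

# Toy sinusoids standing in for two speakers (assumption: not real speech).
target = [math.sin(2 * math.pi * 220 * n / 8000) for n in range(800)]
interferer = [0.3 * math.sin(2 * math.pi * 330 * n / 8000) for n in range(800)]

mixture, scaled = mix_at_ssr(target, interferer, -3.0)
```

Calling `ssr_db(target, scaled)` afterwards should return the requested level up to floating-point error, which is a quick sanity check on the scaling step.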
