
Speaker Identification

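From 1996 to 2022, a total of 80 publications related to speaker identification are indexed, concentrated mainly in automation technology, computer technology, radio electronics, telecommunications technology, and linguistics. They comprise 64 journal articles, 9 conference papers, and 258084 patent documents. The literature spans 43 journals, including Science Technology and Engineering (科学技术与工程), Acta Electronica Sinica (电子学报), and Journal on Communications (通信学报), and 8 conferences, including the 11th National Conference on Man-Machine Speech Communication, the 2009 Annual Conference on Communication Theory and Signal Processing, and the 2007 China National Computer Conference. The publications were contributed by 158 authors, including Wang Chengru (王成儒), Wang Jinjia (王金甲), and Sun Linhui (孙林慧).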

Speaker Identification: Publication Volume

Journal articles

Papers: 64 Share: 0.02%

Conference papers

Papers: 9 Share: 0.00%

Patent documents

Papers: 258084 Share: 99.97%

Total: 258157

Speaker Identification: Researchers

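  • Wang Chengru (王成儒)
  • Wang Jinjia (王金甲)
  • Sun Linhui (孙林慧)
  • Tang Zhenmin (唐振民)
  • Zhang Linghua (张玲华)
  • Li Yanping (李燕萍)
  • Li Jing (李静)
  • Yang Zhen (杨震)
  • Chen Ke (陈珂)
  • Han Chunguang (韩春光)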

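    • Sun Jianing (孙佳宁); Yu Ling (于玲)
    • Abstract: To address false wake-ups in smart speakers, where ambient sound wrongly activates the device, this paper proposes a speaker identification algorithm based on multi-feature fusion. In the feature extraction stage, the algorithm combines short-time energy, linear prediction cepstral coefficients (LPCC), Mel-frequency cepstral coefficients (MFCC), and their first-order dynamic (delta) coefficients to raise the recognition rate. Simulation tests on a self-built speech corpus show that, compared with GMM speaker identification using conventional feature extraction, the improved feature extraction significantly increases identification accuracy.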
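    • Liao Junfan (廖俊帆); Gu Yijun (顾益军); Zhang Peijing (张培晶); Liao Qian (廖茜)
    • Abstract: To investigate the security threat that adversarial examples pose to end-to-end speaker identification systems and how effective such attacks are, and to compare the strengths and weaknesses of existing adversarial example generation algorithms on speech, this work analyzes five white-box algorithms (FGSM, JSMA, BIM, C&W, PGD) and two black-box algorithms (ZOO, HSJA). The seven algorithms are implemented as targeted and untargeted attacks against end-to-end speaker identification models with two network architectures, ResCNN and GRU, and audio adversarial examples are produced. Attack effectiveness is evaluated with metrics such as attack success rate and signal-to-noise ratio, complemented by a human imperceptibility test. The experiments show that existing generation algorithms can be applied to end-to-end speaker identification models. Among the white-box algorithms, BIM and PGD perform well. Untargeted black-box attacks can match the effectiveness of white-box attacks, while targeted black-box attacks need further improvement.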
    • Zhao Yan (赵艳); Lü Liang (吕亮); Zhao Li (赵力)
    • Abstract: Speaker identification has broad application prospects in many fields. This paper first studies two basic deep neural network models, the deep belief network and the stacked denoising autoencoder, applied to speaker identification. Through layer-wise unsupervised pre-training followed by supervised fine-tuning, the deep network avoids the tendency of backpropagation to get trapped in local minima. Experiments show that once the number of neurons exceeds a certain threshold, the deep models outperform an ordinary BP network, and their performance improves as the network grows. Because large deep networks take a long time to train, the paper proposes replacing the conventional sigmoid-type activation function with the rectified linear unit (ReLU). Results show that the improved deep model reduces average training time by 35% and average error rate by 8.3%.
    • Ou Guozhen (欧国振); Sun Linhui (孙林慧); Xue Haishuang (薛海双)
    • Abstract: In a conventional Gaussian mixture model-support vector machine (GMM-SVM) speaker identification system, the SVM models and identifies speakers directly from supervectors obtained from the GMM vector space. Because the correlations among the mean vectors within a supervector are ignored, recognition performance is limited. The authors therefore propose a text-independent GMM-SVM speaker identification system based on recombined supervectors. The system exploits the high correlation between the mean vectors of adjacent Gaussian components, so that the recombined supervectors better reflect the inner details of speaker identity and the system can make full use of the SVM's strength on high-dimensional, small-sample data. Experiments show that, compared with the conventional GMM-SVM system, the recombined-supervector system significantly shortens system modeling time while effectively raising the identification rate.
    • Shan Yanyan (单燕燕)
    • Abstract: Speaker recognition has made great progress under laboratory conditions, but in real-world use its performance is strongly affected by factors such as environmental noise and the speaker's health. Colds are unavoidable in daily life. A cold often causes nasal inflammation, changing the volume and shape of the nasal cavity and hence the speaker's voice, which degrades recognition performance. This paper studies the performance of a speaker recognition system when the test speaker has a cold. To exploit the complementarity of scores from different feature parameters, it proposes, for a GMM-based speaker identification system, applying LPC and MFCC features separately and fusing the two sets of scores after normalization. Experiments show that for normal speech the method improves identification performance over the LPC-feature system. For cold-affected speech with 16 Gaussian components, it improves identification performance by about 12.5% over the LPC-feature system and by about 8.5% over the MFCC-feature system.
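    • Li Qiang (李强); Peng Yiwu (彭益武)
    • Abstract: Current GMM-based speaker identification systems are mostly implemented in software on PCs and struggle with multi-channel real-time processing of large speech streams. Given the strong pipelining and parallel processing capability of FPGAs, the paper proposes a hardware implementation of a GMM-based, text-independent, real-time speaker identification system on an FPGA platform. In experiments on material drawn from the NIST 2003 speech corpus, the recognition rate is almost identical to that of the PC software implementation, while real-time processing speed is about 90 times higher.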
    • Li Hui (李荟); Zhao Yunmin (赵云敏)
    • Abstract: Because training data are scarce in practice, speakers are modeled with the Gaussian mixture model-universal background model (GMM-UBM). The work studies how to raise the recognition rate of a speaker recognition system from two angles: model adaptation and parameter estimation. For model adaptation, it improves on the conventional approach of deriving the speaker model by maximum a posteriori (MAP) estimation: maximum likelihood linear regression (MLLR) and eigenvoice (EV) adaptation, both taken from speech recognition, are applied to speaker model adaptation and compared with MAP.
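    • Meng Jun (孟君); Yang Dali (杨大利)
    • Abstract: To examine more systematically how the duration of UBM (universal background model) training data affects recognition performance in speaker identification, a set of experiments varying UBM training duration and number of mixtures was run on a GMM-UBM (Gaussian mixture model-universal background model) speaker identification system. The relationship between UBM training duration and number of mixtures was explored, leading to the conclusion that a relatively high and stable recognition rate is obtained when each UBM mixture receives about 100 frames of training samples on average. The range of net UBM training data duration for a given number of mixtures is also summarized, providing a basic data reference for future work.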
    • Jiang Ye (蒋晔); Tang Zhenmin (唐振民)
    • Abstract: Because training speech is insufficient in short-utterance speaker identification, the feature parameters and the GMM are optimized, and a GMM speaker identification method based on local fuzzy PCA is proposed. The method uses combined features instead of a single feature, increasing the effective feature dimensionality to compensate for the shortage of feature samples, and applies local fuzzy PCA to reduce the dimensionality of the combined features, lowering the system's time and space complexity with little effect on recognition rate. GMM parameter initialization is also improved by combining the splitting method with fuzzy k-means clustering. Experiments show that, compared with conventional initialization methods, the approach effectively improves short-utterance speaker identification performance.
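One of the abstracts listed above (the study of cold-affected speech) describes scoring each candidate speaker with an LPC-based system and an MFCC-based system, normalizing the scores, and fusing them. The abstract does not say which normalization or which weights were used, so the following is only a minimal sketch under assumptions: z-score normalization across the enrolled speakers, a weighted sum with equal weights, and made-up scores. The function names `z_normalize` and `fuse_and_identify` and the speaker labels are hypothetical.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Scale one system's per-speaker scores to zero mean and unit variance.

    Assumption: z-score normalization; the abstract does not name the method.
    """
    values = list(scores.values())
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # avoid dividing by zero if all scores are equal
    return {spk: (s - mu) / sigma for spk, s in scores.items()}


def fuse_and_identify(lpc_scores, mfcc_scores, mfcc_weight=0.5):
    """Return (best speaker, fused scores) from a weighted sum of normalized scores.

    Assumption: equal weights by default; the actual weights are not given.
    """
    lpc_n = z_normalize(lpc_scores)
    mfcc_n = z_normalize(mfcc_scores)
    fused = {
        spk: (1.0 - mfcc_weight) * lpc_n[spk] + mfcc_weight * mfcc_n[spk]
        for spk in lpc_scores
    }
    return max(fused, key=fused.get), fused


# Made-up log-likelihood scores for three enrolled speakers (illustrative only).
lpc_scores = {"spk_a": -52.0, "spk_b": -50.0, "spk_c": -57.0}
mfcc_scores = {"spk_a": -410.0, "spk_b": -430.0, "spk_c": -445.0}

best, fused = fuse_and_identify(lpc_scores, mfcc_scores)
print(best, fused)
```

Normalization matters here because the two systems produce scores on very different scales. Without it, the MFCC scores in this toy example would dominate the sum regardless of the chosen weight.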
