Conference: WSEAS International Conference on Circuits, Systems, Signal and Telecommunications
PSO Based Optimized Reliability for Robust Multimodal Speaker Identification

Abstract

Speaker recognition that operates reliably in real environments is a key challenge for ubiquitous services in human-computer interfaces. In this paper, we present a robust multimodal speaker identification system that optimizes the reliability of the different modalities. We propose an extension of the modified convection function's optimizing factors to account for optimum reliability simultaneously in audio, face, and lip information. The proposed reliability measure is applied to a multimodal speaker identification framework. A particle swarm optimization (PSO) algorithm is employed to optimize the modified convection function's optimizing factors. In the face-based expert, image quality is degraded with JPEG compression in the enrollment and test sessions. Similarly, the lip-based expert's image quality is degraded to create a mismatch between enrollment and test images. Finally, artificial illumination from the opposite direction is added to the test face and lip images at different intensities. The VidTimit audio database, collected in an office environment, has a high level of signal distortion. We apply local principal component analysis (Local PCA) to both the face and lip modalities to reduce the dimension of the feature vectors. All speaker identification experiments are performed on the VidTimit database. Experimental results show that the proposed optimum reliability measures improve the identification rate (IR) by 8.67% compared with the best single classifier, the audio classifier, and, most notably, retain the consistency of the multimodal integration framework.
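
To illustrate the idea of tuning per-modality reliability factors with PSO, the following is a minimal sketch and not the authors' implementation. The abstract does not define the modified convection function, so a plain normalized weighted sum of audio, face, and lip scores is swapped in as an assumption. The function names (`pso_minimize`, `fusion_error`), the PSO coefficients, and the synthetic scores and noise levels are all hypothetical and chosen only for illustration.

```python
import numpy as np


def pso_minimize(objective, dim, n_particles=30, n_iters=200,
                 bounds=(0.0, 1.0), inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over a box; returns (best_position, best_value)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pos = rng.uniform(lo, hi, size=(n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                  # personal best positions
    pbest_val = np.array([objective(p) for p in pos])   # personal best values
    g = int(np.argmin(pbest_val))
    gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
    for _ in range(n_iters):
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        # Velocity update: inertia + pull toward personal and global bests.
        vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)                # keep particles in the box
        vals = np.array([objective(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = int(np.argmin(pbest_val))
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), float(pbest_val[g])
    return gbest, gbest_val


def fusion_error(weights, scores, labels):
    """Error rate of weighted-sum fusion (an assumed stand-in for the paper's
    reliability function). scores: (n_modalities, n_samples, n_speakers)."""
    w = np.asarray(weights) / (np.sum(weights) + 1e-12)  # normalize the weights
    fused = np.tensordot(w, scores, axes=1)              # (n_samples, n_speakers)
    return float(np.mean(np.argmax(fused, axis=1) != labels))


# Synthetic scores for three experts (audio, face, lip); noise levels are made up.
data_rng = np.random.default_rng(1)
n_samples, n_speakers = 200, 10
labels = data_rng.integers(0, n_speakers, size=n_samples)
onehot = np.eye(n_speakers)[labels]
noise_levels = [0.8, 1.5, 2.5]
scores = np.stack([onehot + s * data_rng.standard_normal(onehot.shape)
                   for s in noise_levels])

best_w, best_err = pso_minimize(lambda w: fusion_error(w, scores, labels), dim=3)
print("weights (audio, face, lip):", best_w, "error rate:", best_err)
```

Because the error rate is piecewise constant in the weights, a gradient-free search such as PSO is a natural fit for this kind of objective; a real system would evaluate the weights on held-out enrollment data rather than on synthetic scores.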