In studying traditional audio-visual bimodal speech recognition systems, we found that the visual features obtained through image processing often involve a large amount of data and lose important features. To address these problems, we propose applying image sonification to extract features from the video images. Using a BP neural network optimised by a genetic algorithm as the fusion model, we fuse the audio and video features at the feature level. Experimental results show that visual features processed by image sonification contain a certain amount of speech information, and that their recognition performance remains relatively stable in noisy environments. The neural network fusion model improves the robustness of the system.
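As a rough illustration of feature-level fusion with a genetic-algorithm-initialised network, the sketch below concatenates per-frame audio and visual feature vectors and uses a small genetic algorithm to search for starting weights of a one-hidden-layer network. The abstract does not specify the network size, the genetic operators, or the features, so everything here is an assumption: the function names (`fuse_features`, `ga_initial_weights`), the toy data, the elitist selection, the uniform crossover, and the Gaussian mutation are all hypothetical, and bias terms are omitted for brevity. In a typical GA-plus-BP setup the returned weights would then be refined by backpropagation, a step this sketch leaves out.

```python
import math
import random


def fuse_features(audio, visual):
    """Feature-level fusion: concatenate the audio and visual vectors of one frame."""
    return list(audio) + list(visual)


def forward(genome, x, n_hidden):
    """One-hidden-layer net (tanh hidden units, one sigmoid output, no biases)."""
    n_in = len(x)
    hidden = [
        math.tanh(sum(genome[j * n_in + i] * x[i] for i in range(n_in)))
        for j in range(n_hidden)
    ]
    offset = n_in * n_hidden
    z = sum(genome[offset + j] * hidden[j] for j in range(n_hidden))
    z = max(-60.0, min(60.0, z))  # clamp to keep exp() well inside float range
    return 1.0 / (1.0 + math.exp(-z))


def mse(genome, samples, targets, n_hidden):
    """Mean squared error of the network encoded by `genome`."""
    errors = [(forward(genome, x, n_hidden) - t) ** 2 for x, t in zip(samples, targets)]
    return sum(errors) / len(errors)


def ga_initial_weights(samples, targets, n_hidden=3, pop_size=20, generations=30, seed=0):
    """Evolve flat weight vectors; return the lowest-error one as a starting point for BP."""
    rng = random.Random(seed)
    n_genes = len(samples[0]) * n_hidden + n_hidden
    pop = [[rng.gauss(0.0, 1.0) for _ in range(n_genes)] for _ in range(pop_size)]
    for _ in range(generations):
        # Elitist selection: the better half survives unchanged.
        pop.sort(key=lambda g: mse(g, samples, targets, n_hidden))
        elite = pop[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            pa, pb = rng.choice(elite), rng.choice(elite)
            # Uniform crossover followed by small Gaussian mutation on every gene.
            child = [(a if rng.random() < 0.5 else b) + rng.gauss(0.0, 0.1)
                     for a, b in zip(pa, pb)]
            children.append(child)
        pop = elite + children
    return min(pop, key=lambda g: mse(g, samples, targets, n_hidden))


# Toy data (made up for illustration): 2 audio + 1 visual feature per frame.
audio = [[0.0, 0.1], [0.9, 1.0], [0.1, 0.0], [1.0, 0.8]]
visual = [[0.2], [0.7], [0.1], [0.9]]
targets = [0.0, 1.0, 0.0, 1.0]

fused = [fuse_features(a, v) for a, v in zip(audio, visual)]
best = ga_initial_weights(fused, targets)
```

Because the better half of each generation is carried over unchanged, the best error in the population cannot get worse from one generation to the next. That makes the evolved weights at least as good a starting point as the best random initialisation drawn at the outset.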