
Articulatory-to-Acoustic Conversion Using BiLSTM-CNN Word-Attention-Based Method



Abstract

In recent years, along with the development of artificial intelligence (AI) and human-machine interaction technology, speech recognition and production have had to keep pace with that development, which requires improving recognition accuracy by adding novel features, fusing features, and refining recognition methods. Aiming to develop a novel recognition feature and apply it to speech recognition, this paper presents a new method for articulatory-to-acoustic conversion. In this study, we converted articulatory features (i.e., velocities of the tongue and motion of the lips) into acoustic features (i.e., the second formant and Mel-Cepstra). Considering the graphical representation of the articulators' motion, the study combined a Bidirectional Long Short-Term Memory network (BiLSTM) with a convolutional neural network (CNN) and adopted the idea of word attention in Mandarin to extract semantic features. We used the electromagnetic articulography (EMA) database designed by Taiyuan University of Technology, which contains 299 Mandarin disyllables and sentences from ten speakers, and extracted 8-dimensional articulatory features and a 1-dimensional semantic feature through the word-attention layer; we then trained on 200 samples and tested on 99 samples for the articulatory-to-acoustic conversion. Finally, Root Mean Square Error (RMSE), Mean Mel-Cepstral Distortion (MMCD), and the correlation coefficient were used to evaluate the conversion and to compare it with a Gaussian Mixture Model (GMM) and a recurrent BiLSTM network (BiLSTM-RNN). The results show that the MMCD of the Mel-Frequency Cepstrum Coefficients (MFCC) was 1.467 dB and the RMSE of F2 was 22.10 Hz. These results can be used in feature fusion and speech recognition to improve recognition accuracy.
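To make the described model concrete, here is a minimal PyTorch sketch of a BiLSTM-CNN regressor with a simple temporal attention layer, in the spirit of the architecture the abstract outlines. All layer sizes, the class name, and the 13-dimensional output (standing in for F2 plus Mel-Cepstra) are illustrative assumptions rather than the paper's exact configuration, and the attention here is a generic frame-level attention, not the paper's Mandarin word-attention layer.

```python
# A minimal sketch of a BiLSTM-CNN attention model for frame-wise
# articulatory-to-acoustic regression. Layer sizes and output dimension
# are illustrative assumptions, not the paper's exact configuration.
import torch
import torch.nn as nn

class BiLSTMCNNAttention(nn.Module):
    def __init__(self, in_dim=8, conv_ch=32, hidden=64, out_dim=13):
        super().__init__()
        # 1-D convolution over time captures local articulator dynamics
        self.conv = nn.Conv1d(in_dim, conv_ch, kernel_size=3, padding=1)
        # Bidirectional LSTM models long-range temporal context
        self.bilstm = nn.LSTM(conv_ch, hidden, batch_first=True,
                              bidirectional=True)
        # Attention weights over time steps (stand-in for word attention)
        self.attn = nn.Linear(2 * hidden, 1)
        # Frame-wise regression to acoustic features (e.g., F2 + Mel-Cepstra)
        self.out = nn.Linear(2 * hidden, out_dim)

    def forward(self, x):                 # x: (batch, time, in_dim)
        h = torch.relu(self.conv(x.transpose(1, 2))).transpose(1, 2)
        h, _ = self.bilstm(h)             # (batch, time, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)   # (batch, time, 1)
        h = h * w                         # re-weight frames by attention
        return self.out(h)                # (batch, time, out_dim)

model = BiLSTMCNNAttention()
dummy = torch.randn(4, 100, 8)            # 4 utterances, 100 frames, 8 EMA dims
print(model(dummy).shape)                 # torch.Size([4, 100, 13])
```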
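The evaluation metrics named in the abstract are standard and easy to state. Below is a small sketch of RMSE over an F2 trajectory and Mean Mel-Cepstral Distortion over frames; it assumes the conventional MCD formula, MCD = (10 / ln 10) · sqrt(2 · Σ (c_i − ĉ_i)²), so the paper's exact variant may differ.

```python
# A sketch of the evaluation metrics from the abstract: RMSE for the F2
# trajectory and Mean Mel-Cepstral Distortion (MMCD). The standard MCD
# formula is assumed; the paper's exact variant may differ.
import numpy as np

def rmse(ref, pred):
    """Root Mean Square Error between two 1-D trajectories (e.g., F2 in Hz)."""
    ref, pred = np.asarray(ref), np.asarray(pred)
    return np.sqrt(np.mean((ref - pred) ** 2))

def mmcd(ref_mcep, pred_mcep):
    """Mean Mel-Cepstral Distortion in dB over frames.

    ref_mcep, pred_mcep: (frames, order) Mel-cepstral coefficient matrices,
    conventionally excluding the 0th (energy) coefficient.
    """
    diff = np.asarray(ref_mcep) - np.asarray(pred_mcep)
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return per_frame.mean()

# Toy usage with random data, just to show the calling convention.
f2_ref, f2_pred = np.random.rand(100) * 2000, np.random.rand(100) * 2000
mc_ref, mc_pred = np.random.randn(100, 12), np.random.randn(100, 12)
print(rmse(f2_ref, f2_pred), mmcd(mc_ref, mc_pred))
```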

