Conference: IEEE International Conference on Data Science in Cyberspace

A Transformer-based Model for Sentence-Level Chinese Mandarin Lipreading

Abstract

Lipreading converts silent video of a speaker into the corresponding speech content and has practical value in many scenarios. However, most current lipreading research targets English, and research on sentence-level Chinese lipreading remains insufficient. We therefore propose an end-to-end lipreading network for Chinese Mandarin. Unlike existing work, we are the first to apply the Transformer architecture to Chinese Mandarin lipreading; it integrates the self-attention mechanism and improves the performance of the language model. Following the characteristics of Mandarin, pinyin is introduced to assist the prediction of Chinese characters. In addition, we divide the pinyin dictionary into initials and finals rather than the 26 letters of the English alphabet, which better suits Chinese speaking habits. On this basis, we propose a Cascade-Transformer-based Chinese Lipreading Network (CTCH-LipNet for short) that maps talking video to speech content. Experimental results on a large-scale dataset demonstrate that the proposed approach achieves better recognition performance than the state-of-the-art approach investigated.
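The idea of splitting pinyin into initials and finals can be illustrated with a minimal sketch. The abstract does not give the paper's actual token vocabulary or preprocessing, so everything here is an assumption: the initial list is the standard Hanyu Pinyin set, "y" and "w" are treated as initials, tone marks are assumed to be already removed, and the function names `split_syllable` and `tokenize` are hypothetical.

```python
from typing import List, Tuple

# Standard Hanyu Pinyin initials. Two-letter initials come first so that
# "zh" is matched before "z". Treating "y" and "w" as initials is an
# assumption, not something stated in the abstract.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(syllable: str) -> Tuple[str, str]:
    """Split one toneless pinyin syllable into (initial, final)."""
    for initial in INITIALS:
        # Require a non-empty remainder so a bare "n" or "m" stays a final.
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return initial, syllable[len(initial):]
    return "", syllable  # zero-initial syllable such as "an" or "er"


def tokenize(pinyin_sentence: str) -> List[str]:
    """Turn space-separated pinyin syllables into initial/final tokens."""
    tokens: List[str] = []
    for syllable in pinyin_sentence.lower().split():
        initial, final = split_syllable(syllable)
        if initial:
            tokens.append(initial)
        tokens.append(final)
    return tokens


if __name__ == "__main__":
    print(tokenize("zhong wen"))
```

Under these assumptions, a syllable such as "zhong" becomes two tokens ("zh", "ong") instead of five letters, so each token corresponds to a phonetic unit of Mandarin rather than to an English letter.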
