A Transformer-based Model for Sentence-Level Chinese Mandarin Lipreading

Abstract

Lipreading is the task of converting silent video of a speaker into the corresponding speech content, which has practical value in many scenarios. However, most current lipreading research targets English, and research on sentence-level Chinese lipreading remains insufficient. We therefore propose an end-to-end lipreading network for Chinese Mandarin. Unlike existing work, we are the first to apply the Transformer architecture to Chinese Mandarin lipreading; it integrates the self-attention mechanism and improves the performance of the language model. In keeping with the characteristics of Mandarin, pinyin is introduced to assist the prediction of Chinese characters. In addition, we divide the pinyin dictionary into initials and finals instead of the 26 English letters, which better suits Chinese speaking habits. On this basis, a Cascade-Transformer-based Chinese Lipreading Network (CTCH-LipNet for short) is proposed to map the talking video to the speech content. Experimental results on a large-scale dataset demonstrate that the proposed approach achieves better recognition performance than the state-of-the-art approach investigated.
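The idea of a pinyin vocabulary built from initials and finals rather than single letters can be illustrated with a minimal sketch. This is not the authors' tokenizer: the function names `split_syllable` and `tokenize` are hypothetical, tones are ignored, and treating "y" and "w" as initials is an assumption of this example.

```python
# Standard Mandarin initials. Two-letter initials come first so that
# "zh", "ch", "sh" are matched before "z", "c", "s".
# Including "y" and "w" as initials is an assumption of this sketch.
INITIALS = ("zh", "ch", "sh",
            "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s",
            "y", "w")


def split_syllable(syllable):
    """Split one toneless pinyin syllable into (initial, final)."""
    for initial in INITIALS:
        # Require a non-empty remainder so the final is never empty.
        if syllable.startswith(initial) and len(syllable) > len(initial):
            return initial, syllable[len(initial):]
    # Zero-initial syllables such as "an" or "er" consist of a final only.
    return "", syllable


def tokenize(pinyin_sentence):
    """Turn a space-separated pinyin sentence into initial/final tokens."""
    tokens = []
    for syllable in pinyin_sentence.lower().split():
        initial, final = split_syllable(syllable)
        if initial:
            tokens.append(initial)
        tokens.append(final)
    return tokens


print(tokenize("zhong guo ren"))  # ['zh', 'ong', 'g', 'uo', 'r', 'en']
```

With letter-level units, "zhong" would be spelled as five symbols; with initials and finals it becomes two units that correspond more closely to how the syllable is articulated.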

Bibliographic details

Source
IEEE International Conference on Data Science in Cyberspace | 2020 | pp. 78-81 | 4 pages
Authors
Shihui Ma; Shilin Wang; Xiang Lin
Original format: PDF
Keywords
Chinese Mandarin lipreading; Transformer; Cascade Network

