Conference: International Conference on Text, Speech and Dialogue

Automatic Preparation of Standard Arabic Phonetically Rich Written Corpora with Different Linguistic Units


Abstract

Phonetically rich and balanced speech corpora are essential components of state-of-the-art automatic speech recognition (ASR) and text-to-speech (TTS) systems. The written form of a speech corpus must be prepared carefully so that its linguistic content is both rich and balanced. Corpora of this type, spoken or written, are scarce for Standard Arabic (SA), and the only one available was prepared manually by expert linguists and phoneticians. In this work, we address the task of automatically preparing written corpora that are rich in linguistic units. Our work depends on a comprehensive statistical linguistic study of SA, based on automatic phonetic transcription of texts totalling more than 5 million words. We prepared two written corpora. The first contains all allophones in SA, with at least 3 occurrences of each allophone and 17 occurrences of each phoneme. The second contains, in addition to all allophones, 90.72% of the diphones in SA.
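
The abstract does not say how sentences are chosen for the written corpora. As an illustration only, the sketch below shows one common way to approach this kind of coverage task: a greedy selection that keeps adding the sentence contributing the most still-missing units until every unit reaches a minimum count. It is not the authors' method. The function names (`diphones`, `greedy_select`), the toy symbol sequences, and the `min_count` parameter are all assumptions made for this example, and the input is assumed to be already phonetically transcribed.

```python
from collections import Counter


def diphones(units):
    """Return the adjacent unit pairs of one transcribed sentence."""
    return list(zip(units, units[1:]))


def greedy_select(sentences, min_count=1, extract=lambda s: s):
    """Greedily pick sentences until every unit found in the pool occurs
    at least min_count times in the selection (hypothetical helper).

    sentences: list of unit sequences (toy stand-ins for transcriptions).
    extract:   maps a sentence to the units to cover (identity or diphones).
    Returns the chosen sentence indices and the unit counts they provide.
    """
    counts = [Counter(extract(s)) for s in sentences]
    pool_total = Counter()
    for c in counts:
        pool_total.update(c)
    # A unit cannot be required more often than it occurs in the pool.
    need = {u: min(min_count, n) for u, n in pool_total.items()}
    have = Counter()
    chosen = []
    remaining = set(range(len(sentences)))

    def gain(i):
        # Number of still-missing occurrences that sentence i would supply.
        return sum(min(n, need[u] - have[u])
                   for u, n in counts[i].items() if have[u] < need[u])

    while remaining:
        # sorted() makes ties resolve to the lowest index.
        best = max(sorted(remaining), key=gain)
        if gain(best) == 0:
            break  # nothing left to gain; all requirements are met
        chosen.append(best)
        have.update(counts[best])
        remaining.remove(best)
    return chosen, have


# Toy pool: each sentence is a list of made-up phone symbols.
pool = [["b", "a", "t"], ["t", "a", "b"],
        ["k", "a", "t", "a", "b"], ["s", "a"]]

chosen, have = greedy_select(pool, min_count=1)
print(chosen)  # [2, 3]

# Covering diphones instead of single units only changes the extractor.
chosen_di, have_di = greedy_select(pool, min_count=1, extract=diphones)
print(chosen_di)
```

Greedy selection does not guarantee the smallest possible corpus, but it is simple and makes the trade-off visible: raising `min_count` or switching the extractor from single units to diphones increases the number of sentences needed.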
