
Automatic Preparation of Standard Arabic Phonetically Rich Written Corpora with Different Linguistic Units

Abstract

Phonetically rich and balanced speech corpora are essential components of state-of-the-art automatic speech recognition (ASR) and text-to-speech (TTS) systems. The written form of a speech corpus must be prepared carefully so that it represents the richness and balance of the linguistic content. Such spoken and written corpora are scarce for Standard Arabic (SA), and the only one available was prepared manually by expert linguists and phoneticians. In this work, we address the task of automatically preparing written corpora that are rich in linguistic units. Our work builds on a comprehensive statistical linguistic study of SA based on the automatic phonetic transcription of texts with more than 5 million words. We prepared two written corpora: the first contains all allophones in SA, with at least 3 occurrences of each allophone and 17 occurrences of each phoneme. The second contains, in addition to all allophones, 90.72% of the diphones in SA.
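
The abstract does not say how sentences are chosen for the written corpora. As an illustration only, the following minimal sketch shows one common way to build a unit-rich corpus: greedy selection of sentences until every unit reaches a minimum count. This is an assumption, not the authors' method; the function names (`diphones`, `select_sentences`, `coverage`) and the toy phone symbols are hypothetical.

```python
from collections import Counter


def diphones(phones):
    """Return the adjacent phone pairs of one transcribed sentence."""
    return list(zip(phones, phones[1:]))


def select_sentences(sentences, min_count):
    """Greedily pick sentences until each unit occurs min_count times.

    sentences: list of unit lists (hypothetical phonetic transcriptions).
    Returns the indices of the chosen sentences in selection order.
    """
    totals = Counter(u for s in sentences for u in s)
    # A unit cannot be required more often than it occurs in the pool.
    need = {u: min(min_count, c) for u, c in totals.items()}
    have = Counter()
    chosen, remaining = [], set(range(len(sentences)))

    def gain(i):
        # Number of still-missing unit occurrences that sentence i supplies.
        counts = Counter(sentences[i])
        return sum(min(c, max(0, need[u] - have[u])) for u, c in counts.items())

    while remaining:
        best = max(sorted(remaining), key=gain)
        if gain(best) == 0:
            break  # every requirement that can be met is already met
        chosen.append(best)
        remaining.discard(best)
        have.update(sentences[best])
    return chosen


def coverage(selected_units, inventory):
    """Fraction of the inventory that appears among the selected units."""
    return len(set(selected_units) & set(inventory)) / len(inventory)
```

Under these assumptions, a call such as `select_sentences(transcribed, 3)` would correspond to the "at least 3 occurrences of each allophone" requirement, and `coverage` applied to diphones would yield a figure comparable in kind to the 90.72% reported for the second corpus.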

Bibliographic details

Source
International Conference on Text, Speech and Dialogue | 2017 | 520p | 9 pages in total
Authors
Fadi Sindran; Firas Mualla; Tino Haderlein; Khaled Daqrouq; Elmar Noth
Original format: PDF
Chinese Library Classification: TP391.1-53
Keywords
Phonetically rich; SA written corpora; Linguistic content; Allophones; Diphones

Similar literature

1. Phonetically rich and balanced text and speech corpora for Arabic language [J]. Mohammad A. M. Abushariah, Raja N. Ainon, Roziati Zainuddin, Language Resources and Evaluation. 2012, No. 4
2. Arabic Speaker-Independent Continuous Automatic Speech Recognition Based on a Phonetically Rich and Balanced Speech Corpus [J]. Mohammad Abushariah, Raja Ainon, Roziati Zainuddin, The International Arab Journal of Information Technology. 2012, No. 1
3. A resource of errors written in Spanish by people with dyslexia and its linguistic, phonetic and visual analysis [J]. Rello Luz, Baeza-Yates Ricardo, Llisterri Joaquim, Language Resources and Evaluation. 2017, No. 2
4. Automatic Preparation of Standard Arabic Phonetically Rich Written Corpora with Different Linguistic Units [C]. Fadi Sindran, Firas Mualla, Tino Haderlein, International Conference on Text, Speech and Dialogue. 2017
5. Attitudes toward standard and non-standard dialects in linguistically stigmatized and linguistically prestigious regions in the United States and Germany [D]. Huttinger, Dorothea Evelyn. 2011
6. Comparative study of commercial 4-methylumbelliferyl-beta-D-glucuronide preparations with the Standard Methods membrane filtration fecal coliform test for the detection of Escherichia coli in water samples [O]. D L Clark, B B Milner, M H Stewart. 1991
7. The effects of speakers' gender, age, and region on overall performance of Arabic automatic speech recognition systems using the phonetically rich and balanced Modern Standard Arabic speech corpus [O]. Sawalha M, Abu Shariah M. 2013
