Open-Source Boundary-Annotated Corpus for Arabic Speech and Language Processing

Abstract

A boundary-annotated and part-of-speech tagged corpus is a prerequisite for developing phrase break classifiers. Boundary annotations in English speech corpora are descriptive, delimiting intonation units perceived by the listener. We take a novel approach to phrase break prediction for Arabic, deriving our prosodic annotation scheme from Tajwīd (recitation) mark-up in the Qur'an, which we then interpret as additional text-based data for computational analysis. This mark-up is prescriptive and signifies a widely used recitation style, one of seven original styles of transmission. Here we report on version 1.0 of our Boundary-Annotated Qur'an dataset of 77430 words and 8230 sentences, where each word is tagged with prosodic and syntactic information at two coarse-grained levels. In Sawalha et al. (2012), we use the dataset in phrase break prediction experiments. This research is part of a larger-scale project to produce annotation schemes, language resources, algorithms, and applications for Classical and Modern Standard Arabic.

Bibliographic details

Source
LREC-2012 | 2012 | 6 pages
Authors
Claire Brierley; Majdi Sawalha; Eric Atwell
Original format
PDF
Chinese Library Classification
41.11083
Keywords
prosodic annotation; psycholinguistic chunking; phrase break prediction
