Conference: Workshop on lexical and grammatical resources for language processing

Collaboratively Constructed Linguistic Resources for Language Variants and their Exploitation in NLP Applications - the case of Tunisian Arabic and the Social Media


Abstract

Modern Standard Arabic (MSA) is the formal language of most Arab countries. Arabic dialects (AD), the languages of daily use, differ from MSA, especially in social media communication. Moreover, most Arabic social media texts mix forms and show many variations, especially between MSA and AD. This paper aims to bridge the gap between MSA and AD by providing a framework for translating social media texts. More precisely, it focuses on the Tunisian Dialect of Arabic (TAD), with an application to the automatic machine translation of social media text into MSA and any other target language. Linguistic tools such as a bilingual TAD-MSA lexicon and a set of grammatical mapping rules are collaboratively constructed and exploited, together with a language model, to produce MSA sentences from Tunisian dialectal sentences. This work is a first step towards collaboratively constructed semantic and lexical resources for Arabic social media within the ASMAT (Arabic Social Media Analysis Tools) project.
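
The abstract names three components (a bilingual lexicon, grammatical mapping rules, and a language model) without giving implementation details. The following is only a minimal sketch of how such components could be combined, not the authors' system. Every token, rule, and probability below is a hypothetical placeholder, and the bigram table merely stands in for whatever language model the paper uses.

```python
from itertools import product

# Hypothetical toy lexicon: dialect token -> candidate MSA tokens.
# Placeholder strings are used instead of real TAD/MSA words.
LEXICON = {
    "tad_a": ["msa_a1", "msa_a2"],
    "tad_b": ["msa_b"],
}

# Hypothetical mapping rule: reorder one adjacent pair of dialect tokens.
RULES = {
    ("tad_b", "tad_a"): ("tad_a", "tad_b"),
}

# Hypothetical bigram log-probabilities standing in for a language model.
BIGRAM_LOGPROB = {
    ("msa_a1", "msa_b"): -1.5,
    ("msa_a2", "msa_b"): -0.2,
}
UNSEEN_LOGPROB = -5.0  # assumed flat penalty for unseen bigrams


def apply_rules(tokens):
    """Rewrite adjacent token pairs that match a mapping rule."""
    out, i = [], 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in RULES:
            out.extend(RULES[pair])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def candidates(tokens):
    """Expand each token into its lexicon options; unknown tokens pass through."""
    options = [LEXICON.get(tok, [tok]) for tok in tokens]
    return [list(choice) for choice in product(*options)]


def score(sentence):
    """Sum bigram log-probabilities over the padded sentence."""
    padded = ["<s>"] + sentence + ["</s>"]
    return sum(BIGRAM_LOGPROB.get(bg, UNSEEN_LOGPROB)
               for bg in zip(padded, padded[1:]))


def translate(tokens):
    """Apply rules, generate lexicon candidates, keep the best-scoring one."""
    return max(candidates(apply_rules(tokens)), key=score)


print(translate(["tad_b", "tad_a", "msa_x"]))
```

In this sketch, tokens missing from the lexicon (such as `msa_x`) are copied unchanged, which is one simple way to handle the mixed MSA/dialect text the abstract mentions. The exhaustive candidate expansion only suits short inputs.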
