Conference: Workshop on lexical and grammatical resources for language processing

Collaboratively Constructed Linguistic Resources for Language Variants and their Exploitation in NLP Applications - the case of Tunisian Arabic and the Social Media

Abstract

Modern Standard Arabic (MSA) is the formal language in most Arab countries. Arabic dialects (AD), the languages of daily use, differ from MSA, especially in social media communication. However, most Arabic social media texts mix forms and show many variations, especially between MSA and AD. This paper aims to bridge the gap between MSA and AD by providing a framework for translating social media texts. More precisely, it focuses on the Tunisian Dialect of Arabic (TAD), with an application to the automatic machine translation of social media text into MSA and any other target language. Linguistic tools, such as a bilingual TAD-MSA lexicon and a set of grammatical mapping rules, are collaboratively constructed and used together with a language model to produce MSA sentences from Tunisian dialectal sentences. This work is a first step towards collaboratively constructed semantic and lexical resources for Arabic social media within the ASMAT (Arabic Social Media Analysis Tools) project.
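
The abstract names three components: a bilingual TAD-MSA lexicon, grammatical mapping rules, and a language model. The following is a minimal sketch of how such components could fit together, not the authors' implementation. Everything in it is an assumption made for illustration: the toy lexicon entries, the single negation rule (rewriting the dialectal "ما ...ش" circumfix as "لا"), the tiny MSA corpus, the add-one-smoothed bigram model, and all function names.

```python
import itertools
import math
from collections import Counter

# Toy TAD -> MSA lexicon. Illustrative entries only, not the paper's resource.
# A dialect word may map to several MSA candidates.
LEXICON = {
    "برشا": ["كثيرا"],
    "نحب": ["أريد", "أحب"],
    "باهي": ["جيد"],
}


def apply_negation_rule(tokens):
    """Assumed mapping rule: rewrite 'ما <verb>ش' as 'لا <verb>'."""
    out, i = [], 0
    while i < len(tokens):
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        if tokens[i] == "ما" and len(nxt) > 1 and nxt.endswith("ش"):
            out.extend(["لا", nxt[:-1]])  # drop the negation suffix
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def train_bigram_lm(sentences):
    """Count bigrams and their left contexts over whitespace-tokenized MSA text."""
    contexts, bigrams, vocab = Counter(), Counter(), {"<s>", "</s>"}
    for sentence in sentences:
        toks = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(toks)
        contexts.update(toks[:-1])
        bigrams.update(zip(toks, toks[1:]))
    return contexts, bigrams, len(vocab)


def log_prob(tokens, lm):
    """Add-one-smoothed bigram log-probability of a token sequence."""
    contexts, bigrams, vocab_size = lm
    toks = ["<s>"] + list(tokens) + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(toks, toks[1:])
    )


def translate(sentence, lm):
    """Apply the rule, look up lexicon candidates, keep the best-scoring sequence."""
    tokens = apply_negation_rule(sentence.split())
    # Tokens missing from the lexicon pass through unchanged.
    options = [LEXICON.get(tok, [tok]) for tok in tokens]
    best = max(itertools.product(*options), key=lambda cand: log_prob(cand, lm))
    return " ".join(best)


# Tiny hypothetical MSA corpus used to train the toy language model.
MSA_CORPUS = ["أريد القهوة", "أريد الماء", "لا أريد القهوة"]
lm = train_bigram_lm(MSA_CORPUS)
print(translate("ما نحبش القهوة", lm))
```

In this sketch the lexicon proposes two MSA candidates for "نحب", and the language model picks "أريد" because the bigram "لا أريد" occurs in the toy corpus. Enumerating every candidate combination is only workable for short sentences; a realistic system would more likely use beam search or a lattice decoder.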
