Workshop on Arabic Natural Language Processing

A Unified Model for Arabizi Detection and Transliteration using Sequence-to-Sequence Models

Abstract

While online Arabic is primarily written using the Arabic script, a Roman-script variety called Arabizi is often seen on social media. Although this representation captures the phonology of the language, it is not a one-to-one mapping with the Arabic script version. This issue is exacerbated by the fact that Arabizi on social media is Dialectal Arabic which does not have a standard orthography. Furthermore, Arabizi tends to include a lot of code mixing between Arabic and English (or French). To map Arabizi text to Arabic script in the context of complete utterances, previously published efforts have split Arabizi detection and Arabic script target in two separate tasks. In this paper, we present the first effort on a unified model for Arabizi detection and transliteration into a code-mixed output with consistent Arabic spelling conventions, using a sequence-to-sequence deep learning model. Our best system achieves 80.6% word accuracy and 58.7% BLEU on a blind test set.
