Venue: Pacific Asia Conference on Language, Information and Computation

Machine Translation Experiments on PADIC: A Parallel Arabic Dialect Corpus

Abstract

In this paper we present PADIC, a Parallel Arabic Dialect Corpus that we built from scratch, and report experiments on cross-dialect Arabic machine translation. PADIC is composed of dialects from both the Maghreb and the Middle East, each aligned with Modern Standard Arabic (MSA). The study covers three Maghreb dialects (two from Algeria and one from Tunisia) and two Middle Eastern dialects (Syria and Palestine). PADIC was built from scratch because of the lack of dialect resources: Arabic dialects are used across the Arab world in daily conversation, but they are generally not written. To the best of our knowledge, PADIC is, to date, the largest corpus in the community working on dialects, especially those of the Maghreb. It contains 6400 sentences for each of the 5 dialects and for MSA. We conducted cross-lingual machine translation experiments between all the language pairs. For translation into MSA, we interpolated the corresponding language model (LM) with an LM trained on a large Arabic corpus. We also studied the impact of language model smoothing techniques on machine translation results because this corpus, even though it is the largest of its kind, is still very small compared with those used for the translation of natural languages.
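The abstract does not say how the two language models were combined or which smoothing methods were compared. As a rough illustration only, the sketch below assumes linear interpolation of two add-k smoothed unigram models; the toy corpora, the function names `unigram_lm` and `interpolate`, and the weight `lam` are all hypothetical and are not taken from PADIC or from the paper.

```python
from collections import Counter


def unigram_lm(tokens, vocab, k=1.0):
    """Add-k smoothed unigram model over a fixed vocabulary (assumed smoothing)."""
    counts = Counter(tokens)
    total = len(tokens) + k * len(vocab)
    return {w: (counts[w] + k) / total for w in vocab}


def interpolate(p_small, p_large, lam):
    """Linear interpolation: lam * p_small + (1 - lam) * p_large (assumed scheme)."""
    return {w: lam * p_small[w] + (1.0 - lam) * p_large[w] for w in p_small}


# Toy data (hypothetical): a small in-domain corpus and a larger background corpus.
small_corpus = "a b a c".split()
large_corpus = "a b b c c c d d".split()
vocab = sorted(set(small_corpus) | set(large_corpus))

p_small = unigram_lm(small_corpus, vocab)
p_large = unigram_lm(large_corpus, vocab)
p_mix = interpolate(p_small, p_large, lam=0.5)
```

In this toy setup the word `d` never occurs in the small corpus, so the interpolated model gives it more probability mass than the small model alone. That is the usual motivation for mixing a small in-domain LM with a large background LM. In practice `lam` would typically be tuned on held-out data.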