Second workshop on hybrid approaches to translation 2013

Building bilingual lexicon to create Dialect Tunisian corpora and adapt language model


Abstract

Since the Tunisian revolution, Tunisian Dialect (TD), the variety used in daily life, has become progressively more used and represented in interviews, news, and debate programs, in place of Modern Standard Arabic (MSA). This situation has important negative consequences for natural language processing (NLP): because the spoken dialects are not officially written and have no standard orthography, it is very costly to obtain adequate corpora for training NLP tools. Furthermore, there are almost no parallel corpora involving TD and MSA. In this paper, we describe the creation of a Tunisian dialect text corpus as well as a method for building a bilingual dictionary, in order to create a language model for a speech recognition system for Tunisian Broadcast News. To do so, we use explicit knowledge about the relation between TD and MSA.
