Conference: International conference on natural language and speech processing

Automatic Diacritization as Prerequisite Towards the Automatic Generation of Arabic Lexical Recognition Tests

Abstract

The automatic generation of Arabic lexical recognition tests entails several NLP challenges, including corpus linguistics, automatic diacritization, lemmatization and language modeling. Here, we only address the problem of automatic diacritization, a step that paves the road for the automatic generation of Arabic LRTs. We conduct a comparative study between the available tools for diacritization (Farasa and Madamira) and a strong baseline. We evaluate the error rates for these systems using a set of publicly available (almost) fully diacritized corpora, but in a relaxed evaluation mode to ensure fair comparison. Farasa outperforms Madamira and the baseline under all conditions.
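
The abstract does not define its error metrics or the relaxed evaluation mode, so the following is only a minimal sketch of how such an evaluation could be computed. It assumes two common measures, diacritic error rate (per letter) and word error rate (per word), and it assumes that "relaxed" means skipping letters that carry no diacritic in the reference, which is one plausible way to handle corpora that are only (almost) fully diacritized. The names `split_diacritics` and `error_rates` and the `relaxed` flag are hypothetical and do not come from the paper, Farasa, or Madamira.

```python
# Arabic diacritic marks (tanween, short vowels, shadda, sukun): U+064B..U+0652.
DIACRITICS = {chr(code) for code in range(0x064B, 0x0653)}


def split_diacritics(word):
    """Return a list of (base_letter, diacritics) pairs for one word."""
    pairs = []
    for ch in word:
        if ch in DIACRITICS:
            if pairs:  # attach the mark to the preceding base letter
                base, marks = pairs[-1]
                pairs[-1] = (base, marks + ch)
        else:
            pairs.append((ch, ""))
    return pairs


def error_rates(reference, hypothesis, relaxed=True):
    """Return (diacritic_error_rate, word_error_rate) for two texts.

    Assumption: both texts contain the same words in the same order and
    differ only in their diacritics.
    Assumption: relaxed=True ignores letters with no diacritic in the
    reference, so the system is not penalized where the gold text is silent.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if len(ref_words) != len(hyp_words):
        raise ValueError("texts must contain the same number of words")

    letters = letter_errors = words = word_errors = 0
    for ref_word, hyp_word in zip(ref_words, hyp_words):
        ref_pairs = split_diacritics(ref_word)
        hyp_pairs = split_diacritics(hyp_word)
        if [b for b, _ in ref_pairs] != [b for b, _ in hyp_pairs]:
            raise ValueError("base letters differ between the two texts")

        scored = 0
        word_is_wrong = False
        for (_, ref_marks), (_, hyp_marks) in zip(ref_pairs, hyp_pairs):
            if relaxed and not ref_marks:
                continue  # no gold diacritic here: skip in relaxed mode
            scored += 1
            # Compare as sorted lists so shadda+vowel order does not matter.
            if sorted(ref_marks) != sorted(hyp_marks):
                letter_errors += 1
                word_is_wrong = True

        letters += scored
        if scored:  # count a word only if at least one letter was scored
            words += 1
            word_errors += word_is_wrong

    der = letter_errors / letters if letters else 0.0
    wer = word_errors / words if words else 0.0
    return der, wer
```

Calling `error_rates(gold_text, system_output)` once per system (for example on the outputs of Farasa, Madamira, and a baseline) would give directly comparable figures, provided all outputs are aligned to the same gold text.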
