Conference: Workshop on NLP for similar languages, varieties and dialects

A Character-level Convolutional Neural Network for Distinguishing Similar Languages and Dialects

Abstract

Discriminating between closely related language varieties is considered a challenging and important task. This paper describes our submission to the DSL 2016 shared task, which included two sub-tasks: one on discriminating similar languages and one on identifying Arabic dialects. We developed a character-level neural network for this task. Given a sequence of characters, our model embeds each character in vector space, runs the sequence through multiple convolutions with different filter widths, and pools the convolutional representations to obtain a hidden vector representation of the text that is used for predicting the language or dialect. We primarily focused on the Arabic dialect identification task and obtained an F1 score of 0.4834, ranking 6th out of 18 participants. We also analyze errors made by our system on the Arabic data in some detail, and point to challenges that such an approach faces.
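
The pipeline described in the abstract (embed characters, convolve with several filter widths, pool, predict) can be illustrated with a minimal NumPy forward pass. This is only a sketch under stated assumptions, not the authors' implementation: the abstract does not specify the pooling type, nonlinearity, or any hyperparameters, so max-over-time pooling, ReLU, the filter widths (2, 3, 4), the layer sizes, and the names `init_params` and `char_cnn_forward` are all hypothetical choices. The weights are random and untrained.

```python
import numpy as np


def init_params(vocab_size, emb_dim, widths, n_filters, n_classes, seed=0):
    """Random, untrained weights; all sizes are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    return {
        "embedding": rng.normal(0.0, 0.1, (vocab_size, emb_dim)),
        # One filter bank per width: weights (width * emb_dim, n_filters) and a bias.
        "convs": {
            w: (rng.normal(0.0, 0.1, (w * emb_dim, n_filters)), np.zeros(n_filters))
            for w in widths
        },
        "out_W": rng.normal(0.0, 0.1, (n_filters * len(widths), n_classes)),
        "out_b": np.zeros(n_classes),
    }


def char_cnn_forward(text, params, vocab):
    """Return a probability distribution over classes for one text."""
    # Map characters to integer ids; unknown characters share id 0.
    ids = np.array([vocab.get(ch, 0) for ch in text], dtype=np.int64)
    emb = params["embedding"][ids]  # (seq_len, emb_dim)

    pooled = []
    for width, (W, b) in params["convs"].items():
        x = emb
        if x.shape[0] < width:
            # Zero-pad short inputs so at least one window of this width exists.
            pad = np.zeros((width - x.shape[0], x.shape[1]))
            x = np.vstack([x, pad])
        # Each row is one flattened window of `width` consecutive character vectors.
        windows = np.stack(
            [x[i:i + width].ravel() for i in range(x.shape[0] - width + 1)]
        )
        feats = np.maximum(windows @ W + b, 0.0)  # ReLU (assumed nonlinearity)
        pooled.append(feats.max(axis=0))          # max-over-time pooling (assumed)

    hidden = np.concatenate(pooled)  # hidden vector representation of the text
    logits = hidden @ params["out_W"] + params["out_b"]
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


# Usage: id 0 is reserved for unknown characters; 5 classes is a placeholder.
vocab = {ch: i + 1 for i, ch in enumerate("abcdefghijklmnopqrstuvwxyz ")}
params = init_params(vocab_size=len(vocab) + 1, emb_dim=8,
                     widths=(2, 3, 4), n_filters=16, n_classes=5)
probs = char_cnn_forward("a short example sentence", params, vocab)
print(probs.shape, float(probs.sum()))
```

Using several filter widths lets the hidden vector combine evidence from character n-grams of different lengths, and pooling over positions makes its size independent of the input length.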