Journal: The International Arab Journal of Information Technology

A Deep Learning Approach for the Romanized Tunisian Dialect Identification


Abstract

Language identification is an important task in natural language processing that consists of determining the language of a given text. It has increasingly attracted the interest of researchers over the past few years, especially for code-switched informal textual content. This paper focuses on identifying the Romanized, user-generated Tunisian dialect on the social web. We segmented and annotated a corpus extracted from social media and propose a deep learning approach for the identification task. A Bidirectional Long Short-Term Memory neural network with Conditional Random Fields decoding (BLSTM-CRF) was used. For word embeddings, we use a combination of a word-character BLSTM vector representation and FastText embeddings, which take character n-gram features into account. The overall accuracy obtained is 98.65%.
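To make the character n-gram features mentioned above concrete, here is a minimal sketch of how FastText-style subword units can be extracted from a single token. It is an illustration only, not the authors' implementation: the function name `char_ngrams`, the example token, and the n-gram length range are assumptions, since the abstract does not state which settings were used.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word`, FastText-style.

    The word is wrapped in '<' and '>' boundary markers so that
    prefixes and suffixes yield n-grams distinct from word-internal ones.
    min_n and max_n are assumed values, not taken from the paper.
    """
    marked = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        # Slide a window of width n across the marked word.
        for i in range(len(marked) - n + 1):
            grams.append(marked[i:i + n])
    return grams


# Illustrative Romanized token (hypothetical example input).
print(char_ngrams("barcha", min_n=3, max_n=3))
# ['<ba', 'bar', 'arc', 'rch', 'cha', 'ha>']
```

Because spelling in Romanized user-generated text is not standardized, two variants of the same word tend to share many of these n-grams, which is one plausible reason subword features help in this setting.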
