Published in: International Conference on Artificial Intelligence and Soft Computing

Text Language Identification Using Attention-Based Recurrent Neural Networks



Abstract

The main purpose of this work is to explore the use of attention-based recurrent neural networks for text language identification. The most common statistical language identification approaches are effective but require long texts to perform well. To address this problem, we propose a neural model based on a Long Short-Term Memory (LSTM) network augmented with an attention mechanism. The evaluation of the proposed method incorporates tests on texts written in disparate styles, as well as tests on a corpus of Twitter posts, which comprises short and noisy texts. As a baseline, we apply a widely used statistical method based on the frequency of occurrence of n-grams. Additionally, we investigate the impact of the attention mechanism by comparing the results with those of the same model without attention. The proposed model outperforms the baseline, achieving 97.98% accuracy on a test corpus covering 36 languages, and maintains its performance on the Twitter corpus with 91.6% accuracy.
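The statistical baseline mentioned in the abstract ranks languages by how closely a text's character n-gram frequency profile matches a reference profile per language. A minimal sketch of this idea, using a rank-distance comparison and tiny made-up training strings purely for illustration (the paper does not specify these exact details):

```python
# Toy sketch of a frequency-of-n-grams language identifier.
# Reference profiles below are built from made-up one-line samples,
# not from the corpora used in the paper.
from collections import Counter

def ngram_profile(text, n=2, top=50):
    """Return the `top` most frequent character n-grams, ranked by frequency."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return [g for g, _ in grams.most_common(top)]

def out_of_place_distance(profile, reference):
    """Rank-displacement distance: sum of |rank in text - rank in reference|.
    N-grams absent from the reference get the maximum penalty."""
    max_penalty = len(reference)
    ref_rank = {g: i for i, g in enumerate(reference)}
    return sum(abs(i - ref_rank.get(g, max_penalty)) for i, g in enumerate(profile))

def identify(text, references):
    """Pick the language whose reference profile is closest to the text's."""
    profile = ngram_profile(text)
    return min(references, key=lambda lang: out_of_place_distance(profile, references[lang]))

references = {
    "en": ngram_profile("the quick brown fox jumps over the lazy dog and the cat"),
    "de": ngram_profile("der schnelle braune fuchs springt ueber den faulen hund"),
}
print(identify("the dog and the fox", references))  # -> en
```

The weakness the abstract points out is visible here: with only a few characters of input, the text's profile contains too few n-grams for the rank distance to be reliable, which is why longer texts are needed.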
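The attention mechanism the abstract adds on top of the LSTM can be understood as a pooling step over the recurrent hidden states: each timestep is scored, the scores are normalized with a softmax into weights, and the weighted sum of hidden states becomes the sequence representation fed to the classifier. A self-contained sketch of that step with toy numbers (the scoring vector `w` and the hidden states are illustrative values, not the paper's trained parameters):

```python
# Attention pooling over recurrent hidden states, sketched in plain Python.
# `hidden_states` stands in for LSTM outputs, one vector per timestep.
import math

def attention_pool(hidden_states, w):
    # Unnormalized score per timestep: dot product with a learned vector w.
    scores = [sum(h_i * w_i for h_i, w_i in zip(h, w)) for h in hidden_states]
    # Softmax over timesteps (max-subtraction for numerical stability).
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: attention-weighted sum of the hidden states.
    dim = len(hidden_states[0])
    context = [sum(weights[t] * hidden_states[t][d] for t in range(len(hidden_states)))
               for d in range(dim)]
    return context, weights

# Three timesteps, hidden size 2; the second timestep scores highest,
# so it dominates the pooled representation.
hidden = [[0.1, 0.3], [0.9, -0.2], [0.4, 0.5]]
context, weights = attention_pool(hidden, w=[1.0, 0.5])
print(weights)
```

This lets the model focus on the few characters that are most language-discriminative, which is plausibly why the abstract reports that accuracy holds up on short, noisy Twitter posts.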
