Journal: Computer Speech and Language

Character convolutions for Arabic Named Entity Recognition with Long Short-Term Memory Networks

Abstract

Named Entity Recognition (NER) is an important information extraction task because it is a component of many natural language processing applications, such as information retrieval, question answering, and speech recognition. The complexity and morphological richness of Arabic are the main reasons why most existing Arabic NER systems rely heavily on hand-crafted feature engineering. In this paper, we propose to augment the existing LSTM neural tagging model for Arabic NER with a Convolutional Neural Network (CNN) that extracts relevant character-level features. By operating at the character level, the proposed model can handle out-of-vocabulary words. Our results show that a character CNN outperforms the previously used character-level Bi-directional Long Short-Term Memory network (BiLSTM) in many settings. Moreover, our observations indicate that CNNs tend to perform better than BiLSTMs on relatively longer tokens. In addition, we compare four different pre-trained word vector models for Arabic NER; the results show that a Skip-Gram word2vec model, pre-trained on a subset of the Arabic Gigaword corpus, is generally sufficient to obtain acceptable Arabic NER performance. © 2019 Published by Elsevier Ltd.
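
To illustrate how a character-level CNN can produce features for out-of-vocabulary words, the following is a minimal pure-Python sketch of a character convolution with max-over-time pooling. It is not the authors' implementation: the class name `CharCNN`, the sizes (`char_dim`, `num_filters`, `width`), and the random weights standing in for trained parameters are all assumptions made for illustration. In a full tagger, the resulting vector would typically be combined with a pre-trained word embedding before the LSTM layer; that combination step is also an assumption here, since the abstract does not spell it out.

```python
import random


def make_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights (stand-in for trained parameters)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class CharCNN:
    """Character-level convolution with max-over-time pooling.

    All sizes are illustrative assumptions, not values taken from the paper.
    """

    def __init__(self, char_dim=8, num_filters=16, width=3, seed=0):
        self.rng = random.Random(seed)
        self.char_dim = char_dim
        self.num_filters = num_filters
        self.width = width
        self.char_vectors = {}  # character -> embedding, created lazily
        # Each filter spans `width` consecutive character embeddings.
        self.filters = make_matrix(num_filters, width * char_dim, self.rng)

    def embed(self, ch):
        # Assign a random vector the first time a character is seen;
        # a trained model would learn these embeddings instead.
        if ch not in self.char_vectors:
            self.char_vectors[ch] = [
                self.rng.uniform(-0.5, 0.5) for _ in range(self.char_dim)
            ]
        return self.char_vectors[ch]

    def encode(self, word):
        """Map a word of any length to a fixed-size feature vector."""
        pad = [0.0] * self.char_dim
        # Pad both ends so even a one-character word yields full windows.
        rows = (
            [pad] * (self.width - 1)
            + [self.embed(c) for c in word]
            + [pad] * (self.width - 1)
        )
        features = [float("-inf")] * self.num_filters
        for start in range(len(rows) - self.width + 1):
            # Flatten `width` adjacent character embeddings into one window.
            window = [x for row in rows[start:start + self.width] for x in row]
            for f, weights in enumerate(self.filters):
                activation = sum(w * x for w, x in zip(weights, window))
                # Max-over-time pooling: keep the strongest response per filter.
                features[f] = max(features[f], activation)
        return features


cnn = CharCNN()
short = cnn.encode("قطر")
longer = cnn.encode("الإسكندرية")
print(len(short), len(longer))  # both vectors have length num_filters
```

Because the pooling step keeps one value per filter regardless of word length, any token, including one never seen in training, maps to a vector of the same size. This is the property that lets a character-level model handle out-of-vocabulary words.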