Workshop on Arabic Natural Language Processing

Embed More Ignore Less (EMIL): Exploiting Enriched Representations for Arabic NLP

Abstract

Our research focuses on the potential improvements gained by exploiting language-specific characteristics, in the form of embeddings, in neural networks. More specifically, we investigate the capability of neural techniques and embeddings to represent language-specific characteristics in two sequence labeling tasks: named entity recognition (NER) and part-of-speech (POS) tagging. In both tasks, our preprocessing is designed to use an enriched Arabic representation by adding diacritics to undiacritized text. In POS tagging, we test the ability of a neural model to capture the syntactic characteristics encoded in these diacritics by incorporating an embedding layer for diacritics alongside embedding layers for words and characters. In NER, our architecture incorporates diacritic and POS embeddings alongside word and character embeddings. Our experiments are conducted on 7 datasets (4 NER and 3 POS). We show that embedding the information encoded in automatically acquired Arabic diacritics improves performance across all datasets on both tasks. Embedding the information in automatically assigned POS tags further improves performance on the NER task.
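To make the described architecture concrete, the sketch below shows one way a tagger could concatenate word, character, diacritic, and POS embeddings before a sequence encoder. This is a minimal PyTorch sketch under stated assumptions: the BiLSTM encoder, the token-level treatment of diacritics, and all layer sizes are illustrative choices, not details given in the abstract; the POS embedding input corresponds to the NER configuration, and dropping it yields the POS-tagging variant.

```python
# Minimal sketch of a multi-embedding sequence tagger (illustrative, not the
# paper's exact specification): word, character, diacritic, and POS features
# each get their own embedding table, and the vectors are concatenated before
# a BiLSTM encoder. All dimensions below are assumptions.
import torch
import torch.nn as nn

class MultiEmbeddingTagger(nn.Module):
    def __init__(self, n_words, n_chars, n_diacritics, n_pos, n_labels,
                 word_dim=100, char_dim=30, diac_dim=30, pos_dim=30,
                 char_hidden=25, lstm_hidden=200):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, word_dim)
        self.char_emb = nn.Embedding(n_chars, char_dim)
        # Character-level BiLSTM: builds one vector per word from its characters.
        self.char_lstm = nn.LSTM(char_dim, char_hidden,
                                 batch_first=True, bidirectional=True)
        self.diac_emb = nn.Embedding(n_diacritics, diac_dim)
        # POS embeddings are used only in the NER configuration.
        self.pos_emb = nn.Embedding(n_pos, pos_dim)
        in_dim = word_dim + 2 * char_hidden + diac_dim + pos_dim
        self.encoder = nn.LSTM(in_dim, lstm_hidden,
                               batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * lstm_hidden, n_labels)

    def forward(self, words, chars, diacritics, pos):
        # words, diacritics, pos: (batch, seq_len)
        # chars: (batch, seq_len, max_word_len)
        b, s, c = chars.shape
        char_vecs = self.char_emb(chars).view(b * s, c, -1)
        _, (h, _) = self.char_lstm(char_vecs)
        # Concatenate final forward/backward states -> (batch, seq_len, 2*char_hidden).
        char_repr = torch.cat([h[0], h[1]], dim=-1).view(b, s, -1)
        x = torch.cat([self.word_emb(words), char_repr,
                       self.diac_emb(diacritics), self.pos_emb(pos)], dim=-1)
        enc, _ = self.encoder(x)
        return self.out(enc)  # per-token label scores

# Usage (hypothetical vocabulary sizes):
#   model = MultiEmbeddingTagger(n_words=10000, n_chars=60, n_diacritics=15,
#                                n_pos=20, n_labels=9)
#   scores = model(word_ids, char_ids, diac_ids, pos_ids)
```

The design point the abstract emphasizes is simply that each enrichment (automatically acquired diacritics, then automatically assigned POS tags) enters the model as an additional embedding table whose vectors are concatenated with the word and character representations.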