Workshop on Arabic Natural Language Processing

Embed More Ignore Less (EMIL): Exploiting Enriched Representations for Arabic NLP

Abstract

Our research focuses on the potential improvements gained by exploiting language-specific characteristics, in the form of embeddings, in neural networks. More specifically, we investigate the capability of neural techniques and embeddings to represent language-specific characteristics in two sequence labeling tasks: named entity recognition (NER) and part-of-speech (POS) tagging. In both tasks, our preprocessing is designed to use an enriched Arabic representation by adding diacritics to undiacritized text. In POS tagging, we test the ability of a neural model to capture the syntactic characteristics encoded in these diacritics by incorporating an embedding layer for diacritics alongside embedding layers for words and characters. In NER, our architecture incorporates diacritic and POS embeddings alongside word and character embeddings. Our experiments are conducted on 7 datasets (4 NER and 3 POS). We show that embedding the information encoded in automatically acquired Arabic diacritics improves performance across all datasets on both tasks. Embedding the information in automatically assigned POS tags further improves performance on the NER task.
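To make the described architecture concrete, the sketch below shows one way a tagger could concatenate word, character, diacritic, and POS embeddings before a sequence encoder. This is a minimal PyTorch sketch under stated assumptions: the BiLSTM encoder, the token-level treatment of diacritics, and all layer sizes are illustrative choices, not details given in the abstract; the POS embedding input corresponds to the NER configuration, and dropping it yields the POS-tagging variant.

```python
# Minimal sketch of a multi-embedding sequence tagger (illustrative, not the
# paper's exact specification): word, character, diacritic, and POS features
# each get their own embedding table, and the vectors are concatenated before
# a BiLSTM encoder. All dimensions below are assumptions.
import torch
import torch.nn as nn

class MultiEmbeddingTagger(nn.Module):
    def __init__(self, n_words, n_chars, n_diacritics, n_pos, n_labels,
                 word_dim=100, char_dim=30, diac_dim=30, pos_dim=30,
                 char_hidden=25, lstm_hidden=200):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, word_dim)
        self.char_emb = nn.Embedding(n_chars, char_dim)
        # Character-level BiLSTM: builds one vector per word from its characters.
        self.char_lstm = nn.LSTM(char_dim, char_hidden,
                                 batch_first=True, bidirectional=True)
        self.diac_emb = nn.Embedding(n_diacritics, diac_dim)
        # POS embeddings are used only in the NER configuration.
        self.pos_emb = nn.Embedding(n_pos, pos_dim)
        in_dim = word_dim + 2 * char_hidden + diac_dim + pos_dim
        self.encoder = nn.LSTM(in_dim, lstm_hidden,
                               batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * lstm_hidden, n_labels)

    def forward(self, words, chars, diacritics, pos):
        # words, diacritics, pos: (batch, seq_len)
        # chars: (batch, seq_len, max_word_len)
        b, s, c = chars.shape
        char_vecs = self.char_emb(chars).view(b * s, c, -1)
        _, (h, _) = self.char_lstm(char_vecs)
        # Concatenate final forward/backward states -> (batch, seq_len, 2*char_hidden).
        char_repr = torch.cat([h[0], h[1]], dim=-1).view(b, s, -1)
        x = torch.cat([self.word_emb(words), char_repr,
                       self.diac_emb(diacritics), self.pos_emb(pos)], dim=-1)
        enc, _ = self.encoder(x)
        return self.out(enc)  # per-token label scores

# Usage (hypothetical vocabulary sizes):
#   model = MultiEmbeddingTagger(n_words=10000, n_chars=60, n_diacritics=15,
#                                n_pos=20, n_labels=9)
#   scores = model(word_ids, char_ids, diac_ids, pos_ids)
```

The design point the abstract emphasizes is simply that each enrichment (automatically acquired diacritics, then automatically assigned POS tags) enters the model as an additional embedding table whose vectors are concatenated with the word and character representations.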