IEEE International Conference on Semantic Computing

Named Entity Recognition on Arabic-English Code-Mixed Data

Abstract

As a result of globalization and better quality of education, a significant percentage of the population in Arab countries has become bilingual or multilingual. This has increased the frequency of code-switching and code-mixing among Arabs in daily communication. Consequently, a huge amount of code-mixed (CM) content can be found on social media platforms. Such data could be analyzed and used in different Natural Language Processing (NLP) tasks to tackle the challenges emerging from this multilingual phenomenon. Named Entity Recognition (NER), the process of identifying named entities in text, is one of the major tasks for several NLP systems. However, there is a lack of annotated CM data and resources for this task. This work aims at collecting and building the first annotated CM Arabic-English corpus for NER. Furthermore, we constructed a baseline NER system for Arabic-English CM text using deep neural networks and word embeddings, and enhanced it using a pooling technique.
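To make the task concrete, the following is a minimal sketch of what NER output on code-mixed text might look like once tokens have been labeled. It assumes a BIO tagging scheme (B- opens an entity, I- continues it, O is outside any entity); the abstract does not state which annotation scheme the corpus uses, and the example sentence, tag set, and the function name `extract_entities` are invented for illustration and are not taken from the paper or its data.

```python
from typing import List, Optional, Tuple


def extract_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity_text, entity_type) pairs from BIO-tagged tokens.

    Assumption: tags follow the BIO scheme. An I- tag that does not
    continue an open entity of the same type is treated like O.
    """
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity starts; close any entity that is still open.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-") and current_tokens and tag[2:] == current_type:
            # Continue the open entity.
            current_tokens.append(token)
        else:
            # O tag or a stray I- tag: close any open entity.
            if current_tokens:
                entities.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None

    # Flush an entity that runs to the end of the sentence.
    if current_tokens:
        entities.append((" ".join(current_tokens), current_type))
    return entities


# Hypothetical code-mixed sentence (Arabic and English tokens interleaved).
tokens = ["أنا", "رايح", "Cairo", "University", "مع", "Ahmed"]
tags = ["O", "O", "B-ORG", "I-ORG", "O", "B-PER"]

print(extract_entities(tokens, tags))
# [('Cairo University', 'ORG'), ('Ahmed', 'PER')]
```

In a full system, the `tags` list would be predicted by the neural tagger rather than written by hand; this sketch only covers the step of turning per-token labels into entity spans.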