Building the Classical Arabic Named Entity Recognition Corpus (CANERCorpus)


Abstract

The past decade has witnessed the construction of background information resources to overcome several challenges in text mining tasks. For non-English languages with poor knowledge sources, such as Arabic, these challenges have become more salient, especially for natural language processing applications that require human annotation. In the Named Entity Recognition (NER) task, several studies have addressed the complexity of Arabic in terms of morphological and syntactic variation. However, only a small number of studies deal with Classical Arabic (CA), the language of the Quran and Hadith. CA was also used to record Islamic topics, and these texts contain a great deal of useful information that could be of great value if extracted. Therefore, in this paper, we introduce the Classical Arabic Named Entity Recognition corpus (CANERCorpus), a new corpus of tagged data that can be useful for handling issues in the recognition of Arabic named entities. It is freely available, was manually annotated by human experts, and contains more than 7,000 Hadiths. Based on Islamic topics, we classify named entities into 20 types, which include domain-specific entities that have not been handled before, such as Allah, Prophet, Paradise, Hell, and Religion. The differences between Standard and Classical Arabic are described in detail in this work. Moreover, a comprehensive statistical analysis is presented to measure the factors that play an important role in manual annotation.

Bibliographic details

Source
International Conference on Information Retrieval and Knowledge Management | 2018 | 1 v. | 8 pages
Authors
Ramzi Esmail Salah; Lailatul Qadri Binti Zakaria
Original format: PDF
CLC classification: Computing technology, computer technology
Keywords
Information retrieval; Knowledge management

Similar literature

1. Building the Classical Arabic Named Entity Recognition Corpus (CANERCorpus) [J]. Ramzi Esmail Salah, Lailatul Qadri Binti Zakaria. Journal of Theoretical and Applied Information Technology. 2018, No. 24
2. Myanmar named entity corpus and its use in syllable-based neural named entity recognition [J]. Hsu Myat Mo, Khin Mar Soe. International Journal of Electrical and Computer Engineering. 2020, No. 2
3. New approach for Arabic named entity recognition on social media based on feature selection using genetic algorithm [J]. Brahim Ait Benali, Soukaina Mihi, Ismail El Bazi. International Journal of Electrical and Computer Engineering. 2021, No. 2
4. Building the Classical Arabic Named Entity Recognition Corpus (CANERCorpus) [C]. Ramzi Esmail Salah, Lailatul Qadri Binti Zakaria. International Conference on Information Retrieval and Knowledge Management. 2018
5. Arabic Named Entity Recognition: A Corpus-Based Study [D]. Algahtani, Shabib. 2012
6. Constructing a Chinese electronic medical record corpus for named entity recognition on resident admit notes [O]. Yan Gao, Lei Gu, Yefeng Wang. 2019
7. Classical Arabic Named Entity Recognition Using Variant Deep Neural Network Architectures and BERT [O]. Norah Alsaaran, Maha Alrabiah. 2021
