
SANAD: Single-label Arabic News Articles Dataset for automatic text categorization



Abstract

Text classification (also called categorization) is one of the most popular Natural Language Processing (NLP) tasks and has been an active research topic in recent years. However, much less attention has been directed towards this task in Arabic, owing to the lack of rich, representative resources for training an Arabic text classifier. Therefore, we introduce the Single-labeled Arabic News Articles Dataset (SANAD), a large collection of textual data gathered from three news portals. The dataset consists of almost 200k articles distributed into seven categories, and we offer it to the research community on Arabic computational linguistics. We anticipate that this rich dataset will be a great aid for a variety of NLP tasks on Modern Standard Arabic (MSA) textual data, especially single-label text classification. We present the data in raw form. SANAD is composed of three main datasets scraped from three news portals: AlKhaleej, AlArabiya, and Akhbarona. SANAD is public and freely available at https://data.mendeley.com/datasets/57zpx667y9.

Keywords: Arabic, Natural language processing, News articles, Single-label text classification
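
As an illustration of what "single-label" means in practice, the sketch below reads a corpus of raw articles into (text, label) pairs and counts the articles per category. It assumes a hypothetical on-disk layout of one folder per category containing one UTF-8 `.txt` file per article; the abstract does not describe the layout of the published archive, so the paths and the function names `load_single_label_corpus` and `label_distribution` are illustrative only.

```python
from collections import Counter
from pathlib import Path


def load_single_label_corpus(root):
    """Return a list of (text, label) pairs.

    Assumes a hypothetical layout root/<category>/<article>.txt,
    where the folder name is the article's single label.
    """
    samples = []
    root = Path(root)
    # Each sub-directory is treated as one category.
    for category_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for article in sorted(category_dir.glob("*.txt")):
            # Arabic text is read as UTF-8 (an assumption about the files).
            text = article.read_text(encoding="utf-8")
            samples.append((text, category_dir.name))
    return samples


def label_distribution(samples):
    """Count how many articles carry each label."""
    return Counter(label for _, label in samples)
```

Because every article receives exactly one label, the resulting pairs can be passed directly to any standard multi-class classifier, and `label_distribution` gives a quick check of class balance for each of the three source datasets.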
