
Building, Profiling, Analysing and Publishing an Arabic News Corpus Based on Google News RSS Feeds


Abstract

This paper gives a detailed and explicit account of the design, composition and documentation of a new Arabic News Corpus (ArNeCo). We used RSS feeds from Google News as a large container of article titles and crawled the web to extract the article text. About 11,000 documents containing more than 6 million words were tagged as belonging to one of six domains: Business, Entertainment, Health, Science-Technology, Sports, and World. Metadata has been added to the corpus as a whole and to each domain independently. The corpus has been analysed to confirm that its quality and size are adequate, and it has been published on the Internet for research purposes. This article aims to help potential users of ArNeCo understand the nature of the corpus and to support information retrieval research, for example in formulating queries, justifying decisions, or interpreting results. Besides the corpus, the article presents a method for developing corpora that keep track of recent natural language texts posted on the Internet by using RSS feeds.
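The collection step described in the abstract (RSS feeds as a container of article titles, each tagged with one domain) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the sample feed, the example.com URLs, the record fields, and the function names `parse_feed` and `build_index` are all hypothetical, and the crawling and text extraction step is left out.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed for one domain; a real feed would be downloaded.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Sports</title>
    <item>
      <title>Example headline one</title>
      <link>http://example.com/a1</link>
    </item>
    <item>
      <title>Example headline two</title>
      <link>http://example.com/a2</link>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text, domain):
    """Return one record per <item>, tagged with the feed's domain."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        if link:  # skip items without a URL to crawl
            records.append({"domain": domain, "title": title, "url": link})
    return records


def build_index(feeds):
    """Merge several domain feeds, keeping the first record seen per URL."""
    seen = set()
    index = []
    for domain, xml_text in feeds.items():
        for rec in parse_feed(xml_text, domain):
            if rec["url"] not in seen:
                seen.add(rec["url"])
                index.append(rec)
    return index


index = build_index({"Sports": SAMPLE_FEED})
for rec in index:
    print(rec["domain"], rec["title"], rec["url"])
```

Keeping only the first record per URL is one simple way to make sure a document ends up in a single domain; the abstract does not say how the paper handles articles that appear in several feeds.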


Bibliographic details

Source: AIRS 2012 | 2013 | 12 pages
Author: Salha M. Alzahrani
Original format: PDF
Chinese Library Classification: 025.0425
Keywords: Arabic corpus; RSS feeds; construction; profile; metadata; analysis; evaluation


Similar literature

1. Analysing headlines as a way of downsizing news corpora: Evidence from an Arabic-English comparable corpus of newspaper articles [J]. Haider Ahmad S., Hussein Riyad F. Literary & Linguistic Computing. 2020, No. 4
2. Google N-Gram Viewer does not Include Arabic Corpus! Towards N-Gram Viewer for Arabic Corpus [J]. Alsmadi Izzat, Zarour Mohammad. The International Arab Journal of Information Technology. 2018, No. 5
3. An Arabic Multi-source News Corpus: Experimenting on Single-document Extractive Summarization [J]. Amina Chouigui, Oussama Ben Khiroun, Bilel Elayeb. Arabian Journal for Science and Engineering. 2021, No. 4
4. Building, Profiling, Analysing and Publishing an Arabic News Corpus Based on Google News RSS Feeds [C]. Salha M. Alzahrani. Asia Information Retrieval Societies Conference. 2013
5. Stative and Stativizing Constructions in Arabic News Reports: A Corpus-Based Study [D]. Mansouri, Aous. 2016
6. Automating Academic Literature Searches With RSS Feeds and Google Reader™ [O]. Erick M Dubuque. 2012
7. Cross-cultural Comparative Analyses of Media Texts: A Corpus-based Study of Articles About Climate Change from Five English Newspapers [O]. 张爱真. 2016
