
Shamela: A Large-Scale Historical Arabic Corpus



Abstract

Arabic is a widely spoken language with a rich and long history spanning more than fourteen centuries. Yet existing Arabic corpora largely focus on the modern period or lack sufficient diachronic information. We develop a large-scale, historical corpus of Arabic of about 1 billion words from diverse periods of time. We clean this corpus, process it with a morphological analyzer, and enhance it by detecting parallel passages and automatically dating undated texts. We demonstrate its utility with selected case studies in which we show its application to the digital humanities.
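The abstract names parallel-passage detection as one enrichment step but does not describe how it is done. As a rough illustration only, the sketch below flags candidate parallel passages by word n-gram (shingle) overlap scored with Jaccard similarity. This is a common generic technique, not necessarily the authors' method; the function names, the n-gram size `n=3`, and the `threshold=0.3` are hypothetical choices.

```python
from collections import defaultdict
from itertools import combinations


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of word n-grams (shingles) in a whitespace-tokenized text."""
    words = text.split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def candidate_parallels(
    passages: dict[str, str], n: int = 3, threshold: float = 0.3
) -> list[tuple[str, str, float]]:
    """Hypothetical helper: return (id_a, id_b, score) for passage pairs whose
    shingle overlap meets the (assumed) threshold, sorted by passage ids."""
    sets = {pid: shingles(text, n) for pid, text in passages.items()}

    # Inverted index: shingle -> ids of passages containing it, so only
    # pairs that share at least one shingle are ever compared.
    index: dict[tuple[str, ...], set[str]] = defaultdict(set)
    for pid, grams in sets.items():
        for gram in grams:
            index[gram].add(pid)

    pairs = set()
    for pids in index.values():
        for a, b in combinations(sorted(pids), 2):
            pairs.add((a, b))

    results = []
    for a, b in sorted(pairs):
        score = jaccard(sets[a], sets[b])
        if score >= threshold:
            results.append((a, b, score))
    return results
```

For example, `candidate_parallels({"p1": "a b c d e f", "p2": "x a b c d e"})` returns a single pair for `p1` and `p2` with a score of 0.6, since the two passages share 3 of their 5 distinct trigrams. The inverted index avoids comparing every pair of passages, which matters for a corpus of about a billion words; real Arabic text would also need orthographic normalization before tokenizing, which this sketch omits.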


Bibliographic Details

Source: Workshop on language technology resources and tools for digital humanities | 2016 | xii + 195 pp. | 9 pages
Authors: Yonatan Belinkov; Alexander Magidow; Maxim Romanov; Avi Shmidman; Moshe Koppel
Original format: PDF
Chinese Library Classification: Programming; software engineering

Similar Literature


1. Studying the history of the Arabic language: language technology and a large-scale historical corpus [J]. Belinkov Yonatan, Magidow Alexander, Barron-Cedeno Alberto. Language Resources and Evaluation, 2019, no. 4.

2. Exploring and exploiting a historical corpus for Arabic [J]. Hammo Bassam, Yagi Sane, Ismail Omaima. Language Resources and Evaluation, 2016, no. 4.

3. A Large-Scale Arabic Sentiment Corpus Construction Using Online News Media [J]. Ahmed Nasser, Hayri Sever. Journal of Engineering & Applied Sciences, 2018, no. 17.

4. Shamela: A Large-Scale Historical Arabic Corpus [C]. Yonatan Belinkov, Alexander Magidow, Maxim Romanov. Language technology resources and tools for digital humanities, 2016.

5. The Impact of Ideology on Lexical Borrowing in Arabic: A Synergy of Corpus Linguistics and CDA [D]. Hamdi, Sami Abdullah, 2018.

6. A7'ta: Data on a monolingual Arabic parallel corpus for grammar checking [O]. Nora Madi, Hend S. Al-Khalifa, 2019.

7. Studying the history of the Arabic language: language technology and a large-scale historical corpus [O]. Yonatan Belinkov, Alexander Magidow, Alberto Barrón-Cedeño, 2019.
