Building a Corpus for the Zaza-Gorani Language Family

Abstract

Thanks to the growth of local communities and various news websites, along with the increasing accessibility of the Web, some endangered and less-resourced languages have a chance to revive in the information era. The Web is therefore considered a huge resource from which language corpora can be extracted, enabling researchers to carry out various studies in linguistics and language technology. The Zaza-Gorani language family is a linguistic subgroup of the Northwestern Iranian languages for which there is no significant corpus available. Motivated to create one, in this paper we present our endeavour to collect a corpus in the Zazaki and Gorani languages containing over 1.6M and 194k word tokens, respectively. This corpus is publicly available.

Bibliographic details

Source: Workshop on NLP for Similar Languages, Varieties and Dialects, 2020, pp. 70-78 (9 pages)
Author: Sina Ahmadi
Original format: PDF

Similar literature

1. Corrosion behaviour of Cu24Zn5Al alloy in a sodium tetraborate solution in the presence of 1-phenyl-5-mercaptotetrazole [J]. Antonijevic M M, Gardic V R, Gupta V K. Indian Journal of Chemical Technology. 2014, No. 5a6
2. Towards building a Urdu Language Corpus using Common Crawl [J]. Shafiq Hafiz Muhammad, Tahir Bilal, Mehmood Muhammad Amir. Journal of Intelligent & Fuzzy Systems: Applications in Engineering and Technology. 2020, No. 2Pta2
3. Building an annotated corpus for the Albanian language using bilingual projections and regular expressions [J]. Arbana Kadriu. International Journal of Knowledge Engineering and Data Mining. 2019, No. 2
4. The Human Language Project: Building a Universal Corpus of the World's Languages [C]. Steven Abney, Steven Bird. Annual Meeting of the Association for Computational Linguistics. 2010
5. Family influence on children's second language literacy building: A case study of Korean families [D]. Han, Hak-Sun. 2007
6. Best Practices for Building a Bimodal/Bilingual Child Language Corpus [O]. Deborah Chen Pichler, Julie A. Hochgesang, Diane Lillo-Martin
7. Collaborative corpus building for minorized languages using wiki-technology. Documenting the Asturian language [O]. Larusson Johann, Saurí Roser, Viejo Xulio. 2009
